Changelog
v0.2.0
Browser rendering, crawl, and cache overhaul.
- Browser rendering: JS-rendered pages (React, Angular, Salesforce Lightning) automatically detected and re-fetched via headless Chrome using Rod
ketch config set browser chrome— configure browserketch browser install— download Chromiumketch browser status— check browser availability- Detection heuristic: visible text, noscript tags, SPA markers, script-to-text ratio, loading page detection
- Transparent fallback — agents see the same output format
- Crawl command: BFS and sitemap-based site crawling
ketch crawl <url>— BFS crawl with configurable depth and concurrencyketch crawl <url> --sitemap— sitemap-based crawlketch crawl <url> --background— detached process with status trackingketch crawl status/ketch crawl stop— monitor and control background crawls- Per-host JS shell tracking: auto-switches to browser mode when >80% of pages are JS-rendered
- Cached re-crawls complete in under a second
- Cache backend: migrated from filesystem (one file per page) to embedded bbolt database
- Single
cache.dbfile instead of thousands of hashed files Storeinterface for future backend support (redis, etc.)- Default TTL changed from 1h to 72h
- Shared cache between scrape and crawl commands
- Single
- Config: added
browserkey for browser configuration
v0.1.0
Initial release.
- Search via Brave, DuckDuckGo, or SearXNG
- Scrape URLs to clean markdown (readability + html-to-markdown)
- Concurrent batch scraping
- YAML frontmatter + markdown output format
- JSON config at
~/.config/ketch/config.json - TTL-based page cache with platform-correct paths
ketch configdiscovery payload for agent introspection--jsonflag on all commands- GoReleaser + Homebrew tap publishing