Configuration

ketch reads defaults from ~/.config/ketch/config.json. Flags always override config values.
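
The precedence described above (built-in defaults, then the config file, then flags) can be sketched in a few lines. This is an illustrative model only, not ketch's actual implementation: `resolve_settings` is a hypothetical helper, the defaults are copied from the Config Keys table, and the file is assumed to be a flat JSON object keyed by those config key names.

```python
import json
from pathlib import Path

# Built-in defaults, copied from the Config Keys table in this document.
DEFAULTS = {
    "backend": "brave",
    "searxng_url": "http://localhost:8081",
    "limit": 5,
    "cache_ttl": "72h",
}


def resolve_settings(config_path, flags):
    """Merge defaults, config file values, and flags; later sources win."""
    settings = dict(DEFAULTS)
    path = Path(config_path).expanduser()
    if path.is_file():
        # Assumption: the file is a flat JSON object keyed by config key names.
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    # Flags the user did not pass are represented as None and are skipped.
    settings.update({k: v for k, v in flags.items() if v is not None})
    return settings
```

For example, `resolve_settings("~/.config/ketch/config.json", {"limit": 3})` would return the file's values with `limit` forced to 3.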

Setup

sh
# Create a default config file
ketch config init

# View effective config + available backends
ketch config

The discovery payload:

json
{
  "config_path": "/home/user/.config/ketch/config.json",
  "backend": "brave",
  "searxng_url": "http://localhost:8081",
  "limit": 5,
  "cache_ttl": "72h",
  "browser": "chrome",
  "available_backends": ["brave", "ddg", "searxng"]
}
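
A script that consumes this payload might first confirm that the configured backend is one of the available ones. The sketch below works on the sample payload shown above; `check_backend` is a hypothetical helper, and how the payload reaches the script (for example, captured from the command's output) is left as an assumption.

```python
import json

# Sample payload copied from the example above.
PAYLOAD = """
{
  "config_path": "/home/user/.config/ketch/config.json",
  "backend": "brave",
  "searxng_url": "http://localhost:8081",
  "limit": 5,
  "cache_ttl": "72h",
  "browser": "chrome",
  "available_backends": ["brave", "ddg", "searxng"]
}
"""


def check_backend(payload_text):
    """Return the configured backend, or raise if it is not listed as available."""
    payload = json.loads(payload_text)
    backend = payload["backend"]
    available = payload["available_backends"]
    if backend not in available:
        raise ValueError(f"backend {backend!r} not in available backends {available}")
    return backend


print(check_backend(PAYLOAD))  # prints: brave
```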

Setting Values

sh
ketch config set backend searxng
ketch config set brave_api_key BSA...
ketch config set searxng_url http://my-searxng:8080
ketch config set limit 10
ketch config set cache_ttl 4h
ketch config set browser chrome

Config Keys

| Key | Default | Description |
| --- | --- | --- |
| backend | brave | Default search backend |
| brave_api_key | (none) | Brave Search API key (get one free) |
| searxng_url | http://localhost:8081 | SearXNG instance URL |
| limit | 5 | Default max search results |
| cache_ttl | 72h | How long scraped pages stay cached |
| browser | (none) | Browser for JS-rendered pages: chrome, chromium, or absolute path |
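
Values such as `72h` and `4h` look like Go-style duration strings. Assuming that is the format, a script that needs to reason about `cache_ttl` could parse it as sketched below. `parse_ttl` is a hypothetical helper that handles only the `h`, `m`, and `s` units; the full set of units ketch accepts is not stated in this document.

```python
import re
from datetime import timedelta

# Seconds per supported unit; other units are rejected in this sketch.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)([hms])")


def parse_ttl(text):
    """Parse strings such as '72h' or '1h30m' into a timedelta."""
    pos = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            # A gap between tokens means the string contains unsupported text.
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=total)
```

With this sketch, `parse_ttl("72h")` equals `timedelta(days=3)`, and a bare number such as `"10"` raises `ValueError`.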

Browser Rendering

ketch can re-fetch JS-rendered pages via headless Chrome. Configure a browser once, and the scrape and crawl commands use it transparently.

sh
# Use Chrome from PATH
ketch config set browser chrome

# Or use an absolute path
ketch config set browser /usr/bin/google-chrome-stable

# Download Chromium to ketch's cache dir
ketch browser install

# Check browser config and availability
ketch browser status

When a browser is configured, ketch detects JS-rendered pages (React SPAs, Angular apps, Salesforce Lightning, and similar) and falls back to headless rendering for them. Static pages are always fetched over plain HTTP for speed.
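
This document does not describe how ketch decides that a page is JS-rendered. To give a feel for what such a check can look like, here is one simple heuristic: a page that ships scripts but almost no visible text is probably rendered client-side. Everything in this sketch, including the name `looks_js_rendered` and the 200-character threshold, is an assumption for illustration and not ketch's detection logic.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.scripts = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader would actually see.
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def looks_js_rendered(html, min_text_chars=200):
    """Guess whether the HTML is an empty shell that scripts fill in later."""
    counter = _TextCounter()
    counter.feed(html)
    return counter.scripts > 0 and counter.text_chars < min_text_chars
```

A typical SPA shell such as `<div id="root"></div>` plus a script tag returns `True`, while a long static article returns `False`.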

Page Cache

Scraped and crawled pages are cached in a single bbolt database in the platform cache directory:

| OS | Path |
| --- | --- |
| Linux | ~/.cache/ketch/cache.db |
| macOS | ~/Library/Caches/ketch/cache.db |
| Windows | %LocalAppData%/ketch/cache.db |

sh
# View cache stats
ketch cache

# Clear all cached pages
ketch cache clear

# Bypass cache for a single scrape
ketch scrape https://example.com --no-cache

# Bypass cache for a crawl (force re-fetch everything)
ketch crawl https://example.com --no-cache
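
A script that needs to locate the cache file, for example to report its size, could mirror the per-OS cache paths listed in this section. The sketch below is a minimal version of that lookup. `cache_db_path` is a hypothetical helper, and it deliberately ignores overrides such as `XDG_CACHE_HOME`, which ketch may or may not honor.

```python
import os
import sys
from pathlib import Path


def cache_db_path(platform=sys.platform, home=None, environ=None):
    """Return the expected cache.db location for the given platform."""
    home = Path(home) if home is not None else Path.home()
    environ = os.environ if environ is None else environ
    if platform.startswith("win"):
        # Windows: %LocalAppData%/ketch/cache.db
        base = Path(environ["LocalAppData"])
    elif platform == "darwin":
        # macOS: ~/Library/Caches/ketch/cache.db
        base = home / "Library" / "Caches"
    else:
        # Linux and other Unix-like systems: ~/.cache/ketch/cache.db
        base = home / ".cache"
    return base / "ketch" / "cache.db"
```

Calling `cache_db_path()` with no arguments uses the current platform and home directory.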