News & RSS

OpenAlice runs a background RSS collector that fetches articles from configurable feeds and stores them in a persistent archive. Three search tools, modeled on the Unix commands ls, grep, and cat, let Alice scan and read those articles.

How It Works

  1. The NewsCollector fetches RSS/Atom feeds at a configurable interval (default: every 10 minutes)
  2. New articles are deduplicated and stored in JSONL files organized by date (data/news-collector/{date}.jsonl)
  3. An in-memory buffer holds recent articles (default: 2000) for fast queries
  4. Three search tools let Alice find and read articles
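The deduplicate, persist, and buffer steps above can be sketched roughly as follows. This is a minimal illustration, not OpenAlice's actual code: the ArticleStore class, the use of the article URL as the dedup key, and naming files after the UTC published date are all assumptions made for the example.

```python
import json
from collections import deque
from datetime import datetime, timezone
from pathlib import Path


class ArticleStore:
    """Hypothetical store: dedupe, append to a dated JSONL file, keep a bounded buffer."""

    def __init__(self, root, max_in_memory=2000):
        self.root = Path(root)
        self.seen = set()                          # dedup keys (assumed: article URL)
        self.buffer = deque(maxlen=max_in_memory)  # oldest entries fall off when full

    def add(self, article):
        """Store the article unless its URL was already seen. Returns True if stored."""
        key = article["url"]
        if key in self.seen:
            return False
        self.seen.add(key)
        # Accept a trailing "Z" as well as an explicit UTC offset.
        published = datetime.fromisoformat(article["published"].replace("Z", "+00:00"))
        day = published.astimezone(timezone.utc).date().isoformat()
        self.root.mkdir(parents=True, exist_ok=True)
        with open(self.root / f"{day}.jsonl", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(article) + "\n")
        self.buffer.append(article)
        return True
```

A bounded deque keeps the in-memory side simple: once the limit is reached, appending a new article silently drops the oldest one, while the JSONL file on disk keeps everything.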

Search Tools

globNews

Search by title pattern — like ls or glob. Fast way to scan what's been happening.

You: What Bitcoin news has there been in the last 24 hours?

Alice: [calls globNews(pattern="BTC|Bitcoin", lookback="1d")]

      Found 8 articles:
      [0] "Bitcoin Surges Past $70K as ETF Inflows Accelerate" (2.3k chars)
      [1] "BTC Mining Difficulty Hits All-Time High" (1.8k chars)
      ...

Parameters:

  • pattern — Regex matched against titles
  • lookback — Time range, e.g. "1h", "12h", "1d", "7d"
  • metadataFilter — Filter by metadata key-value pairs (e.g. { "source": "coindesk" })
  • limit — Max results
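To make the parameters concrete, here is a rough sketch of how a title search with these options could behave. The function names glob_news and parse_lookback, the article dictionary layout, case-insensitive matching, and the choice to report each hit's position within the lookback window are assumptions for illustration, not the tool's confirmed behavior.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"h": "hours", "d": "days"}


def parse_lookback(lookback):
    """Turn a string such as "12h" or "7d" into a timedelta."""
    match = re.fullmatch(r"(\d+)([hd])", lookback)
    if match is None:
        raise ValueError(f"unsupported lookback: {lookback!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def glob_news(articles, pattern, lookback, now, metadata_filter=None, limit=None):
    """Return (index, article) pairs whose title matches the regex.

    The index is the article's position inside the lookback window (an assumption).
    """
    cutoff = now - parse_lookback(lookback)
    regex = re.compile(pattern, re.IGNORECASE)  # case-insensitivity is an assumption
    window = [a for a in articles if a["published"] >= cutoff]
    hits = []
    for index, article in enumerate(window):
        if not regex.search(article["title"]):
            continue
        meta = article.get("metadata", {})
        if metadata_filter and any(meta.get(k) != v for k, v in metadata_filter.items()):
            continue
        hits.append((index, article))
    return hits if limit is None else hits[:limit]
```

Calling glob_news(articles, "BTC|Bitcoin", "1d", now) on a list of such dictionaries would mirror the conversation shown above.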

grepNews

Search article content by pattern — like grep. Returns matched text with surrounding context.

You: Find any news mentioning interest rate decisions.

Alice: [calls grepNews(pattern="interest rate", lookback="2d")]

      [3] "Fed Minutes Signal..." — "...the committee discussed interest rate trajectory amid..."
      [7] "ECB Holds Steady..." — "...unchanged interest rate decision was widely expected..."

Parameters:

  • pattern — Regex to search in title and content
  • lookback — Time range
  • contextChars — Characters of context around each match (default: 50)
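The contextChars behavior can be illustrated with a small sketch. The helper name grep_contexts and the "..." markers on truncated sides are assumptions; only the idea of padding each match with a fixed number of characters comes from the description above.

```python
import re


def grep_contexts(text, pattern, context_chars=50):
    """Return one snippet per regex match, padded with surrounding characters."""
    snippets = []
    for match in re.finditer(pattern, text):
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        prefix = "..." if start > 0 else ""      # text was cut on the left
        suffix = "..." if end < len(text) else ""  # text was cut on the right
        snippets.append(f"{prefix}{text[start:end]}{suffix}")
    return snippets
```

Clamping start and end to the bounds of the text keeps matches near the beginning or end of an article from producing negative or out-of-range slices.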

readNews

Read the full content of an article by index — like cat.

You: Read article #0 from the Bitcoin search.

Alice: [calls readNews(index=0, lookback="1d")]

      "Bitcoin Surges Past $70K as ETF Inflows Accelerate"
      Source: coindesk | Published: 2025-03-15T10:30:00Z

      Bitcoin crossed the $70,000 mark for the first time since...

Use the same lookback as your previous glob/grep query to get consistent indices.
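One way to see why the lookback matters: if an index is a position within the list of articles that fall inside the lookback window, then a different window can put a different article at the same position. The sketch below assumes exactly that, plus oldest-first ordering; both are guesses about the internals, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def lookback_window(articles, lookback, now):
    """Articles inside the window, oldest first (the ordering is an assumption)."""
    cutoff = now - lookback
    recent = [a for a in articles if a["published"] >= cutoff]
    return sorted(recent, key=lambda a: a["published"])


def read_news(articles, index, lookback, now):
    """Look up an article by its position within the lookback window."""
    return lookback_window(articles, lookback, now)[index]


now = datetime(2025, 3, 16, 12, 0, tzinfo=timezone.utc)
articles = [
    {"title": "Older story", "published": now - timedelta(days=3)},
    {"title": "Recent story", "published": now - timedelta(hours=2)},
]

# Same index, different lookback: a different article comes back.
one_day = read_news(articles, 0, timedelta(days=1), now)["title"]
one_week = read_news(articles, 0, timedelta(days=7), now)["title"]
```

Under these assumptions, index 0 resolves to "Recent story" with a one-day window and to "Older story" with a seven-day window, which is the kind of mismatch that reusing the same lookback avoids.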

Configuration

Configure in data/config/news.json:

{
  "enabled": true,
  "intervalMinutes": 10,
  "maxInMemory": 2000,
  "retentionDays": 7,
  "feeds": [
    { "name": "CoinDesk", "url": "https://www.coindesk.com/arc/outboundfeeds/rss/", "source": "coindesk" },
    { "name": "CoinTelegraph", "url": "https://cointelegraph.com/rss", "source": "cointelegraph" },
    { "name": "The Block", "url": "https://www.theblock.co/rss.xml", "source": "theblock" },
    { "name": "CNBC Finance", "url": "https://search.cnbc.com/rs/search/combinedcms/view.xml?partnerId=wrss01&id=10000664", "source": "cnbc" }
  ]
}

Field            Description
enabled          Master switch for the news collector
intervalMinutes  How often to fetch feeds (default: 10)
maxInMemory      Max articles in the in-memory buffer (default: 2000)
retentionDays    Articles older than this aren't loaded on startup (default: 7)
feeds            Array of RSS/Atom feed definitions
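As an illustration of how the documented defaults might be applied when a field is omitted, here is a small loader sketch. The function name load_news_config is hypothetical, and the fallback values for enabled and feeds are assumptions, since only the three numeric defaults are documented.

```python
import json

# Numeric defaults come from the field table; "enabled" and "feeds" are assumptions.
DEFAULTS = {
    "enabled": True,
    "intervalMinutes": 10,
    "maxInMemory": 2000,
    "retentionDays": 7,
    "feeds": [],
}


def load_news_config(text):
    """Parse news.json content and fill in any missing fields from DEFAULTS."""
    config = {**DEFAULTS, **json.loads(text)}
    # Copy the list so callers never mutate the shared default.
    config["feeds"] = list(config["feeds"])
    for key in ("intervalMinutes", "maxInMemory", "retentionDays"):
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return config
```

For example, load_news_config('{"intervalMinutes": 5}') would keep the custom interval and fall back to the defaults for everything else.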

Adding Custom Feeds

Add any RSS or Atom feed to the feeds array:

{
  "name": "Reuters Markets",
  "url": "https://www.reutersagency.com/feed/?taxonomy=best-sectors&post_type=best",
  "source": "reuters",
  "categories": ["markets"]
}

Each feed needs:

  • name — Display name
  • url — RSS/Atom feed URL
  • source — Short identifier (used in metadata filtering)
  • categories — Optional tags for categorization
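Before adding a feed to news.json, it can help to check the entry against the field list above. The validator below is a hypothetical helper, not part of OpenAlice; the rule that the URL must be an absolute http(s) address is an assumption.

```python
from urllib.parse import urlsplit

REQUIRED_FIELDS = ("name", "url", "source")


def validate_feed(feed):
    """Return a list of problems with one feed entry; an empty list means it looks valid."""
    problems = []
    for key in REQUIRED_FIELDS:
        value = feed.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
    url = feed.get("url")
    if isinstance(url, str) and url.strip():
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append("url must be an absolute http(s) URL")
    # categories is optional, but when present it should be a list of strings.
    categories = feed.get("categories", [])
    if not isinstance(categories, list) or not all(isinstance(c, str) for c in categories):
        problems.append("categories must be a list of strings")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with an entry in a single pass.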

Storage

Articles are stored as JSONL files in data/news-collector/, organized by date:

data/news-collector/
├── 2025-03-14.jsonl
├── 2025-03-15.jsonl
└── 2025-03-16.jsonl

Each line contains: title, content, URL, published date, source, and metadata. Files older than retentionDays are not loaded into memory but remain on disk.
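The retention rule can be sketched as a startup loader that skips files older than retentionDays without deleting them. The function name load_recent is hypothetical, and treating a file exactly retentionDays old as still within the window is an assumption.

```python
import json
from datetime import date, timedelta
from pathlib import Path


def load_recent(root, retention_days, today):
    """Load articles from {date}.jsonl files no older than retention_days."""
    cutoff = today - timedelta(days=retention_days)
    articles = []
    for path in sorted(Path(root).glob("*.jsonl")):
        try:
            file_day = date.fromisoformat(path.stem)
        except ValueError:
            continue  # not named {date}.jsonl, ignore it
        if file_day < cutoff:
            continue  # too old to load; the file stays on disk
        with open(path, encoding="utf-8") as fh:
            # One JSON object per non-empty line.
            articles.extend(json.loads(line) for line in fh if line.strip())
    return articles
```

Because ISO dates sort lexicographically, sorting the file names is enough to load days in chronological order.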