News & RSS
OpenAlice runs a background RSS collector that fetches articles from configurable feeds and stores them in a persistent archive. Three search tools, modeled on the Unix commands ls, grep, and cat, let Alice scan and read articles.
How It Works
- The NewsCollector fetches RSS/Atom feeds at a configurable interval (default: every 10 minutes)
- New articles are deduplicated and stored in JSONL files organized by date (data/news-collector/{date}.jsonl)
- An in-memory buffer holds recent articles (default: 2000) for fast queries
- Three search tools let Alice find and read articles
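The storage steps above can be sketched roughly as follows. This is a minimal illustration, not OpenAlice's actual implementation: the ArticleStore class, the url and published field names, and deduplication by URL are all assumptions made for the example.

```python
import json
from collections import deque
from pathlib import Path


class ArticleStore:
    """Hypothetical store for illustration; not the real NewsCollector."""

    def __init__(self, root, max_in_memory=2000):
        self.root = Path(root)
        self.seen = set()                          # dedup keys (assumed: article URL)
        self.buffer = deque(maxlen=max_in_memory)  # oldest entries fall off when full

    def add(self, article):
        """Persist a new article; return False if it was already seen."""
        key = article["url"]
        if key in self.seen:
            return False
        self.seen.add(key)
        # An ISO 8601 timestamp starts with YYYY-MM-DD, which names the daily file.
        day = article["published"][:10]
        self.root.mkdir(parents=True, exist_ok=True)
        with open(self.root / f"{day}.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps(article) + "\n")
        self.buffer.append(article)
        return True
```

A bounded deque keeps the buffer at a fixed size without explicit eviction code, while the JSONL files keep everything that was ever collected.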
Search Tools
globNews
Search by title pattern — like ls or glob. Fast way to scan what's been happening.
You: What Bitcoin news has there been in the last 24 hours?
Alice: [calls globNews(pattern="BTC|Bitcoin", lookback="1d")]
Found 8 articles:
[0] "Bitcoin Surges Past $70K as ETF Inflows Accelerate" (2.3k chars)
[1] "BTC Mining Difficulty Hits All-Time High" (1.8k chars)
...
Parameters:
- pattern: Regex matched against titles
- lookback: Time range: "1h", "12h", "1d", "7d"
- metadataFilter: Filter by metadata key-value pairs (e.g. { "source": "coindesk" })
- limit: Max results
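To make the parameters concrete, here is a minimal sketch of how a title search of this kind could work. The function names, the article dictionary layout, and the explicit now argument are hypothetical. It also assumes that the reported index is the article's position inside the lookback window, which is inferred from the sample outputs rather than confirmed.

```python
import re
from datetime import datetime, timedelta, timezone

UNITS = {"h": "hours", "d": "days"}


def parse_lookback(lookback):
    """Turn a string such as "1h", "12h", "1d", or "7d" into a timedelta."""
    match = re.fullmatch(r"(\d+)([hd])", lookback)
    if match is None:
        raise ValueError(f"unsupported lookback: {lookback!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


def glob_news(articles, pattern, lookback, now, metadata_filter=None, limit=None):
    """Return (index, article) pairs whose title matches the regex."""
    cutoff = now - parse_lookback(lookback)
    # Assumption: indices refer to positions within the lookback window.
    window = [a for a in articles if a["published"] >= cutoff]
    regex = re.compile(pattern)
    wanted = metadata_filter or {}
    hits = []
    for index, article in enumerate(window):
        metadata = article.get("metadata", {})
        if regex.search(article["title"]) and all(
            metadata.get(key) == value for key, value in wanted.items()
        ):
            hits.append((index, article))
    return hits[:limit]  # limit=None keeps every hit
```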
grepNews
Search article content by pattern — like grep. Returns matched text with surrounding context.
You: Find any news mentioning interest rate decisions.
Alice: [calls grepNews(pattern="interest rate", lookback="2d")]
[3] "Fed Minutes Signal..." — "...the committee discussed interest rate trajectory amid..."
[7] "ECB Holds Steady..." — "...unchanged interest rate decision was widely expected..."
Parameters:
- pattern: Regex to search in title and content
- lookback: Time range
- contextChars: Characters of context around each match (default: 50)
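The effect of contextChars can be illustrated with a small sketch. The grep_snippets helper is hypothetical, and the "..." markers on truncated snippets are an assumption modeled on the sample output above.

```python
import re


def grep_snippets(text, pattern, context_chars=50):
    """Return each regex match with up to context_chars on either side."""
    snippets = []
    for match in re.finditer(pattern, text):
        start = max(0, match.start() - context_chars)
        end = min(len(text), match.end() + context_chars)
        # Mark snippets that were cut off from the surrounding text.
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(f"{prefix}{text[start:end]}{suffix}")
    return snippets


print(grep_snippets("abcdef", "cd", context_chars=1))  # ['...bcde...']
```

A larger contextChars gives Alice more surrounding text per match at the cost of longer tool output.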
readNews
Read the full content of an article by index — like cat.
You: Read article #0 from the Bitcoin search.
Alice: [calls readNews(index=0, lookback="1d")]
"Bitcoin Surges Past $70K as ETF Inflows Accelerate"
Source: coindesk | Published: 2025-03-15T10:30:00Z
Bitcoin crossed the $70,000 mark for the first time since...
Use the same lookback as your previous glob/grep query to get consistent indices.
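The sketch below shows why the lookback has to match. It assumes an index is simply a position within the list of articles that fall inside the lookback window. The read_news function and the article layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def read_news(articles, index, lookback, now):
    """Return the article at position index within the lookback window."""
    cutoff = now - lookback
    window = [a for a in articles if a["published"] >= cutoff]
    return window[index]


now = datetime(2025, 3, 16, tzinfo=timezone.utc)
articles = [
    {"title": "Five days old", "published": now - timedelta(days=5)},
    {"title": "Two hours old", "published": now - timedelta(hours=2)},
]

# The same index resolves to different articles under different windows.
week = read_news(articles, 0, timedelta(days=7), now)
day = read_news(articles, 0, timedelta(days=1), now)
print(week["title"])  # Five days old
print(day["title"])   # Two hours old
```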
Configuration
Configure in data/config/news.json:
{
  "enabled": true,
  "intervalMinutes": 10,
  "maxInMemory": 2000,
  "retentionDays": 7,
  "feeds": [
    { "name": "CoinDesk", "url": "https://www.coindesk.com/arc/outboundfeeds/rss/", "source": "coindesk" },
    { "name": "CoinTelegraph", "url": "https://cointelegraph.com/rss", "source": "cointelegraph" },
    { "name": "The Block", "url": "https://www.theblock.co/rss.xml", "source": "theblock" },
    { "name": "CNBC Finance", "url": "https://search.cnbc.com/rs/search/combinedcms/view.xml?partnerId=wrss01&id=10000664", "source": "cnbc" }
  ]
}
| Field | Description |
|---|---|
| enabled | Master switch for the news collector |
| intervalMinutes | How often to fetch feeds (default: 10) |
| maxInMemory | Max articles in the in-memory buffer (default: 2000) |
| retentionDays | Articles older than this aren't loaded on startup (default: 7) |
| feeds | Array of RSS/Atom feed definitions |
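As an illustration of how the documented defaults could be applied, the sketch below merges a partial config with fallback values. The load_news_config function is hypothetical. The defaults for enabled and feeds are assumptions, since the table does not state them.

```python
import json

# Numeric defaults come from the table above. The values for "enabled"
# and "feeds" are assumptions made for this sketch.
DEFAULTS = {
    "enabled": True,
    "intervalMinutes": 10,
    "maxInMemory": 2000,
    "retentionDays": 7,
    "feeds": [],
}


def load_news_config(text):
    """Parse config JSON and fill in any missing fields from DEFAULTS."""
    config = {**DEFAULTS, **json.loads(text)}
    for key in ("intervalMinutes", "maxInMemory", "retentionDays"):
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return config


config = load_news_config('{"intervalMinutes": 5}')
print(config["intervalMinutes"], config["maxInMemory"])  # 5 2000
```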
Adding Custom Feeds
Add any RSS or Atom feed to the feeds array:
{
  "name": "Reuters Markets",
  "url": "https://www.reutersagency.com/feed/?taxonomy=best-sectors&post_type=best",
  "source": "reuters",
  "categories": ["markets"]
}
Each feed needs:
- name: Display name
- url: RSS/Atom feed URL
- source: Short identifier (used in metadata filtering)
- categories: Optional tags for categorization
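Before adding a feed, it can help to check the entry against the field list above. The validate_feed helper below is a hypothetical pre-flight check, not part of OpenAlice. The URL scheme rule is an assumption.

```python
REQUIRED = ("name", "url", "source")


def validate_feed(feed):
    """Return a list of problems; an empty list means the entry looks usable."""
    problems = [
        f"missing or empty field: {key}"
        for key in REQUIRED
        if not isinstance(feed.get(key), str) or not feed[key].strip()
    ]
    url = feed.get("url")
    # Assumption: feeds are fetched over HTTP or HTTPS.
    if isinstance(url, str) and url.strip() and not url.startswith(("http://", "https://")):
        problems.append("url should start with http:// or https://")
    categories = feed.get("categories", [])  # optional field
    if not isinstance(categories, list) or not all(isinstance(c, str) for c in categories):
        problems.append("categories should be a list of strings")
    return problems
```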
Storage
Articles are stored as JSONL files in data/news-collector/, organized by date:
data/news-collector/
├── 2025-03-14.jsonl
├── 2025-03-15.jsonl
└── 2025-03-16.jsonl
Each line contains: title, content, URL, published date, source, and metadata. Files older than retentionDays are not loaded into memory but remain on disk.
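The retention rule can be sketched as a filter over the dated file names. The files_to_load function is hypothetical, and treating the cutoff day as inclusive is an assumption.

```python
from datetime import date, timedelta
from pathlib import Path


def files_to_load(root, today, retention_days=7):
    """Return the daily JSONL files that fall within the retention window."""
    cutoff = today - timedelta(days=retention_days)
    selected = []
    for path in sorted(Path(root).glob("*.jsonl")):
        try:
            day = date.fromisoformat(path.stem)  # e.g. "2025-03-15"
        except ValueError:
            continue  # skip files that are not named by date
        if day >= cutoff:  # assumption: the cutoff day itself is still loaded
            selected.append(path)
    return selected
```

Older files are only skipped here, never deleted, which matches the note that they remain on disk.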