RAG Search
Retrieval-augmented generation powering all agent responses. Every query is grounded in 500+ indexed documents — code, docs, conversations, and institutional knowledge — retrieved by semantic similarity and injected as live context.
How It Works
Ingest
Documents are chunked, embedded, and indexed into the vector store. Metadata tagging and overlap windows preserve context across chunk boundaries.
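The chunking step above can be sketched as follows; this is a minimal illustration with a fixed character budget, and the names (`chunk_text`, `CHUNK_SIZE`, `OVERLAP`) are assumptions, not the actual implementation:

```python
CHUNK_SIZE = 200   # characters per chunk (illustrative)
OVERLAP = 50       # characters shared between adjacent chunks

def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = OVERLAP) -> list[dict]:
    """Split text into overlapping chunks, tagging each with source offsets."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        # Offsets let downstream consumers attribute a chunk to its span.
        chunks.append({"text": piece, "start": start, "end": start + len(piece)})
        if start + size >= len(text):
            break
    return chunks
```

The overlap window means each boundary sentence appears in two chunks, so a query matching text near a boundary still retrieves complete context.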
Search
Semantic similarity search finds the most relevant chunks for any query. Ranking uses cosine similarity over embeddings rather than keyword matching, so meaning is preserved.
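In code, cosine-similarity ranking looks roughly like this; the embeddings below are hand-made stand-ins for real model output, and the helper names are illustrative:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[float, str]]:
    """Score every indexed vector against the query, return the k best."""
    scored = [(cosine(query, vec), doc) for doc, vec in index.items()]
    return sorted(scored, reverse=True)[:k]
```

Because ranking is done in embedding space, a document about "staleness detection" can match a query about "outdated content" even with zero shared keywords.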
Augment
Retrieved context is injected into agent prompts before each execution round. Agents arrive grounded, not blank — answers are traceable to source documents.
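The injection step can be sketched as simple prompt assembly; the exact template is an assumption, but the shape (cited context block, then the question) follows the description above:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Prepend retrieved chunks, each tagged with its source, to the question."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Tagging each chunk with its source is what makes answers traceable: the agent can cite `[docs/rag.md]` rather than asserting from memory.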
Capabilities
Semantic Search
Cosine similarity across dense embeddings. Finds conceptually related content even when exact keywords are absent.
Chunk Management
Automatic chunking with configurable overlap. Each chunk carries source metadata for attribution and re-ranking.
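One plausible shape for a chunk record carrying that metadata (field names are illustrative, not the actual schema):

```python
# Hypothetical chunk record: enough metadata for attribution and re-ranking.
chunk = {
    "id": "docs/rag.md#3",                    # source path plus chunk position
    "text": "Semantic similarity search finds the most relevant chunks...",
    "source": "docs/rag.md",                  # attribution target
    "position": 3,                            # ordinal within the document
    "overlap": 50,                            # characters shared with neighbors
    "indexed_at": "2024-01-01T00:00:00Z",     # used by freshness tracking
}
```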
Multi-Index
Separate indexes for docs, code, conversations, and entities. Queries can target a single index or fan out across all four.
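Fan-out across the four indexes can be sketched as query-each-then-merge-by-score; `search_one` here is a hypothetical stand-in for a per-index search call:

```python
INDEXES = ["docs", "code", "conversations", "entities"]

def fan_out(query: str, search_one, indexes: list[str] = INDEXES, k: int = 5):
    """search_one(index, query) -> list of (score, chunk) pairs.

    Merge per-index results and keep the k highest-scoring overall."""
    merged = []
    for name in indexes:
        merged.extend((score, name, chunk) for score, chunk in search_one(name, query))
    return sorted(merged, reverse=True)[:k]
```

Targeting a single index is then just `fan_out(query, search_one, indexes=["code"])`.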
Freshness Tracking
Automatic staleness detection flags outdated chunks. Re-indexing triggers run on document updates to keep the knowledge layer current.
API Endpoints
rag/search
Semantic search across all indexed content. Returns ranked chunks with source references and similarity scores.
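A request/response shape consistent with that description might look like this; the field names and values are illustrative, not the actual schema:

```
POST rag/search
{"query": "how are documents chunked?", "index": "docs", "top_k": 3}

200 OK
{"results": [
  {"text": "Documents are chunked, embedded, and indexed...",
   "source": "docs/rag.md", "score": 0.91}
]}
```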
rag/ask
RAG-powered question answering. Retrieves the most relevant context and synthesizes a grounded, cited response.
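A sketch of the question-answering exchange, with illustrative field names:

```
POST rag/ask
{"question": "How is index freshness tracked?"}

200 OK
{"answer": "Stale chunks are flagged automatically and re-indexed on update...",
 "citations": [{"source": "docs/rag.md", "score": 0.88}]}
```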
rag/ingest
Feed new documents into the index. Accepts plain text, markdown, and structured content. Chunking and embedding run automatically.
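An ingest request might look like this (field names are assumptions; chunking and embedding happen server-side, so the payload is just content plus provenance):

```
POST rag/ingest
{"content": "# Deploy runbook\nSteps for a clean rollout...",
 "format": "markdown",
 "source": "runbooks/deploy.md"}
```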
rag/stats
Index statistics and health. Reports document count, chunk count, index freshness, and per-index breakdown.
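A stats response consistent with that description, with illustrative numbers and field names:

```
GET rag/stats

200 OK
{"documents": 512,
 "chunks": 12480,
 "indexes": {"docs": 6200, "code": 4100, "conversations": 1800, "entities": 380},
 "stale_chunks": 14}
```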
Stats
500+ documents | 12,000+ chunks | 4 indexes | <100ms P95 latency