Retrieval-Augmented Generation (RAG)

The RAG framework hinges on two crucial modules: the retriever and the generator.

Retriever: This component searches an external document corpus (typically through a vector database or search index) to fetch snippets relevant to the input query. Common retrieval methods include dense retrieval using embeddings (e.g., a transformer encoder paired with a FAISS index), sparse retrieval with BM25, or hybrid approaches that combine the two. In dense retrieval, the retriever encodes the query as a vector and returns the documents whose vectors lie closest to it; sparse methods such as BM25 instead score documents by weighted term overlap with the query.
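As a rough illustration of the dense case, the sketch below ranks documents by cosine similarity to a query vector. It assumes the embeddings have already been computed; the three-dimensional vectors and the names `cosine_similarity` and `top_k` are hypothetical, and a production system would normally delegate the search to an index library rather than scan every document.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical embeddings, hand-written for illustration only.
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.3, 0.1],
]
query_vec = [1.0, 0.0, 0.0]

nearest = top_k(query_vec, doc_vecs, k=2)
print(nearest)
```

The brute-force scan is linear in the corpus size, which is acceptable for a few thousand documents; larger corpora are usually served by an approximate nearest-neighbor index.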

Generator: The generator is typically a transformer language model, either sequence-to-sequence (like BART or T5) or decoder-only (like GPT), that produces responses conditioned on both the original query and the retrieved documents. Grounding the output in this external data helps the model give more accurate, context-aware answers.
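One common way to condition a generator on retrieved documents is simply to place them in its input text. The sketch below shows that idea under stated assumptions: the function name `build_prompt` and the prompt wording are hypothetical, and real systems vary in how they format and truncate the context.

```python
def build_prompt(query, passages):
    """Combine retrieved passages and the query into a single generator input."""
    # Number the passages so the model (and a reader) can tell them apart.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_prompt(
    "What does the retriever do?",
    ["The retriever fetches relevant snippets.", "The generator writes the answer."],
)
print(prompt)
```

The resulting string would then be passed to whichever language model serves as the generator.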

Workflow:

- The query is encoded (embedded, in the dense case) and passed to the retrieval system.
- Top-k relevant documents are retrieved.
- These documents, along with the query, are passed to the generator.
- The generator synthesizes a response that combines the query context with the retrieved knowledge.
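The workflow steps above can be sketched end to end as follows. This is a toy under loud assumptions: `embed` uses word counts over a fixed vocabulary as a stand-in for a learned encoder, and `generate` is a placeholder that only echoes its inputs where a real system would call a language model. All function names are hypothetical.

```python
import math
from collections import Counter

def embed(text, vocab):
    """Toy embedding: word counts over a fixed vocabulary (stand-in for a learned encoder)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, vocab, k):
    """Embed the query and return the k most similar documents."""
    q = embed(query, vocab)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d, vocab)), reverse=True)
    return ranked[:k]

def generate(query, passages):
    """Placeholder generator: a real system would call a language model here."""
    return f"Q: {query} | based on: " + " / ".join(passages)

def rag_answer(query, docs, k=1):
    vocab = sorted({w for d in docs for w in d.lower().split()})
    passages = retrieve(query, docs, vocab, k)  # embed the query, fetch top-k
    return generate(query, passages)            # pass query + documents to the generator

docs = [
    "paris is the capital of france",
    "the mitochondria is the powerhouse of the cell",
]
answer = rag_answer("what is the capital of france", docs, k=1)
print(answer)
```

Keeping `retrieve` and `generate` as separate functions mirrors the two-module split: either one can be swapped out without changing the other.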

This modular approach lets the knowledge base be updated without retraining the language model: new or corrected documents only need to be added to the retrieval index, which keeps the system scalable and adaptable.
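A minimal sketch of that property, assuming a hypothetical in-memory `DocumentStore` scored by plain word overlap (a stand-in for a real index): adding a document changes what is retrieved for the same query, and nothing on the generator side is touched.

```python
class DocumentStore:
    """Tiny in-memory corpus scored by word overlap (a stand-in for a real index)."""

    def __init__(self):
        self.docs = []

    def add(self, text):
        """Index a new document; no model weights are involved."""
        self.docs.append(text)

    def search(self, query, k=1):
        """Return the k documents sharing the most words with the query."""
        q = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(q & set(d.lower().split())),
            reverse=True,
        )
        return ranked[:k]

store = DocumentStore()
store.add("the v1 api uses token authentication")
query = "which authentication does the v2 api use"
before = store.search(query)

# Updating the knowledge base is just another add() call.
store.add("the v2 api uses oauth authentication")
after = store.search(query)

print(before, after)
```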
