Implementing RAG: Practical Steps, Code Snippets, and Tools
Implementing a RAG system involves setting up the retrieval and generation components, then integrating them into a pipeline.
Prerequisites:
- Python 3.8+
- Transformers library (with PyTorch)
- FAISS (for vector search)
- scikit-learn (for vector normalization)
- A document corpus (custom documents or an existing dataset)
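These dependencies can typically be installed with pip. The package names below are the usual PyPI names (faiss-cpu is the CPU build of FAISS; sentencepiece is required by the T5 tokenizer used in Step 4); versions are left unpinned here and can be pinned as needed:

pip install torch transformers faiss-cpu scikit-learn sentencepiece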
Step 1: Prepare the Document Corpus
# Example: embedding documents with a sentence-transformer encoder
from transformers import AutoTokenizer, AutoModel
import torch
from sklearn.preprocessing import normalize

documents = ["Document 1 text", "Document 2 text", "Document 3 text"]

tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

def embed(texts):
    # Tokenize the batch and run the encoder without tracking gradients
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        # Use the [CLS] token embedding as the sentence representation
        embeddings = model(**inputs).last_hidden_state[:, 0, :]
    # L2-normalize and cast to float32, the dtype FAISS expects
    return normalize(embeddings.numpy()).astype('float32')

embeddings = embed(documents)
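The embed function above takes the [CLS] token embedding for simplicity. For this particular model, the sentence-transformers authors recommend mean pooling over the non-padding tokens, which usually yields better sentence embeddings. A minimal alternative sketch reusing the same tokenizer and model:

def embed_mean_pooled(texts):
    # Average the token embeddings, ignoring padding positions
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state   # (batch, seq_len, dim)
    mask = inputs['attention_mask'].unsqueeze(-1).float()       # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)               # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)                    # number of real tokens per text
    return normalize((summed / counts).numpy()).astype('float32')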
Step 2: Create a FAISS Index
import faiss

# Flat (exact) L2 index; on L2-normalized vectors this ranks results identically to cosine similarity
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
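Because the embeddings are L2-normalized, ranking by L2 distance is equivalent to ranking by cosine similarity, so IndexFlatL2 works; an inner-product index gives the same ranking. Beyond a toy corpus you will likely also want to persist the index so documents are not re-embedded on every run. A minimal sketch (the file name docs.index is just an example):

# Equivalent cosine-similarity index on the normalized vectors (optional alternative)
ip_index = faiss.IndexFlatIP(embeddings.shape[1])
ip_index.add(embeddings)

# Persist the index to disk and reload it later
faiss.write_index(index, 'docs.index')
index = faiss.read_index('docs.index')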
Step 3: Define a Retrieval Function
def retrieve(query, top_k=3):
    # Embed the query the same way as the documents, then search the index
    query_embedding = embed([query])
    distances, indices = index.search(query_embedding, top_k)
    # Map the returned row indices back to the original document texts
    return [documents[i] for i in indices[0]]
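A quick sanity check that retrieval behaves as expected (the output depends on your corpus):

# Print the two nearest documents for a sample query
for doc in retrieve("renewable energy", top_k=2):
    print(doc)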
Step 4: Generate a Response Using a Pre-trained Generator (e.g., T5)
from transformers import T5ForConditionalGeneration, T5Tokenizer

generator_tokenizer = T5Tokenizer.from_pretrained('t5-small')
generator_model = T5ForConditionalGeneration.from_pretrained('t5-small')

def generate_answer(query, retrieved_docs):
    # Concatenate the query and the retrieved passages into a single prompt
    input_text = f"question: {query} context: {' '.join(retrieved_docs)}"
    inputs = generator_tokenizer.encode(input_text, return_tensors='pt', max_length=512, truncation=True)
    # Beam search produces a more stable, higher-likelihood answer than greedy decoding
    output_ids = generator_model.generate(inputs, max_length=150, num_beams=4, early_stopping=True)
    return generator_tokenizer.decode(output_ids[0], skip_special_tokens=True)
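The t5-small checkpoint is the smallest T5 model, so answers can be shallow. One drop-in alternative, a suggestion rather than part of the original setup, is an instruction-tuned checkpoint such as google/flan-t5-small, loaded exactly the same way:

# Optional swap: an instruction-tuned checkpoint tends to give more coherent answers
# ('google/flan-t5-small' is a suggested alternative, not required by the pipeline)
generator_tokenizer = T5Tokenizer.from_pretrained('google/flan-t5-small')
generator_model = T5ForConditionalGeneration.from_pretrained('google/flan-t5-small')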
Step 5: Full Pipeline Example
# Retrieve supporting documents for the query, then generate a grounded answer
query = "What are the latest advances in renewable energy?"
retrieved_docs = retrieve(query)
answer = generate_answer(query, retrieved_docs)
print(answer)
This pipeline demonstrates how to set up a RAG system end-to-end using open-source tools. For production, consider optimizing indexing, retrieval speed, and response coherence.
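As one example of such indexing optimizations, FAISS's IVF indexes trade a small amount of recall for much faster search on large corpora. A minimal sketch with illustrative parameters; note that an IVF index must be trained on at least nlist vectors, so it only applies once the corpus is much larger than the three-document toy example above:

# Approximate search with an inverted-file (IVF) index
nlist = 100                                           # number of coarse clusters; tune to corpus size
dim = embeddings.shape[1]
quantizer = faiss.IndexFlatL2(dim)                    # coarse quantizer that assigns vectors to clusters
ivf_index = faiss.IndexIVFFlat(quantizer, dim, nlist)
ivf_index.train(embeddings)                           # learn cluster centroids (needs >= nlist vectors)
ivf_index.add(embeddings)
ivf_index.nprobe = 10                                 # clusters searched per query; higher = better recall, slower
distances, indices = ivf_index.search(embed(["example query"]), 3)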