20 July 2024
RAG: Retrieval Augmented Generation with Python
N.Rampersand
Retrieval Augmented Generation
Agenda
- Large Language Models
- Limitations of LLMs
- Retrieval Augmented Generation (RAG)
- Technical Deep Dive
- Live Demo
- The Potential of RAG
Large Language Models
- Generative AI: Understand and generate "human-like" text
- Architecture: Transformer architectures with self-attention mechanisms
- Models: GPT, Llama, BERT, Mistral, Phi
- Applications: Translation, summarisation, question answering, document analysis, coding assistance, research, customer service, etc.
Limitations of Large Language Models
- Hallucinations: Generating incorrect or nonsensical information.
- Knowledge Cutoff: Limited by the training data's cutoff date.
- Limited Interpretability: It is difficult to trace how the model arrives at its conclusions.
What is RAG?
- A hybrid approach that combines the generative capabilities of LLMs with real-time information retrieval.
- Enhances the accuracy and reliability of generative models.
- Builds trust by grounding responses in retrieved, verifiable sources.
RAG example
An overview of RAG
Why is RAG Important Today?
- Mitigates hallucinations and the knowledge-cutoff problem.
- Enhances transparency by giving access to the sources behind each answer.
- Builds trust.
Real-world Applications of RAG
- Customer Support
- Academic Research
- Healthcare and Diagnostics
- Documentation and Development
- Legal Document Analysis
- Financial Analysis
Main RAG Components
- Query Processing and Information Retrieval
- Processing Documents
- Response Generation
Processing Documents in a RAG System
Query Processing, Retrieval, and Response Generation
Types of RAG Systems
- Sparse Retrieval: Uses keywords and TF-IDF scores. Efficient, but not context-aware.
- Dense Retrieval: Uses embedding models to capture the semantic meaning of queries and documents. Context-aware (a minimal comparison is sketched below).
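As a rough illustration of the difference, the following sketch scores a toy corpus against one query with both approaches. It assumes scikit-learn and sentence-transformers are installed; the corpus, query, and model name are illustrative and not part of the later demo.

# Minimal sparse-vs-dense retrieval comparison (illustrative corpus and query)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

corpus = [
    "Paul Graham co-founded the startup accelerator Y Combinator.",
    "He wrote many essays about programming and startups.",
    "Lisp is a family of programming languages.",
]
query = "Did he enjoy writing code?"

# Sparse retrieval: TF-IDF keyword overlap, fast but purely lexical
tfidf = TfidfVectorizer()
doc_vectors = tfidf.fit_transform(corpus)
sparse_scores = cosine_similarity(tfidf.transform([query]), doc_vectors)[0]

# Dense retrieval: embeddings let "writing code" match "essays about programming"
model = SentenceTransformer("all-MiniLM-L6-v2")
dense_scores = cosine_similarity(model.encode([query]), model.encode(corpus))[0]

for doc, s, d in zip(corpus, sparse_scores, dense_scores):
    print(f"sparse={s:.2f}  dense={d:.2f}  {doc}")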
Simple RAG Architecture
Full RAG Process
The implementation process:
1. Load documents from a folder
2. Split the documents into chunks
3. Create vector embeddings of the chunks
4. Store the vector embeddings in a vector database
5. Get the user query
6. Create a vector embedding of the query
7. Conduct dense retrieval using a semantic search of the vector database
8. Filter the top 3 chunks according to similarity score
9. Generate the prompt by combining the user query, the retrieved context, and the prompt configuration
10. Send the prompt to the LLM
11. The LLM generates an accurate and reliable answer
Implementation of RAG
What Do We Need?
- LLM
- RAG Framework
- Data
- RAG Pipeline
Tools and Technologies
- Language: Python
- LLM runtime: Ollama
- Model: Llama 3 (8B)
- RAG Framework: LangChain
- Data: A biography about Paul Graham (text)
Note: Another capable RAG library is llama-index.
Implementation
- Imports
# Imports
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_community.document_loaders import TextLoader
from langchain_community.document_loaders import DirectoryLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
- Document loader
# loader = PDFPlumberLoader("data/paul_graham_essay.pdf")
# loader = TextLoader("data/paul_graham_essay.txt")
loader = DirectoryLoader(
    "data/",
    glob="**/*.txt",
    loader_cls=TextLoader,
    use_multithreading=True,
    show_progress=True,
)
docs = loader.load()

# Check the number of pages
print("Number of pages in ingested data:", len(docs))
- Text splitting
# Split the documents into chunks
embedder = HuggingFaceEmbeddings(model_kwargs={'device': 'mps'})
text_splitter = SemanticChunker(embedder)
documents = text_splitter.split_documents(docs)
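SemanticChunker embeds sentences to decide where to split, which can be slow on a large corpus. A cheaper, commonly used alternative (not what this demo uses) is a fixed-size splitter; the chunk sizes below are illustrative, and depending on the LangChain version the import may live in langchain_text_splitters instead.

# Alternative: fixed-size splitting with overlap (illustrative sizes)
from langchain.text_splitter import RecursiveCharacterTextSplitter

fallback_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
documents = fallback_splitter.split_documents(docs)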
- Embedding
# Create vector embeddings
# Check number of chunks created
print("Number of chunks created: ", len(documents))

# Print the chunks
for i in range(len(documents)):
    print()
    print(f"CHUNK : {i+1}")
    print(documents[i].page_content)

# Create the vector store
vector = FAISS.from_documents(documents, embedder)
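Embedding every chunk on each run can be slow, so the FAISS store is often persisted to disk and reloaded later. The folder name below is illustrative, and recent LangChain versions require the allow_dangerous_deserialization flag when loading a locally saved index.

# Optional: save the vector store so chunks are not re-embedded on every run
vector.save_local("faiss_index")  # illustrative folder name

# Later, reload it with the same embedding model:
# vector = FAISS.load_local("faiss_index", embedder, allow_dangerous_deserialization=True)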
- Retriever
# Retrieval
retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})
retrieved_docs = retriever.invoke("did he like programming?")
retrieved_docs
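To inspect what the retriever returned, you can loop over the Document objects; the 'source' key used below is the file path that DirectoryLoader/TextLoader record in each document's metadata.

# Print the retrieved chunks and the file each one came from
for i, doc in enumerate(retrieved_docs, start=1):
    print(f"--- Chunk {i} (source: {doc.metadata.get('source', 'unknown')}) ---")
    print(doc.page_content[:300])  # first 300 characters, for readability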
- LLM and prompt
from langchain.chains import RetrievalQA
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
- Prompt
prompt = """ 1. Use the following parts of context to answer the question at the end. 2. If you don't know the answer, then just say that "I don't know" but don't make up an answer on your own. 3. Keep the answer clear and limited to 3 or 4 sentences. There is no need to say "According to the context" Context: {context} Question: {question} Helpful Answer:"""
- RAG pipeline
contextual_prompt = PromptTemplate.from_template(prompt)

llm_pipeline = LLMChain(
    llm=llm,
    prompt=contextual_prompt,
    callbacks=None,
    verbose=True,
)

# Format each retrieved document before it is stuffed into the {context} slot
rag_prompt = PromptTemplate(
    input_variables=["page_content", "source"],
    template="Context:\ncontent:{page_content}\nsource:{source}",
)

retrieval = StuffDocumentsChain(
    llm_chain=llm_pipeline,
    document_variable_name="context",
    document_prompt=rag_prompt,
    callbacks=None,
)

chat = RetrievalQA(
    combine_documents_chain=retrieval,
    verbose=False,
    retriever=retriever,
    return_source_documents=False,
)
- Testing
print(chat("did paul graham like animals?")['result'])
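Depending on the LangChain version, calling the chain directly as above may trigger a deprecation warning; the equivalent call through the newer invoke interface looks roughly like this (RetrievalQA uses "query" as its default input key and "result" as its default output key).

# Same question via the invoke() interface
response = chat.invoke({"query": "did paul graham like animals?"})
print(response["result"])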
Data directory
Results