Retrieval Augmented Generation

Agenda

  • Large Language Models

  • Limitations of LLMs

  • Retrieval Augmented Generation (RAG)

  • Technical Deep Dive

  • Live Demo

  • The potential of RAG

Large Language Models

  • Generative AI

    Understand and generate “human-like” text

  • Architecture

    Transformer architectures with self-attention mechanisms

  • Models

    GPT, Llama, BERT, Mistral, Phi

  • Applications

    Translation, summarisation, question answering, document analysis, coding assistance, research, customer service, etc.

Limitations of Large Language Models

  1. Hallucinations: Generating incorrect or nonsensical information.
  2. Knowledge Cutoff: Limited by the training data's cutoff date.
  3. Limited Interpretability: It is hard to understand how the model arrives at its conclusions.

What is RAG?

  • A hybrid AI approach that combines the generative capabilities of LLMs with real-time information retrieval.
  • Enhances the accuracy and reliability of generative models.
  • Builds trust by grounding responses in retrieved, citable sources.

RAG example

An overview of RAG

Why is RAG Important Today?

  • Mitigates hallucinations and the knowledge-cutoff problem.
  • Enhanced transparency with access to the source or reference.
  • Builds trust.

Real-world Applications of RAG

  • Customer Support
  • Academic Research
  • Healthcare and Diagnostics
  • Documentation and Development
  • Legal Document Analysis
  • Financial Analysis

Main RAG Components

  1. Query Processing and Information Retrieval
  2. Processing Documents
  3. Response Generation

Processing Documents in a RAG System

Query Processing, Retrieval, and Response Generation

Types of RAG Systems

  • Sparse Retrieval: Matches keywords using TF-IDF scores. Efficient, but not context-aware.
  • Dense Retrieval: Uses embedding models to capture the semantic meaning of queries and documents. Context-aware. The sketch below contrasts the two.
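
To make the contrast concrete, here is a minimal sketch that scores the same toy corpus with both approaches. The corpus, query, and model name (all-MiniLM-L6-v2) are illustrative assumptions, not part of the demo project, and it presumes scikit-learn and sentence-transformers are installed.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    from sentence_transformers import SentenceTransformer

    corpus = [
        "Paul Graham co-founded Y Combinator.",
        "He wrote essays about startups and programming.",
        "The weather in London is often rainy.",
    ]
    query = "Who created the startup accelerator?"

    # Sparse retrieval: TF-IDF keyword overlap, fast but blind to meaning
    tfidf = TfidfVectorizer()
    doc_vecs = tfidf.fit_transform(corpus)
    sparse_scores = cosine_similarity(tfidf.transform([query]), doc_vecs)[0]

    # Dense retrieval: embeddings capture meaning beyond exact word matches
    model = SentenceTransformer("all-MiniLM-L6-v2")
    dense_scores = cosine_similarity(model.encode([query]), model.encode(corpus))[0]

    print("sparse:", sparse_scores)
    print("dense: ", dense_scores)

Because the query shares almost no vocabulary with the relevant sentence, the TF-IDF scores are near zero or latch onto incidental word overlap, while the embedding scores typically still rank the Y Combinator sentence first.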

Simple RAG Architecture

Untitled

Full RAG Process

The implementation process:

1. Load documents from a folder
2. Split the documents into chunks
3. Create vector embeddings for each chunk
4. Store the vector embeddings in a vector database
5. Get the user query
6. Create a vector embedding of the query
7. Conduct dense retrieval using a semantic search of the vector database
8. Select the top 3 chunks by similarity score (see the sketch after this list)
9. Build the prompt by combining the user query, the retrieved context, and the prompt configuration
10. Send the prompt to the LLM
11. The LLM generates an accurate and reliable answer
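
Steps 6-8 are handled by the vector store in the implementation below, but they reduce to a few lines of linear algebra. Here is a hedged sketch, assuming the chunk embeddings sit in a NumPy array; the names top_k_chunks, chunk_embeddings, and chunks are illustrative, not part of the demo code.

    import numpy as np

    def top_k_chunks(query_embedding, chunk_embeddings, chunks, k=3):
        # Normalise so that a dot product equals cosine similarity
        q = query_embedding / np.linalg.norm(query_embedding)
        c = chunk_embeddings / np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)
        scores = c @ q
        # Indices of the k highest-scoring chunks, best match first
        best = np.argsort(scores)[::-1][:k]
        return [(chunks[i], float(scores[i])) for i in best]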

Implementation of RAG

What Do We Need?

  1. LLM
  2. RAG Framework
  3. Data
  4. RAG Pipeline

Tools and Technologies

  • Language: Python
  • LLM runtime: Ollama
  • Model: Llama 3 (8B)
  • RAG Framework: LangChain
  • Data: A biography of Paul Graham (text)

Note: Another capable RAG library is llama-index.
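
For comparison, here is the corresponding quickstart pattern in llama-index. A hedged sketch: by default it expects an OpenAI API key for embeddings and generation unless local models are configured, so treat it as illustrative rather than a drop-in replacement for the Ollama setup below.

    from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

    # Load every file under data/, embed and index it, then query it
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    print(query_engine.query("Did Paul Graham like programming?"))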

Implementation

  1. Imports

    # Imports: document loaders, semantic text splitter, embedding model, and FAISS vector store
    from langchain_community.document_loaders import PDFPlumberLoader
    from langchain_community.document_loaders import TextLoader
    from langchain_community.document_loaders import DirectoryLoader
    from langchain_experimental.text_splitter import SemanticChunker
    from langchain_huggingface import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS
  2. Document loader

    
    # Alternative loaders for a single file:
    # loader = PDFPlumberLoader("data/paul_graham_essay.pdf")
    # loader = TextLoader("data/paul_graham_essay.txt")

    # Load every .txt file under data/
    loader = DirectoryLoader("data/", glob="**/*.txt", loader_cls=TextLoader,
                             use_multithreading=True, show_progress=True)
    docs = loader.load()

    # Check the number of documents loaded
    print("Number of documents in ingested data:", len(docs))
  3. Text splitting

    # Embedding model used for chunking and retrieval ('mps' targets Apple Silicon GPUs)
    embedder = HuggingFaceEmbeddings(model_kwargs={'device': 'mps'})

    # Split the documents into semantically coherent chunks
    text_splitter = SemanticChunker(embedder)
    documents = text_splitter.split_documents(docs)
  4. Embedding

    # Check the number of chunks created
    print("Number of chunks created:", len(documents))

    # Print each chunk for inspection
    for i in range(len(documents)):
        print()
        print(f"CHUNK : {i+1}")
        print(documents[i].page_content)

    # Create the vector store: embed every chunk and index it with FAISS
    vector = FAISS.from_documents(documents, embedder)
  5. Retriever

    # Retrieval: fetch the 3 most similar chunks for a query
    retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})
    retrieved_docs = retriever.invoke("did he like programming?")
    retrieved_docs  # a list of Document objects with page_content and metadata
  6. LLM and prompt

    from langchain.chains import RetrievalQA
    from langchain.chains.llm import LLMChain
    from langchain.chains.combine_documents.stuff import StuffDocumentsChain
    from langchain.prompts import PromptTemplate

    from langchain_community.llms import Ollama

    # Local Llama 3 model served through Ollama
    llm = Ollama(model="llama3")
  7. Prompt

    prompt = """
    1. Use the following parts of context to answer the question at the end.
    2. If you don't know the answer, then just say that "I don't know" but don't make up an answer on your own.
    3. Keep the answer clear and limited to 3 or 4 sentences. There is no need to say "According to the context"
    
    Context: {context}
    
    Question: {question}
    
    Helpful Answer:"""
  8. RAG pipeline

    contextual_prompt = PromptTemplate.from_template(prompt)

    # Chain that feeds the assembled context and question to the LLM
    llm_pipeline = LLMChain(
                      llm=llm,
                      prompt=contextual_prompt,
                      callbacks=None,
                      verbose=True)

    # How each retrieved document is rendered before being stuffed into {context}
    rag_prompt = PromptTemplate(
        input_variables=["page_content", "source"],
        template="Context:\ncontent:{page_content}\nsource:{source}",
    )

    # "Stuff" all retrieved documents into a single context block
    retrieval = StuffDocumentsChain(
                      llm_chain=llm_pipeline,
                      document_variable_name="context",
                      document_prompt=rag_prompt,
                      callbacks=None,
                  )

    # The full RAG chain: retrieve, stuff, then generate
    chat = RetrievalQA(
                      combine_documents_chain=retrieval,
                      verbose=False,
                      retriever=retriever,
                      return_source_documents=False,
                  )
  9. Testing

    # Ask a question; the answer is grounded in the retrieved chunks
    print(chat("did paul graham like animals?")['result'])

Data directory

Results

Conclusion

GitHub: https://github.com/nirmal-k-r/devcon2024---RAG

Learn the coding skills to advance your career!