Flo-AI RAGs

A composable RAG

FloAI has introduced a composable Retrieval-Augmented Generation (RAG) system, designed to easily integrate with agentic workflows or be used independently. The RAG system allows agents to intelligently retrieve relevant information from a vector store and use that information to generate responses or perform tasks.

What is RAG?

RAG stands for Retrieval-Augmented Generation. It's a technique that combines the capabilities of a retrieval system (like a search engine or database query) with a generative model (like a language model) to produce more accurate, context-aware responses. This approach helps address the limitations of purely generative models, which can sometimes produce incorrect or hallucinated information.

How RAG Works

RAG typically follows a two-step process:

  1. Retrieval:

    • In this step, the system first searches a large collection of documents (e.g., a database, knowledge base, or text corpus) to find relevant information related to a given query or prompt.

    • The retrieval component uses methods like traditional search engines, vector embeddings (e.g., using FAISS, ElasticSearch, Pinecone), or dense passage retrieval (DPR) to fetch the most relevant pieces of text (passages or chunks).

  2. Augmented Generation:

    • The retrieved documents or text chunks are then passed as context to a generative language model (such as GPT or another transformer-based model).

    • The generative model uses this additional context to craft a more accurate and informed response.
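The two-step flow above can be sketched with a toy retriever and a template-based "generator". This is purely illustrative — the corpus, the overlap-based scoring (a stand-in for vector search), and the string-stitching "generation" are all simplifications, not FloAI or LangChain APIs:

```python
# Toy illustration of the retrieve-then-generate loop.

CORPUS = [
    "Dengue fever symptoms include high fever, severe headache, and joint pain.",
    "Housing loans typically require income proof and identity documents.",
    "RAG combines retrieval with generation for grounded answers.",
]

def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Score each document by word overlap with the query (stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for an LLM call: stitch the retrieved context into an answer."""
    return f"Based on {len(context)} retrieved passage(s): " + " ".join(context)

context = retrieve("What are the symptoms of dengue fever?", CORPUS)
answer = generate("What are the symptoms of dengue fever?", context)
print(answer)
```

In a real system, the overlap scorer is replaced by embedding similarity over a vector store, and the template by an LLM prompt that includes the retrieved passages.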

Benefits of RAG

  1. Improved Accuracy:

    • By grounding the generative model's response in real, retrieved data, RAG reduces the risk of generating incorrect or made-up information (hallucination).

  2. Scalability with Large Corpora:

    • RAG can work with very large, dynamic data sources without needing to fine-tune the generative model on all this data. Instead, the model is augmented with relevant information at inference time.

  3. Enhanced Contextual Understanding:

    • Since RAG incorporates real-world information as context, it can handle complex or niche queries better than a standalone generative model.

Example of RAG in Action

Suppose you ask a RAG-powered chatbot: "What are the symptoms of dengue fever?"

  • Retrieval Step: The system first searches a medical database or collection of articles for passages related to "dengue fever symptoms."

  • Generation Step: The generative model then takes these retrieved passages and crafts a response like: "The common symptoms of dengue fever include high fever, severe headache, pain behind the eyes, joint and muscle pain, rash, and mild bleeding."

In this way, the response is both informed and based on actual, retrieved knowledge, making it more accurate.

RAG in FloAI

In the context of FloAI, RAG is designed to be composable, allowing it to be seamlessly integrated into agentic workflows. This means:

  • You can use RAG as a standalone component in a larger AI pipeline or embed it into an agent to provide context-aware responses based on external knowledge bases.

  • It can be combined with other agents, teams, or workflows, enhancing the flexibility and utility of the system in various use cases like Q&A, document summarization, and chatbots with domain-specific knowledge.

Plugging RAG in Agentic Flows

FloAI focuses on the retrieval part of RAG, assuming that document insertion (indexing) is a more use-case-specific task.

RAG in FloAI can be part of larger agentic workflows, enabling seamless retrieval and generation capabilities within complex systems. By embedding RAG into agent-driven architectures, you can create Agentic RAGs, allowing for more dynamic and knowledge-driven workflows.

from flo import FloSession, FloRagBuilder, FloCompressionPipeline
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings

# Initialize the session with an LLM
llm = ChatOpenAI(temperature=0, model_name='gpt-4')
session = FloSession(llm)

# Set up the RAG builder with a retriever backed by your vector store
# (`store` can be any vector store, e.g. Chroma or Pinecone, populated beforehand)
rag_builder = FloRagBuilder(session, store.as_retriever())

# Create a compression pipeline for better results
# (removes redundant chunks and filters out irrelevant ones)
compression_pipeline = FloCompressionPipeline(OpenAIEmbeddings(model="<embeddings model>"))
compression_pipeline.add_embedding_redundant_filter()
compression_pipeline.add_embedding_relevant_filter()

# Build the RAG with a custom prompt, multi-query retrieval, and compression
rag = rag_builder \
    .with_prompt(custom_prompt) \
    .with_multi_query() \
    .with_compression(compression_pipeline) \
    .build_rag()

# Invoke the RAG
response = rag.invoke({ "question": "What documents are required to apply for a housing loan?" })
print(response)

# You can also pass chat history for conversational use
response_with_history = rag.invoke({
    "question": "What documents are required to apply for a housing loan?",
    "chat_history": []
})
print(response_with_history)

In this example:

  • FloRagBuilder helps build the RAG system.

  • FloCompressionPipeline allows the integration of components like re-rankers and duplicate removers for higher-quality results.

  • The RAG is built with a custom prompt and multi-query to retrieve semantically similar documents.
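To illustrate what multi-query retrieval does conceptually, here is a hedged sketch (none of these helper names are FloAI APIs): generate several rephrasings of the question, retrieve for each, and take the deduplicated union of the results.

```python
# Conceptual sketch of multi-query retrieval.
# In a real pipeline the paraphrases come from the LLM; here they are hard-coded.

def retrieve_for(query: str) -> list[str]:
    """Stand-in for a vector-store lookup returning document IDs."""
    fake_index = {
        "documents for housing loan": ["doc-1", "doc-2"],
        "housing loan paperwork": ["doc-2", "doc-3"],
        "what to submit for a home loan": ["doc-3", "doc-4"],
    }
    return fake_index.get(query, [])

def multi_query_retrieve(paraphrases: list[str]) -> list[str]:
    """Union of per-query results, deduplicated while preserving order."""
    seen, merged = set(), []
    for query in paraphrases:
        for doc_id in retrieve_for(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

docs = multi_query_retrieve([
    "documents for housing loan",
    "housing loan paperwork",
    "what to submit for a home loan",
])
print(docs)
```

Each paraphrase surfaces documents the others miss, which is why multi-query retrieval improves recall for semantically similar documents.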

The compression pipeline is an implementation of LangChain's contextual compression; to learn more, see: https://python.langchain.com/docs/how_to/contextual_compression
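The redundant filter added earlier can be understood with a small cosine-similarity sketch. The 2-D "embeddings" and the threshold below are toy values for illustration, not the pipeline's real parameters:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def drop_redundant(embedded_docs: list[tuple[str, list[float]]], threshold: float = 0.95) -> list[str]:
    """Keep a document only if it is not a near-duplicate of one already kept."""
    kept = []
    for doc, vec in embedded_docs:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((doc, vec))
    return [doc for doc, _ in kept]

docs = [
    ("loan requires income proof", [1.0, 0.0]),
    ("income proof needed for loan", [0.99, 0.01]),  # near-duplicate of the first
    ("dengue fever symptoms", [0.0, 1.0]),
]
unique_docs = drop_redundant(docs)
print(unique_docs)
```

The relevant filter works analogously, comparing each chunk's embedding against the query embedding and dropping chunks below a similarity threshold.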

Agentic RAG Tools

FloAI allows you to convert the RAG system into a reusable tool. You can easily create a RAG Tool, which can then be used in agentic flows.

rag_tool = rag_builder \
    .with_multi_query() \
    .build_rag_tool(name="RAGTool", description="RAG to answer questions by looking at the database")

Once the RAG tool is built, it can be registered in a session and used within any agent workflow.
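The registration pattern can be sketched generically: the session keeps a name-to-tool mapping, so workflow configs can refer to tools purely by name. This is a conceptual sketch — `ToySession` and `register_tool` below are illustrative stand-ins, not FloAI's actual classes or signatures:

```python
from typing import Callable

class ToySession:
    """Minimal stand-in for a session holding named tools."""
    def __init__(self) -> None:
        self.tools: dict[str, Callable[[str], str]] = {}

    def register_tool(self, name: str, tool: Callable[[str], str]) -> None:
        # Agents later resolve tools by this name
        self.tools[name] = tool

    def run_tool(self, name: str, question: str) -> str:
        return self.tools[name](question)

def housing_loan_rag_tool(question: str) -> str:
    # Stand-in for rag_tool / rag.invoke({"question": question})
    return f"answer to: {question}"

session = ToySession()
session.register_tool("HousingLoanTool", housing_loan_rag_tool)
result = session.run_tool("HousingLoanTool", "What documents are needed?")
print(result)
```

Name-based registration is what lets the YAML configuration in the next section reference tools like HousingLoanTool without importing any code.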

Using RAG in Agentic Flow

After creating and registering the RAG tool, you can use it in a YAML configuration for an agentic workflow. Here’s an example where a team of agents handles various tasks like sending emails, fetching transactions, and answering questions about housing loans using a RAG Tool.

apiVersion: flo/alpha-v1
kind: FloRoutedTeam
name: support-email-handler
team:
    name: SupportTicketHandler
    router:
        name: SupportSupervisor
        kind: supervisor
    agents:
      - name: EmailSender
        role: Email Sender
        job: You are capable of sending the reply email by constructing an apt response
        tools:
          - name: SendEmailTool
      - name: TransactionFetcher
        role: Transaction Fetcher
        job: You are capable of fetching any kind of transactions from the database given transaction reference id
        tools:
          - name: FetchTransactionTool
      - name: HousingLoanTeamLead
        role: Housing Loan Specialist
        job: Fetch the housing loan information from the db and answer the question
        tools:
          - name: HousingLoanTool

In this example:

  • Agents are responsible for different tasks, such as sending emails, fetching transactions, or answering housing loan-related queries.

  • The HousingLoanTool is registered in the session as a RAG-based tool, fetching relevant documents to answer housing loan questions.
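Conceptually, the supervisor router dispatches each incoming request to the agent whose job best matches it. The keyword-scoring sketch below is illustrative only (FloAI's supervisor uses the LLM to route, not keywords):

```python
# Toy supervisor: route a request to the best-matching agent by keyword score.

AGENT_KEYWORDS = {
    "EmailSender": ["email", "reply"],
    "TransactionFetcher": ["transaction", "reference"],
    "HousingLoanTeamLead": ["housing", "loan"],
}

def route(request: str) -> str:
    """Pick the agent whose keywords best match the request text."""
    words = request.lower()
    best_agent, best_score = "EmailSender", 0  # fall back to a default agent
    for agent, keywords in AGENT_KEYWORDS.items():
        score = sum(keyword in words for keyword in keywords)
        if score > best_score:
            best_agent, best_score = agent, score
    return best_agent

print(route("What documents do I need for a housing loan?"))
print(route("Please send a reply email to the customer"))
```

In the real team, the HousingLoanTeamLead agent would then invoke the registered RAG tool to ground its answer in the document store.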

By building a composable RAG system and making it easy to plug into agentic flows, FloAI enables the creation of powerful, knowledge-driven workflows. Whether used independently or as part of a broader agentic system, RAG helps retrieve, filter, and generate responses from relevant data sources, enhancing both accuracy and efficiency.
