Build a RAG Application with LangFlow
Learn to combine document retrieval with language generation in a visual, no-code/low-code environment.
from langflow import load_flow

# Load your RAG flow configuration
flow = load_flow("rag_flow.json")

# Example documents to process
docs = [
    "LangFlow is a GUI for LangChain.",
    "It enables rapid prototyping of LLM apps."
]

# Initialize the RAG pipeline
rag_chain = flow.get_chain()

# Query your documents
response = rag_chain.run(
    query="What is LangFlow?",
    documents=docs
)
print(response)
This example demonstrates loading a LangFlow RAG pipeline, preparing documents, and running queries against your knowledge base.
What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) enhances language models by integrating external knowledge. Instead of relying solely on a model's internal parameters, a RAG system retrieves relevant documents from a database and feeds them into the generation process.
Retrieval
Search and obtain relevant documents.
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Load and index documents
loader = TextLoader("data.txt")
docs = loader.load()
db = Chroma.from_documents(
    docs,
    OpenAIEmbeddings()
)

# Retrieve relevant docs
query = "How does RAG work?"
docs = db.similarity_search(query)
Augmentation
Merge retrieved context with the generation prompt.
from langchain.prompts import PromptTemplate

# Create prompt with context
template = """Use the following context to answer:

Context: {context}

Question: {question}

Answer:"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)

# Combine context with query
context = "\n".join([d.page_content for d in docs])
final_prompt = prompt.format(
    context=context,
    question=query
)
Generation
Use the combined input to produce accurate, context-rich responses.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# Generate response
llm = ChatOpenAI()
response = llm([
    HumanMessage(content=final_prompt)
])
print(response.content)
This approach helps overcome limitations such as outdated or incomplete model knowledge.
Introduction to LangFlow
LangFlow is a graphical interface built on top of LangChain that allows you to visually design AI pipelines without extensive coding. With drag-and-drop components, LangFlow makes it simple to prototype RAG systems quickly.
1. Visual Workflow Builder
Arrange components like file loaders, embedders, and chat interfaces.
2. Component Integration
Easily connect various AI modules.
3. Rapid Iteration
Quickly test and refine your AI pipelines.
LangFlow's intuitive design helps lower the barrier for building sophisticated AI systems. Here's how these concepts translate to code:
# Example 1: Creating a basic RAG pipeline
from langflow import load_flow_from_json

# Load a predefined workflow
flow = load_flow_from_json("my_rag_flow.json")

# Components are automatically connected based on the visual design
loader = flow.get_component("PDFLoader")
embedder = flow.get_component("OpenAIEmbeddings")
vectorstore = flow.get_component("Chroma")
llm = flow.get_component("ChatOpenAI")

# Execute the flow
response = flow.execute(
    input_data={"query": "What is RAG?"}
)
The above code demonstrates how LangFlow's visual designs can be exported and run programmatically, combining the benefits of visual development with code-based execution.
Prerequisites and System Setup
Before you begin, make sure your system meets the following requirements. LangFlow runs on common operating systems and needs a recent version of Python.
Operating System
Windows, macOS, or Linux
Python
Version 3.10 or above
Package Manager
pip installed
Verify Python Installation
python --version
# Expected output: Python 3.10.0 or higher
Check pip Installation
pip --version
# Expected output: pip 21.0.0 or higher
If you need to install or upgrade Python, visit python.org. For pip installation, you can run:
python -m ensurepip --upgrade
Step-by-Step: Installing LangFlow
Install LangFlow using pip. Open your terminal and run the following command to get the latest version:
$ pip install langflow --pre --force-reinstall
Collecting langflow
  Downloading langflow-0.5.3-py3-none-any.whl (5.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.8/5.8 MB 4.2 MB/s eta 0:00:00
Installing collected packages: langflow
Successfully installed langflow-0.5.3
This command uses --pre to allow pre-release versions and --force-reinstall to reinstall the package even if a version is already present, ensuring a clean installation. Verify the installation by running:
$ langflow --version
LangFlow Version: 0.5.3
Python Version: 3.10.12
Platform: Linux-5.15.0-1041-azure-x86_64-with-glibc2.31
This will display the installed LangFlow version, confirming the installation was successful.
Launching the LangFlow Interface
Once installed, launch LangFlow by executing:
langflow run
You can also specify custom host and port settings:
# Launch on a specific port
langflow run --port 7861

# Launch on a specific host
langflow run --host 0.0.0.0

# Launch with both custom host and port
langflow run --host 0.0.0.0 --port 7861

# Launch in debug mode for troubleshooting
langflow run --debug
Then, open your browser and navigate to the address shown in the terminal output, which defaults to http://127.0.0.1:7860.
You should see LangFlow's interface—a blank canvas ready for your components. If the browser doesn't launch automatically, simply copy-paste the URL into your address bar.
If you encounter any issues, you can check the server status with:
langflow status
Navigating the LangFlow Interface
The LangFlow interface is designed for simplicity. Familiarize yourself with the key areas to start building your AI pipelines efficiently.
Canvas
Your workspace for dragging and dropping components.
from langflow import Canvas

# Create a new canvas
canvas = Canvas()

# Add components to specific positions
canvas.add_component("ChatOpenAI", position=(100, 100))
canvas.add_component("TextLoader", position=(300, 100))
Sidebar
Contains pre-built components like File Loader and Embedding modules.
from langflow.components import load_component

# Access sidebar components
file_loader = load_component("FileLoader")
embeddings = load_component("OpenAIEmbeddings")
chat_model = load_component("ChatOpenAI")
Properties Panel
Edit settings for each component.
# Example component configuration
{
    "model": "gpt-3.5-turbo",
    "temperature": 0.7,
    "max_tokens": 500,
    "api_key": "your-api-key"
}
Creating Your First LangFlow Project
Let's start a new project by following these steps to set up your workspace and lay the groundwork for your RAG chatbot.
1. New Project
Click "New Project" from the top menu. Select "Blank Flow" to start fresh.
from langflow import LangFlowProject

project = LangFlowProject(
    name="My RAG Chatbot",
    description="A chatbot using RAG architecture"
)
2. Save Early
Name your project descriptively (e.g., "My RAG Chatbot").
project.save()

# Auto-save configuration
project.configure(
    auto_save=True,
    save_interval=300  # Save every 5 minutes
)
3. Arrange Workspace
Plan out where your retrieval, embedding, and chat components will go for optimal organization.
workspace = project.get_workspace()
workspace.configure_layout(
    components=[
        "FileLoader",
        "TextSplitter",
        "Embeddings",
        "VectorStore",
        "ChatModel"
    ],
    auto_arrange=True
)
Data Ingestion with File Loader and Text Splitter
Data ingestion is the initial step in creating your RAG pipeline. This involves loading your data and splitting it into manageable chunks.
1. File Loader
Drag the "File Loader" onto the canvas and configure it to load your document (PDF, CSV, or plain text).
from langchain.document_loaders import TextLoader

# Load a text file
loader = TextLoader("data.txt")
documents = loader.load()

# For PDFs
from langchain.document_loaders import PyPDFLoader
pdf_loader = PyPDFLoader("document.pdf")
2. Text Splitter
Drag "Text Splitter" and connect it to the File Loader. Configure parameters like Chunk Size and Overlap.
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
splits = text_splitter.split_documents(documents)
Building the Embedding Component
With your text split, the next step is converting it into vector representations using OpenAI Embeddings. Vector embeddings allow for quick similarity searches.
1. OpenAI Embeddings
Drag the "OpenAI Embeddings" component onto the canvas.
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key="your-api-key"
)
2. Connect
Connect it to the output of your Text Splitter.
documents = text_splitter.split_documents(raw_documents)
doc_embeddings = embeddings.embed_documents(
    [doc.page_content for doc in documents]
)
3. Configure
Use a model like text-embedding-ada-002. Adjust parameters based on document complexity.
# Advanced configuration
embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    chunk_size=1000,      # Process in batches
    max_retries=3,        # Retry on API errors
    request_timeout=30    # Seconds to wait for a response
)
Setting Up the Vector Database
A vector database stores your embeddings and enables quick similarity lookups. FAISS is ideal for prototyping, while Astra DB provides a scalable cloud-based solution.
FAISS
In-Memory
Ideal for prototyping.
Astra DB
Cloud-Based
Scalable solutions.
FAISS Setup Example:
from langchain.vectorstores import FAISS

# Create vector store from documents
vectorstore = FAISS.from_documents(
    documents=text_chunks,
    embedding=embeddings
)

# Save locally
vectorstore.save_local("faiss_index")
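If you want to reuse the saved index in a later session, a minimal sketch might look like this (assuming the faiss_index directory and the embeddings object from the snippet above):

from langchain.vectorstores import FAISS

# Reload the index saved above, using the same embeddings object
vectorstore = FAISS.load_local("faiss_index", embeddings)

# Run a similarity search against the stored chunks
results = vectorstore.similarity_search("How does RAG work?", k=3)
for doc in results:
    print(doc.page_content[:100])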
Astra DB Setup Example:
from langchain.vectorstores import AstraDB

# Initialize with your credentials and an embedding model
vectorstore = AstraDB(
    embedding=embeddings,
    token="your_token",
    api_endpoint="your_endpoint",
    collection_name="your_collection"
)

# Add documents
vectorstore.add_documents(documents=text_chunks)
Building the Chat Interface
Create the user-facing side of your RAG application by setting up the chat input, memory, and output components.
1. Chat Input
Captures user queries.
2. Chat Memory
Tracks conversation history.
3. Chat Output
Displays responses.
Implementation Examples
from datetime import datetime
from flask import Flask, request, jsonify

app = Flask(__name__)

# Chat Input Component
@app.route("/chat", methods=["POST"])
def chat_input():
    user_message = request.json.get("message")
    return process_message(user_message)

# Chat Memory Implementation
class ConversationMemory:
    def __init__(self):
        self.messages = []

    def add_message(self, role, content):
        self.messages.append({"role": role, "content": content})

    def get_history(self):
        return self.messages

# Chat Output Handler
def display_response(response):
    return jsonify({
        "message": response.content,
        "timestamp": datetime.now().isoformat()
    })
These code examples show a basic Flask-based implementation with a chat endpoint, a memory class for tracking conversation history, and a response handler for formatting output.
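To show one way these pieces could plug into each other, here is a hedged sketch of the process_message helper referenced by the chat endpoint above. It assumes a ChatOpenAI model; the history formatting and prompt wording are illustrative only, not part of the flow itself.

from langchain.chat_models import ChatOpenAI

# Illustrative only: fold the stored conversation history into each model call.
# ConversationMemory is the class defined above; the prompt format is an assumption.
llm = ChatOpenAI(temperature=0.7)
memory = ConversationMemory()

def process_message(user_message: str) -> str:
    memory.add_message("user", user_message)
    history = "\n".join(
        f"{m['role']}: {m['content']}" for m in memory.get_history()
    )
    answer = llm.predict(f"Conversation so far:\n{history}\nAssistant:")
    memory.add_message("assistant", answer)
    return answer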
Integrating the RAG Pipeline: Workflow Overview
Connect the retrieval and generation parts into a unified pipeline. Here's a simplified diagram of the workflow:
[Chat Input] → [Query Processing] → [Vector DB Search] → [Context Assembly] → [Chat Model] → [Chat Output]
User query submission is followed by retrieval, context assembly, language generation, and response display, creating a seamless user experience. Here's how to implement each step:
# Initialize components
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores.base import VectorStore

# Query Processing
def process_query(user_input: str) -> str:
    return user_input.strip()

# Vector DB Search
def retrieve_context(query: str, vector_db: VectorStore) -> list:
    return vector_db.similarity_search(query, k=3)

# Context Assembly
def assemble_context(relevant_docs: list) -> str:
    return "\n".join([doc.page_content for doc in relevant_docs])

# Chat Model Integration
llm = ChatOpenAI(temperature=0.7)
prompt = PromptTemplate(
    template="Context: {context}\nQuestion: {question}\nAnswer:",
    input_variables=["context", "question"]
)
chain = LLMChain(llm=llm, prompt=prompt)

# Complete Pipeline
def rag_pipeline(user_query: str) -> str:
    processed_query = process_query(user_query)
    # vector_db is the vector store built during data ingestion
    relevant_docs = retrieve_context(processed_query, vector_db)
    context = assemble_context(relevant_docs)
    response = chain.run(context=context, question=processed_query)
    return response
Designing Prompt Templates
A well-crafted prompt is essential for guiding the language model. Structure your template with placeholders for context and user queries.
Example Prompt Templates:
# Basic Q&A Template
template = """
You are an AI assistant. Use the context below to answer the question.

Context: {context}

Question: {question}

Answer:
"""

# Advanced Template with System Message
template_json = {
    "system": "You are a helpful AI assistant that answers questions based on provided context.",
    "messages": [
        {"role": "system", "content": "Use this context: {context}"},
        {"role": "user", "content": "{question}"}
    ]
}

# Python Implementation Example
# Note: string.Template expects $-style placeholders (e.g. "$context")
from string import Template

class PromptTemplate:
    def __init__(self, template_text):
        self.template = Template(template_text)

    def format(self, context, question):
        return self.template.substitute(
            context=context,
            question=question
        )
Customize the tone and integrate the template with the language generation component for dynamic context and query processing. The templates can be adapted based on your specific use case and the desired interaction style.
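For example, one way to adapt the interaction style is to expose tone as its own placeholder. The sketch below is illustrative only; the tone variable and wording are assumptions rather than part of the flow above.

from langchain.prompts import PromptTemplate

# Illustrative: a "tone" placeholder lets the same template produce
# formal or casual answers without rewriting the prompt.
toned_template = PromptTemplate(
    template=(
        "Answer in a {tone} tone using only the context below.\n"
        "Context: {context}\n"
        "Question: {question}\n"
        "Answer:"
    ),
    input_variables=["tone", "context", "question"],
)

formal_prompt = toned_template.format(
    tone="formal",
    context="LangFlow is a GUI for LangChain.",
    question="What is LangFlow?",
)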
OpenAI Integration and API Keys
To generate responses, integrate OpenAI's language models into your flow. Obtain an API key and configure the OpenAI component securely.
1. Obtain and Set API Key
Sign up at OpenAI to create an API key, then set it as an environment variable:
import os
os.environ["OPENAI_API_KEY"] = "your-api-key-here"

# Or load from .env file
from dotenv import load_dotenv
load_dotenv()
2. Configure OpenAI Component
Initialize the OpenAI client in your application:
from openai import OpenAI

client = OpenAI()  # Automatically uses OPENAI_API_KEY from env

# Or specify directly:
client = OpenAI(api_key="your-api-key-here")
3. Select Model and Parameters
Choose a model and tune generation parameters such as temperature and max_tokens (use a lower temperature for more deterministic outputs):
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or "gpt-4"
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is RAG?"}
    ],
    temperature=0.7,
    max_tokens=150
)
print(response.choices[0].message.content)
Testing and Debugging Your Application
With your components connected, it's time to test your RAG pipeline. Run the flow and input test queries to check for accurate responses.
1. Run the Flow
Click the "Run" button in LangFlow or use the Python API:
from langflow import load_flow_from_json

# Load your flow
flow = load_flow_from_json("my_flow.json")

# Initialize and run
flow.build()
flow.run()
2. Input Test Queries
Try queries like "What is the main topic of the document?"
response = flow.query({
    "input": "What is the main topic of the document?",
    "chat_history": []
})

print("Response:", response.get("output"))
print("Sources:", response.get("sources", []))
3. Debugging
Check component connections and verify that the embedding dimensions match between the embedder and the vector store.
# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)

# Inspect component outputs
embeddings = flow.get_component("embeddings")
vector_store = flow.get_component("vector_store")

# Check dimensions
print(f"Embedding dimension: {len(embeddings.embed_query('test'))}")
print(f"Vector store dimension: {vector_store.embedding_dimension}")
Optimization and Best Practices
To ensure high performance and accuracy, follow these optimization tips. Label components clearly, experiment with different prompt formats, and manage tokens efficiently.
1. Descriptive Naming: Label components clearly.
from langchain import PromptTemplate, LLMChain

# Good naming practice
document_qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="Context: {context}\nQuestion: {question}\nAnswer:"
)

# Clear chain naming
document_qa_chain = LLMChain(
    llm=llm,
    prompt=document_qa_prompt,
    verbose=True
)
2. Prompt Tuning: Experiment with different prompt formats.
# Example of different prompt formats
factual_prompt = """
Given the context below, answer the question factually:

Context: {context}
Question: {question}

Factual answer:"""

creative_prompt = """
Based on the context, provide a creative explanation:

Context: {context}
Question: {question}

Creative response:"""
3. Token Management: Adjust chunk sizes and overlaps.
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Optimize chunk sizes and overlap
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)
Document your iterations to build a knowledge base for future projects.
Advanced Tips and Troubleshooting
For more refined control over your RAG system, consider these advanced techniques. Generate several variations of user queries and experiment with chain types for complex document summarization.
Multi-Query RAG
Generate several variations of the user query to fetch a richer context.
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.llms import OpenAI

# Initialize the retriever; the LLM rephrases each query into
# several variations (three by default) to broaden retrieval
retriever = MultiQueryRetriever.from_llm(
    retriever=vector_store.as_retriever(),
    llm=OpenAI()
)

# Use the multi-query retriever
docs = retriever.get_relevant_documents(
    "What are the key features of RAG?"
)
Chain Variations
Experiment with chain types such as map_reduce for more complex document summarization.
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain

# Initialize a map-reduce summarization chain
chain = load_summarize_chain(
    llm=OpenAI(temperature=0),
    chain_type="map_reduce",
    token_max=1000
)

# Process documents
summary = chain.run(docs)
Deployment Options
Once your RAG application is working as expected, consider your deployment options. Choose from local deployment, cloud platforms, containerization, or Kubernetes clusters.
Local Deployment
Continue running LangFlow on your local machine for testing.
Cloud Deployment
Use platforms like Hugging Face Spaces, Google Cloud, or AWS.
Local Deployment Example:
# Start LangFlow locally
langflow run --host 0.0.0.0 --port 7860

# Or using Docker
docker run -p 7860:7860 logspace/langflow
Cloud Deployment Example (AWS):
# Deploy to AWS ECS
aws ecs create-cluster --cluster-name langflow-cluster

# Create task definition
aws ecs register-task-definition \
  --family langflow \
  --container-definitions '[{
    "name": "langflow",
    "image": "logspace/langflow:latest",
    "portMappings": [{
      "containerPort": 7860,
      "hostPort": 7860
    }]
  }]'

# Run service
aws ecs create-service \
  --cluster langflow-cluster \
  --service-name langflow-service \
  --task-definition langflow:1 \
  --desired-count 1
Conclusion and Next Steps
Congratulations—you've built a RAG application using LangFlow! Here are some code examples to help you expand your implementation:
1. Customize Your Document Loading
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load multiple PDF files from a directory
loader = DirectoryLoader('./documents/', glob="**/*.pdf")
documents = loader.load()

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)
2. Experiment with Different Embeddings
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Try different embedding models
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

# Create and query vector store
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings
)
3. Refine Your Prompts
from langchain.prompts import PromptTemplate

# Enhanced prompt template
template = """
Use the following pieces of context to answer the question.
If you don't know the answer, just say "I don't have enough information."

Context: {context}

Question: {question}

Answer: Let's think about this step by step:
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)
Keep iterating and experimenting with these examples, and soon you'll be ready to build even more sophisticated AI solutions. Happy coding!