This guide is designed for educational purposes to help you understand RAG concepts and how they work in LarAgent and AI development. The prompts, configurations, and implementations provided here are not fine-tuned or extensively tested for production use. Use this guide to learn and experiment, then build upon it with production-grade practices.
Vector-based RAG (Retrieval-Augmented Generation), often referred to as Traditional RAG, is a powerful technique that stores text documents as vectors (the same mathematical representation that LLMs use for understanding words and concepts) and compares user queries against these vectors to retrieve relevant context.
How Vector-Based RAG Works
The process flow:
- User asks a question
- The question is converted into a vector embedding
- The vector database finds documents with similar embeddings
- Retrieved documents are added as context
- The LLM generates a response based on the context
- User receives an accurate, documentation-based answer
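In code, the loop might look like this minimal sketch. The `$embedder`, `$vectorDb`, and `$llm` objects are illustrative placeholders, not LarAgent classes; each concrete piece is built in the steps below.

```php
// Illustrative pseudocode only — each piece is implemented in the steps below.
$question  = 'How do I configure my agent?';
$embedding = $embedder->embed($question);               // question -> vector
$documents = $vectorDb->similaritySearch($embedding);   // nearest documents
$context   = implode("\n\n", $documents);               // docs become context
$answer    = $llm->respond($context, $question);        // grounded response
```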
Prerequisites
Before starting this guide, make sure you have:
LarAgent Installed
You should have LarAgent installed and configured. If not, check the Quickstart guide.
Vector Search Service
You need a vector database to store and query document embeddings. This guide uses Qdrant as an example, but any vector search service, such as Pinecone, will work.
Embeddings Generator
We recommend using openai-php/client since LarAgent already provides it as a dependency, so you won’t need to install anything extra. However, you can use any embeddings generator, including open-source models running locally. Just make sure you use the same generator for user queries as you use for generating the documents’ vector representations.
Make sure your vector database is running and accessible, and that you have API keys configured for your chosen embeddings provider.
Implementation Steps
Step 1: Create Your Agent
First, create a new agent using the artisan command:
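Assuming LarAgent’s agent generator (run `php artisan list` to confirm the exact signature in your version):

```bash
php artisan make:agent SupportAgent
```

This creates the agent class at app/AiAgents/SupportAgent.php.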
Step 2: Define Instructions with Blade Template
Create a blade template for your agent’s instructions. This makes it easy to maintain and allows for dynamic content. Create a new file at resources/views/prompts/support_agent_instructions.blade.php:
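A minimal example of what this template might contain; the wording is illustrative and should be adapted to your product:

```blade
You are a support agent for our product.
Answer questions using only the documentation context provided to you.
If the provided context does not contain the answer, say so honestly instead of guessing.
Keep answers concise and reference specific documentation sections where possible.
```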
Then update SupportAgent.php to use this template:
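A sketch of the agent class, assuming you override the `instructions()` method; the model and history values are illustrative choices:

```php
<?php

namespace App\AiAgents;

use LarAgent\Agent;

class SupportAgent extends Agent
{
    protected $model = 'gpt-4o-mini';   // illustrative model choice

    protected $history = 'in_memory';   // swap for a persistent driver in production

    public function instructions(): string
    {
        // Render the blade template so instructions stay easy to edit.
        return view('prompts.support_agent_instructions')->render();
    }
}
```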
Step 3: Create a Search Service
Create a service to handle vector search operations. We’ll use QdrantSearchService as an example, which has the following API (app/Services/QdrantSearchService.php):
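A sketch of such a service, assuming Qdrant’s REST search endpoint and the embeddings API from openai-php/client; the collection name and config keys are illustrative and should match your setup:

```php
<?php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class QdrantSearchService
{
    protected string $collection = 'support_docs'; // illustrative collection name

    /**
     * Return the payloads of the documents most similar to the query.
     */
    public function search(string $query, int $limit = 3): array
    {
        $vector = $this->embed($query);

        // Qdrant REST API: POST /collections/{collection}/points/search
        $response = Http::withHeaders([
                'api-key' => config('services.qdrant.key'),
            ])
            ->post(config('services.qdrant.host') . "/collections/{$this->collection}/points/search", [
                'vector'       => $vector,
                'limit'        => $limit,
                'with_payload' => true,
            ])
            ->throw()
            ->json('result');

        // Keep only the stored payloads (e.g. ['title' => ..., 'text' => ...]).
        return array_map(fn (array $hit) => $hit['payload'], $response ?? []);
    }

    /**
     * Generate an embedding with the same model used when indexing documents.
     */
    protected function embed(string $text): array
    {
        $client = \OpenAI::client(config('services.openai.key'));

        $result = $client->embeddings()->create([
            'model' => 'text-embedding-3-small',
            'input' => $text,
        ]);

        return $result->embeddings[0]->embedding;
    }
}
```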
For other vector databases (including Pinecone), the search logic will be similar but with a different client implementation. The key is to generate embeddings for each new document added and to perform a similarity search over them when the agent gets a question.
Step 4: Configure Environment Variables
Add your vector database and OpenAI credentials to .env:
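For example (the variable names are illustrative; align them with whatever keys your service reads through `config()`):

```env
OPENAI_API_KEY=sk-...
QDRANT_HOST=http://localhost:6333
QDRANT_API_KEY=your-qdrant-api-key
```

If you follow the service sketch above, expose the Qdrant values through config/services.php, e.g. `'qdrant' => ['host' => env('QDRANT_HOST'), 'key' => env('QDRANT_API_KEY')]`.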
Step 5: Implement RAG in the Prompt Method
Now, integrate the search service into your agent’s prompt method. Create a context template first at resources/views/prompts/support_agent_context.blade.php:
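One possible shape for the context template, assuming each retrieved payload carries `title` and `text` fields (matching whatever you stored in your collection):

```blade
Use the following documentation excerpts to answer the user's question:

@foreach ($documents as $document)
## {{ $document['title'] ?? 'Untitled document' }}
{{ $document['text'] }}

@endforeach

If the excerpts above do not answer the question, say that the documentation does not cover it.
```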
Then update SupportAgent.php to use RAG:
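A sketch of the integration, assuming LarAgent’s `prompt()` hook for pre-processing the user message; the `DeveloperMessage` class and `chatHistory()` accessor reflect the description below, but verify the exact names in your LarAgent version:

```php
<?php

namespace App\AiAgents;

use App\Services\QdrantSearchService;
use LarAgent\Agent;
use LarAgent\Messages\DeveloperMessage; // assumed location — confirm in your version

class SupportAgent extends Agent
{
    // ... $model, $history, and instructions() as defined in Step 2 ...

    public function prompt(string $message): string
    {
        $documents = app(QdrantSearchService::class)->search($message, limit: 3);

        if (! empty($documents)) {
            // Render retrieved docs and attach them as a developer message so
            // the user/assistant turn structure stays untouched.
            $context = view('prompts.support_agent_context', [
                'documents' => $documents,
            ])->render();

            $this->chatHistory()->addMessage(new DeveloperMessage($context));
        }

        // The user's message itself is sent unchanged.
        return $message;
    }
}
```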
The
DeveloperMessage
role is perfect for RAG context because it can be
inserted at any point in the chat history sequence without disrupting the
conversation flow between user and assistant messages.
Testing Your RAG Implementation
Interactive Testing
Test your agent using the built-in chat command:
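Assuming LarAgent’s built-in chat command:

```bash
php artisan agent:chat SupportAgent
```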
Programmatic Testing
You can also test programmatically in your application:
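For example, using LarAgent’s per-key agent instantiation (`'test-session'` is any history key you choose):

```php
use App\AiAgents\SupportAgent;

$response = SupportAgent::for('test-session')
    ->respond('How do I configure my agent?');

echo $response;
```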
Debugging Tips
1. Check Vector Search Results
Add logging to see what documents are being retrieved:
```php
$documents = $searchService->search($message, limit: 3);
\Log::info('Retrieved documents:', $documents);
```
2. Verify Embeddings
Ensure your embeddings are being generated correctly and match the dimensions expected by your vector database.
3. Monitor Token Usage
Keep an eye on token consumption, especially when adding multiple documents as context.
```php
protected $contextWindowSize = 4000; // Adjust based on your needs
```
Next Steps
- Add Guardrails: Implement safeguards to prevent hallucination and keep conversations on-topic.
- Explore Other RAG Types: Learn about advanced RAG techniques like retrieval-as-tool or hybrid search.
- Optimize Performance: Fine-tune your vector search parameters and caching strategies.
- Monitor Quality: Track answer quality and user satisfaction metrics.
Adding Guardrails
To prevent hallucinations and off-topic questions, consider:
- Score Thresholding: Only use documents with similarity scores above a threshold (see the sketch after this list).
- Explicit Instructions: Make your system prompt very clear about staying on topic.
- Content Filtering: Pre-filter your document collection to only include appropriate content.
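A sketch of score thresholding inside the search service, assuming Qdrant returns a `score` alongside each hit; the 0.7 cutoff is illustrative and should be tuned against your own data:

```php
// Inside QdrantSearchService::search(), drop weak matches before returning.
$minScore = 0.7; // illustrative cutoff — tune it on real queries

$hits = array_filter(
    $response ?? [],
    fn (array $hit) => ($hit['score'] ?? 0.0) >= $minScore
);

return array_map(fn (array $hit) => $hit['payload'], $hits);
```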
Exploring Other RAG Approaches
Now that you’ve mastered vector-based RAG, consider exploring:
- Hybrid Search: Combining vector similarity with keyword search for better accuracy
- Re-ranking: Using a second model to re-rank retrieved documents
- Retrieval-as-Tool: Letting the agent decide when to retrieve information
- Multi-modal RAG: Including images and other media in your knowledge base
For more information about RAG fundamentals in LarAgent, check the RAG Core
Concept documentation.
Summary
You’ve now implemented a fully functional vector-based RAG system with LarAgent! Your support agent can:
- ✅ Retrieve relevant documentation based on user queries
- ✅ Provide accurate, context-aware responses
- ✅ Maintain conversation history
- ✅ Gracefully handle questions without available context