This guide is designed for educational purposes to help you understand RAG concepts and how they work in LarAgent and AI development. The prompts, configurations, and implementations provided here are not fine-tuned or extensively tested for production use. Use this guide to learn and experiment, then build upon it with production-grade practices.
Vector-based RAG (Retrieval-Augmented Generation), often referred to as Traditional RAG, is a powerful technique that stores text documents as vectors (the same mathematical representation that LLMs use for understanding words and concepts) and compares user queries against these vectors to retrieve relevant context.
How Vector-Based RAG Works
The process flow:
- User asks a question
- The question is converted into a vector embedding
- The vector database finds documents with similar embeddings
- Retrieved documents are added as context
- The LLM generates a response based on the context
- User receives an accurate, documentation-based answer
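In code, the loop might look like this minimal sketch. The `$embedder`, `$vectorDb`, and `$llm` objects are illustrative placeholders, not LarAgent classes; each concrete piece is built in the steps below.

```php
// Illustrative pseudocode only — each piece is implemented in the steps below.
$question  = 'How do I configure my agent?';
$embedding = $embedder->embed($question);               // question -> vector
$documents = $vectorDb->similaritySearch($embedding);   // nearest documents
$context   = implode("\n\n", $documents);               // docs become context
$answer    = $llm->respond($context, $question);        // grounded response
```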
Prerequisites
Before starting this guide, make sure you have:
LarAgent Installed
You should have LarAgent installed and configured. If not, check the Quickstart guide.
Vector Search Service
You need a vector database to store and query document embeddings. This guide uses Qdrant as an example, but any vector search service, such as Pinecone, will work.
Embeddings Generator
We recommend using openai-php/client since LarAgent already provides it as a dependency, so you won’t need to install anything extra. However, you can use any embeddings generator, including open-source models running locally. Just make sure you use the same generator for user queries as you use for generating the documents’ vector representations.
Make sure your vector database is running and accessible, and that you have API keys configured for your chosen embeddings provider.
Implementation Steps
Step 1: Create Your Agent
First, create a new agent using the artisan command:
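Assuming LarAgent’s agent generator (run `php artisan list` to confirm the exact signature in your version):

```bash
php artisan make:agent SupportAgent
```

This creates the agent class at app/AiAgents/SupportAgent.php.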
Step 2: Define Instructions with Blade Template
Create a blade template for your agent’s instructions. This makes it easy to maintain and allows for dynamic content. Create a new file at resources/views/prompts/support_agent_instructions.blade.php:
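A minimal example of what this template might contain; the wording is illustrative and should be adapted to your product:

```blade
You are a support agent for our product.
Answer questions using only the documentation context provided to you.
If the provided context does not contain the answer, say so honestly instead of guessing.
Keep answers concise and reference specific documentation sections where possible.
```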
Then update SupportAgent.php to use this template:
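A sketch of the agent class, assuming you override the `instructions()` method; the model and history values are illustrative choices:

```php
<?php

namespace App\AiAgents;

use LarAgent\Agent;

class SupportAgent extends Agent
{
    protected $model = 'gpt-4o-mini';   // illustrative model choice

    protected $history = 'in_memory';   // swap for a persistent driver in production

    public function instructions(): string
    {
        // Render the blade template so instructions stay easy to edit.
        return view('prompts.support_agent_instructions')->render();
    }
}
```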
Step 3: Create a Search Service
Create a service to handle vector search operations. We’ll use QdrantSearchService as an example, which has the following API (app/Services/QdrantSearchService.php):
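A sketch of such a service, assuming Qdrant’s REST search endpoint and the embeddings API from openai-php/client; the collection name and config keys are illustrative and should match your setup:

```php
<?php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class QdrantSearchService
{
    protected string $collection = 'support_docs'; // illustrative collection name

    /**
     * Return the payloads of the documents most similar to the query.
     */
    public function search(string $query, int $limit = 3): array
    {
        $vector = $this->embed($query);

        // Qdrant REST API: POST /collections/{collection}/points/search
        $response = Http::withHeaders([
                'api-key' => config('services.qdrant.key'),
            ])
            ->post(config('services.qdrant.host') . "/collections/{$this->collection}/points/search", [
                'vector'       => $vector,
                'limit'        => $limit,
                'with_payload' => true,
            ])
            ->throw()
            ->json('result');

        // Keep only the stored payloads (e.g. ['title' => ..., 'text' => ...]).
        return array_map(fn (array $hit) => $hit['payload'], $response ?? []);
    }

    /**
     * Generate an embedding with the same model used when indexing documents.
     */
    protected function embed(string $text): array
    {
        $client = \OpenAI::client(config('services.openai.key'));

        $result = $client->embeddings()->create([
            'model' => 'text-embedding-3-small',
            'input' => $text,
        ]);

        return $result->embeddings[0]->embedding;
    }
}
```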
For other vector databases (including Pinecone), the search logic will be similar but with a different client implementation. The key is to generate embeddings for each new document added and to perform a similarity search over them when the agent gets a question.
Step 4: Configure Environment Variables
Add your vector database and OpenAI credentials to .env:
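For example (the variable names are illustrative; align them with whatever keys your service reads through `config()`):

```env
OPENAI_API_KEY=sk-...
QDRANT_HOST=http://localhost:6333
QDRANT_API_KEY=your-qdrant-api-key
```

If you follow the service sketch above, expose the Qdrant values through config/services.php, e.g. `'qdrant' => ['host' => env('QDRANT_HOST'), 'key' => env('QDRANT_API_KEY')]`.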
Step 5: Implement RAG in the Prompt Method
Now, integrate the search service into your agent’s prompt method. Create a context template first at resources/views/prompts/support_agent_context.blade.php:
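One possible shape for the context template, assuming each retrieved payload carries `title` and `text` fields (matching whatever you stored in your collection):

```blade
Use the following documentation excerpts to answer the user's question:

@foreach ($documents as $document)
## {{ $document['title'] ?? 'Untitled document' }}
{{ $document['text'] }}

@endforeach

If the excerpts above do not answer the question, say that the documentation does not cover it.
```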
Then update SupportAgent.php to use RAG:
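A sketch of the integration, assuming LarAgent’s `prompt()` hook for pre-processing the user message; the `DeveloperMessage` class and `chatHistory()` accessor reflect the description below, but verify the exact names in your LarAgent version:

```php
<?php

namespace App\AiAgents;

use App\Services\QdrantSearchService;
use LarAgent\Agent;
use LarAgent\Messages\DeveloperMessage; // assumed location — confirm in your version

class SupportAgent extends Agent
{
    // ... $model, $history, and instructions() as defined in Step 2 ...

    public function prompt(string $message): string
    {
        $documents = app(QdrantSearchService::class)->search($message, limit: 3);

        if (! empty($documents)) {
            // Render retrieved docs and attach them as a developer message so
            // the user/assistant turn structure stays untouched.
            $context = view('prompts.support_agent_context', [
                'documents' => $documents,
            ])->render();

            $this->chatHistory()->addMessage(new DeveloperMessage($context));
        }

        // The user's message itself is sent unchanged.
        return $message;
    }
}
```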
The
DeveloperMessage
role is perfect for RAG context because it can be
inserted at any point in the chat history sequence without disrupting the
conversation flow between user and assistant messages.
Testing Your RAG Implementation
Interactive Testing
Test your agent using the built-in chat command:
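Assuming LarAgent’s built-in chat command:

```bash
php artisan agent:chat SupportAgent
```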
Programmatic Testing
You can also test programmatically in your application:
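For example, using LarAgent’s per-key agent instantiation (`'test-session'` is any history key you choose):

```php
use App\AiAgents\SupportAgent;

$response = SupportAgent::for('test-session')
    ->respond('How do I configure my agent?');

echo $response;
```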
Debugging Tips
1. Check Vector Search Results
Add logging to see what documents are being retrieved:
```php
$documents = $searchService->search($message, limit: 3);
\Log::info('Retrieved documents:', $documents);
```
2. Verify Embeddings
Ensure your embeddings are being generated correctly and match the dimensions expected by your vector database.
3. Monitor Token Usage
Keep an eye on token consumption, especially when adding multiple documents as context.
```php
protected $contextWindowSize = 4000; // Adjust based on your needs
```
Next Steps
- Add Guardrails: Implement safeguards to prevent hallucination and keep conversations on-topic.
- Explore Other RAG Types: Learn about advanced RAG techniques like retrieval-as-tool or hybrid search.
- Optimize Performance: Fine-tune your vector search parameters and caching strategies.
- Monitor Quality: Track answer quality and user satisfaction metrics.
Adding Guardrails
To prevent hallucinations and off-topic questions, consider:
- Score Thresholding: Only use documents with similarity scores above a threshold (see the sketch after this list).
- Explicit Instructions: Make your system prompt very clear about staying on topic.
- Content Filtering: Pre-filter your document collection to only include appropriate content.
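A sketch of score thresholding inside the search service, assuming Qdrant returns a `score` alongside each hit; the 0.7 cutoff is illustrative and should be tuned against your own data:

```php
// Inside QdrantSearchService::search(), drop weak matches before returning.
$minScore = 0.7; // illustrative cutoff — tune it on real queries

$hits = array_filter(
    $response ?? [],
    fn (array $hit) => ($hit['score'] ?? 0.0) >= $minScore
);

return array_map(fn (array $hit) => $hit['payload'], $hits);
```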
Exploring Other RAG Approaches
Now that you’ve mastered vector-based RAG, consider exploring:
- Hybrid Search: Combining vector similarity with keyword search for better accuracy
- Re-ranking: Using a second model to re-rank retrieved documents
- Retrieval-as-Tool: Letting the agent decide when to retrieve information
- Multi-modal RAG: Including images and other media in your knowledge base
For more information about RAG fundamentals in LarAgent, check the RAG Core
Concept documentation.
Summary
You’ve now implemented a fully functional vector-based RAG system with LarAgent! Your support agent can:
- ✅ Retrieve relevant documentation based on user queries
- ✅ Provide accurate, context-aware responses
- ✅ Maintain conversation history
- ✅ Gracefully handle questions without available context