I had to build a RAG system for a product I’m working on, and it turned out to be easier than I expected.
The initial iteration of the product used the OpenAI SDK. At some point during development, we decided to switch to a more provider-agnostic solution: the AI SDK by Vercel. This meant we lost access to the vector store and file search tools that OpenAI provides natively.
The AI SDK has experimental support for files and attachments, which converts files to base64 and includes them directly as text alongside the user prompt. This quickly pushed us to our token limit.
This also created an issue where the model often misinterpreted the number of uploaded files. For example, I uploaded three files, asked the model how many files I uploaded, and it said one. This might have something to do with how the AI SDK handles attachments.
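To make the token problem concrete, here is a rough sketch of what inlining a file as base64 costs. This is an illustration of the general idea, not the AI SDK’s internals, and the 4-characters-per-token figure is only a rule of thumb:

// Hypothetical illustration: turn an uploaded file into base64 text.
// Base64 adds roughly a third on top of the raw size, and at ~4 characters
// per token even a modest PDF swallows a big chunk of the context window.
const bytes = await file.arrayBuffer(); // `file` is the uploaded File object
const base64 = Buffer.from(bytes).toString("base64");
console.log(`~${Math.round(base64.length / 4)} tokens just for this attachment`);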
At first, I naively extracted text from the file using await file.text(), only to realize it doesn’t produce the actual content: for a PDF, you get the raw file data decoded as a string rather than the readable text on the page.
So I reached for a PDF parser, which provides more than just the raw text—it also returns coordinates for each text block it extracts (which might come in handy for citations 🤔). I then concatenated each page’s text into a single long string.
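Roughly, the extraction step looks like this (a minimal sketch assuming pdfjs-dist as the parser; any parser that exposes per-block coordinates would do). Each text item’s transform matrix carries its x/y position on the page:

import { getDocument } from "pdfjs-dist"; // in Node you may need "pdfjs-dist/legacy/build/pdf.mjs"

// Extract the text of every page, keeping each block's coordinates
// around for later (citations?), then concatenate into one long string.
async function extractPdfText(buffer: ArrayBuffer) {
  const pdf = await getDocument({ data: new Uint8Array(buffer) }).promise;
  const blocks: { page: number; x: number; y: number; text: string }[] = [];
  let fullText = "";

  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();

    for (const item of content.items) {
      if (!("str" in item)) continue; // skip marked-content markers
      // item.transform[4] / [5] are the block's x/y coordinates on the page
      blocks.push({ page: pageNumber, x: item.transform[4], y: item.transform[5], text: item.str });
      fullText += item.str + " ";
    }
  }

  return { fullText: fullText.trim(), blocks };
}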
I then naively passed this large text blob to a model for embedding—only to hit a token limit error. So I decided to split the text into chunks (just basic word-count chunking for now; I’ll switch to a more context-aware strategy later) and embedded each chunk separately.
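The chunking and embedding step, sketched with the AI SDK’s embedMany (the 200-word chunk size is an arbitrary pick, not something tuned):

import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";

// Naive word-count chunking: every `size` words become one chunk.
function chunkByWords(text: string, size = 200): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push(words.slice(i, i + size).join(" "));
  }
  return chunks;
}

const chunks = chunkByWords(fullText);

// One embedding per chunk; embeddings[i] corresponds to chunks[i].
const { embeddings } = await embedMany({
  model: openai.embedding("text-embedding-3-small"),
  values: chunks,
});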
The next step was to store each chunk with its embedding. Luckily, we were already using Neo4j, which has native vector search support. I just had to create a vector index. For example:
CREATE VECTOR INDEX attachments IF NOT EXISTS
FOR (a:Attachment)
ON a.embedding
OPTIONS { indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}}
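With the index in place, every chunk is written as its own Attachment node with the embedding stored on it. A sketch using neo4j-driver; the connection details and the chunkIndex property are assumptions, but the label and embedding property match the index above:

import neo4j from "neo4j-driver";

const driver = neo4j.driver(
  process.env.NEO4J_URI!, // e.g. "neo4j://localhost:7687"
  neo4j.auth.basic(process.env.NEO4J_USER!, process.env.NEO4J_PASSWORD!)
);

// One Attachment node per chunk; the embedding is a plain list of floats,
// which the `attachments` vector index picks up automatically.
const session = driver.session();
try {
  await session.run(
    `UNWIND $rows AS row
     CREATE (a:Attachment { content: row.content, chunkIndex: row.chunkIndex, embedding: row.embedding })`,
    { rows: chunks.map((content, i) => ({ content, chunkIndex: i, embedding: embeddings[i] })) }
  );
} finally {
  await session.close();
}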
Initial RAG stage done ✅. The next step is querying: embed the user’s question with the same model, then search the vector index with it:
import { embed } from "ai";
import { openai } from "@ai-sdk/openai";

// Embed the query with the same model used for the stored chunks
const { embedding } = await embed({
  model: openai.embedding("text-embedding-3-small"),
  value: query,
});
CALL db.index.vector.queryNodes('attachments', 5, $query_embedding)
YIELD node AS attachment, score
RETURN attachment.content AS content, score
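Tying the two together with neo4j-driver: the embedding computed above goes in as $query_embedding, and the best-matching chunks come back to be used as context (a sketch; how the context gets folded into the prompt is up to you):

// Top-5 nearest chunks for the query embedding, joined into a context block.
const session = driver.session();
try {
  const result = await session.run(
    `CALL db.index.vector.queryNodes('attachments', 5, $query_embedding)
     YIELD node AS attachment, score
     RETURN attachment.content AS content, score`,
    { query_embedding: embedding }
  );

  const context = result.records
    .map((record) => record.get("content") as string)
    .join("\n---\n");

  // `context` then gets prepended to the user prompt for the chat model.
} finally {
  await session.close();
}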