
Understanding RAG in a world of infinite context

Charlie Cowan

April 8, 2025

This week saw two new products launched that help companies build Retrieval Augmented Generation (RAG) pipelines:

  1. Supabase launched Automatic embeddings in Postgres
  2. Cloudflare launched AutoRAG

And at the same time, new models are being released with significantly larger context windows:

  1. Meta launched Llama 4 with a 10 million token context window
  2. Google launched Gemini 2.5 with a 1 million token window, rising to 2 million

In this article I'll quickly explain what RAG means in non-technical terms, before looking at what these developments mean for Enterprise AI use cases, both internal ones for your own employees and external ones for your customers, partners and suppliers.

Understanding RAG in simple terms

Imagine you are at a dinner party and having a discussion about holidays and the lovely city of Venice.

Most people can only speak about what they personally remember learning about Venice at school or on a visit there.

But one friend is different. She studied History of Art and has an extensive library of books about the city.

Whenever a specific question comes up that nobody knows the answer to, she briefly excuses herself, checks her collection of books, and returns with the relevant information.

This is essentially what RAG (Retrieval Augmented Generation) is:

  1. Retrieval (go and get some pre-existing knowledge)
  2. Augmented (add that to the question and previous conversation)
  3. Generation (provide the answer)

In current RAG systems:

  1. The AI has its own "general knowledge" (like everyone at the dinner party)
  2. When asked a specific question, it can pause, search through external documents (like your friend checking her books), and incorporate that information into its answer
  3. This helps the AI provide more accurate, up-to-date, and specific information than it could from memory alone

This is especially helpful in a business context when we might be asking about a customer's usage of our products, or an outstanding invoice for a supplier.
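
If you like to see the moving parts, here is a deliberately tiny sketch of that Retrieval, Augmented, Generation loop in Python. It is illustrative only: the "library" is a hard-coded list, retrieval uses simple keyword overlap as a stand-in for real vector embeddings, and the generation step is a placeholder for whichever LLM API you actually use.

```python
# Minimal, illustrative RAG loop - not production code.
# The "library" is a hard-coded list of documents; retrieval scores by
# keyword overlap as a stand-in for vector similarity; generation is a
# placeholder for a call to whichever LLM API you use.

DOCUMENTS = [
    "Gondola tours in Venice typically cost 80 to 120 euros for 30 minutes.",
    "St Mark's Basilica was consecrated in 1094.",
    "Impressionism emerged in France in the 1870s.",
]

def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Retrieval: pick the documents that best match the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def augment(question: str, context: list[str]) -> str:
    """Augmentation: add the retrieved context to the question."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

def generate(prompt: str) -> str:
    """Generation: in a real system this would be an LLM call."""
    return f"[LLM answer based on]\n{prompt}"

question = "How much does a gondola tour cost?"
print(generate(augment(question, retrieve(question))))
```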

How do context windows affect the conversation?

One of the initial drivers for RAG was that LLMs could not hold much of the ongoing conversation in their working 'memory', the context window:

Small context windows (the technology of the past couple of years)

  • Like a friend who can only remember the last 5 minutes of conversation
  • The AI needs to frequently "check its notes" for almost any detailed discussion
  • It might forget earlier parts of your conversation, requiring repetition
  • Information retrieval has to be very targeted and efficient

Growing context windows (today)

  • Like a friend who can remember the entire evening's conversation
  • The AI can maintain a much richer ongoing discussion
  • It still needs to check sources for specialised information
  • But it doesn't forget what you discussed an hour ago

A good example of this is a Claude Project, where the project knowledge is incorporated into all of the project's chats.


"Infinite" Context windows (future possibility)

  • Like a friend with perfect recall of your entire relationship
  • The AI remembers everything you've ever discussed
  • It can connect ideas across months or years of conversations
  • External information is still needed for new topics or updated information

Will RAG still matter with infinite context?

The question here is: if newer models can accept and maintain infinite context for every conversation, why don't we just provide all of that context and allow the LLM to use it when generating an answer?

Let's go back to our dinner party example.

Imagine if your well-read friend, instead of briefly checking a specific book for a fact about Venice, had dragged along her entire History of Art collection, covering the French Renaissance, Cubism and Impressionism, only to be involved in a discussion about the price of gondola tours.

  1. It would be overwhelming (too much information)
  2. Slow (imagine waiting for her to drag all those books along)
  3. Wasteful (she's brought many books that she didn't need)
  4. Disruptive to the natural flow of conversation

RAG is like having a skilled librarian at your service.

Even in a world of infinite context this librarian:

  1. Listens carefully to your specific question
  2. Quickly retrieves just the relevant books or pages
  3. Presents only what's needed to answer the question
  4. Returns the books to the shelves when they are no longer needed

In AI terms, this means:

  1. Fewer tokens used - you are only paying to pass the information the model needs
  2. Faster responses - the AI doesn't waste time processing or searching information it doesn't need
  3. More focused answers - without the distraction of unrelated information
  4. Better conversation flow - the AI stays on topic without getting bogged down
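
To put rough numbers on the first two points above, here is a back-of-the-envelope comparison between stuffing an entire knowledge base into every prompt and retrieving a handful of relevant chunks. The figures (a 2-million-token knowledge base, 500-token chunks, $3 per million input tokens) are assumptions for illustration, not any vendor's actual pricing.

```python
# Back-of-the-envelope cost comparison - all numbers are illustrative
# assumptions, not real benchmarks or vendor pricing.

PRICE_PER_INPUT_TOKEN = 3 / 1_000_000   # assume $3 per million input tokens

knowledge_base_tokens = 2_000_000        # e.g. every contract, invoice and ticket
retrieved_chunk_tokens = 4 * 500         # top 4 chunks of ~500 tokens each
question_tokens = 200

full_context_cost = (knowledge_base_tokens + question_tokens) * PRICE_PER_INPUT_TOKEN
rag_cost = (retrieved_chunk_tokens + question_tokens) * PRICE_PER_INPUT_TOKEN

print(f"Full-context prompt: about ${full_context_cost:.2f} per question")
print(f"RAG prompt:          about ${rag_cost:.4f} per question")
```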

I recommend that you continue to evaluate how RAG fits into your AI architecture.

Even as context windows grow, your AI models will still need access to relevant information at the point of use:

  1. Fresh Information: No matter how good the AI's memory becomes, it still needs access to new information that wasn't available when it was trained or last used
  2. Specialised Knowledge: Just as no human can know everything, even advanced AIs need to consult specialised sources for certain topics, especially relating to customer specific data
  3. Authority and Verification: RAG allows the AI to cite specific sources, making answers more trustworthy and verifiable ("as per your invoice with reference INV-07004")
  4. Personalisation: As a user's conversation history grows, the AI can retrieve from that history to personalise responses based on their specific needs and preferences
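
The 'Authority and Verification' point above is easy to make concrete: if each retrieved chunk carries its source reference, the prompt can instruct the model to cite that reference in its answer. A minimal sketch, using invented invoice data and field names:

```python
# Sketch of retrieval that keeps source metadata so answers can cite it.
# The invoice data and field names are invented for illustration.

from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str

def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Include each chunk's source so the model can reference it."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in chunks)
    return (
        "Answer using only the context below and cite sources in brackets.\n"
        f"{context}\n\nQuestion: {question}"
    )

chunks = [Chunk("Invoice INV-07004 for 12,000 GBP is 30 days overdue.", "INV-07004")]
print(build_prompt("Does this supplier have any outstanding invoices?", chunks))
```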

The Evolution of RAG

As context windows expand, RAG won't disappear, but it will evolve:

  1. From frequently checking small bits of information to more selective, deeper research
  2. From managing short-term memory limitations to enriching long-term relationships better than a human could
  3. From retrieving basic facts to merging complex insights across multiple sources and functions

Imagine a customer asks a question about renewing their contract, and RAG is able to reference not just their contract, but their billing history and debtor days, their user adoption, and the integrations they have implemented.

RAG can pull this context into the discussion in a way that a human contracts manager would find very difficult or time-consuming to do.
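
As a rough sketch of what that fan-out might look like, the code below gathers context from several sources before a single generation step. The source functions and data are entirely hypothetical; a real implementation would call your CRM, billing and product analytics systems.

```python
# Hypothetical sketch: gather renewal context from several systems before
# one generation step. All functions and data here are invented examples.

def get_contract(customer_id: str) -> str:
    return f"Contract for {customer_id}: 24-month term, renewal due 2025-09-01."

def get_billing_history(customer_id: str) -> str:
    return f"Billing for {customer_id}: all invoices paid, average debtor days 28."

def get_usage(customer_id: str) -> str:
    return f"Usage for {customer_id}: 180 of 200 licences active, 3 integrations live."

def build_renewal_context(customer_id: str) -> str:
    """Retrieve from each source and merge into one context block."""
    sources = [get_contract, get_billing_history, get_usage]
    return "\n".join(fetch(customer_id) for fetch in sources)

prompt = (
    build_renewal_context("ACME-001")
    + "\n\nQuestion: What should we consider before renewing this contract?"
)
print(prompt)
```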

Further learning:

If you are new to the concept of RAG try these:

Claude Projects, ChatGPT Projects or GPTs: These all act as a 'wrapper' around a set of similar chats and allow you to add project knowledge. They are a good way of learning how the AI accesses this knowledge and how writing new content specifically for the AI can improve the quality of your responses.

Glean is an enterprise AI platform that connects to all of your enterprise applications, from Salesforce to Workday to Google. It's a great example of an internal AI RAG use case: when a user asks a question, Glean searches for the relevant context across one or more systems to provide the right answer.


And of course, to learn more about the problem that RAG could solve in your own organisation, Kowalah is here to help. Head to Solution Exploration and ask Kowalah to help you explore potential solutions to your business challenge.


How can Kowalah help?

CIOs and IT leaders trust Kowalah's AI-powered platform to navigate complex AI procurement decisions with confidence, turning the fear of making costly mistakes into strategic advantage.

Chat with Kowalah to think through your AI strategy, develop your business case and pick the right vendors.

Create best practice documents, processes and policies to put your AI strategy on track.

Sign up for free at kowalah.com/sign-up
