Context Window

The larger the window, the more context the model holds and the less you need to manually split your documents.

Also known as: context length, token window

Definition

The maximum amount of information, expressed in tokens, that an AI model can take into account at one time when generating a response.

What is it?

The context window is a language model's working space: everything the model can 'see' at a given moment when forming a response. That includes system instructions, conversation history, any documents you provide, and the question itself. Once the content exceeds the window, something has to give: most chat applications drop the oldest material, while some APIs reject the request outright.

Context windows are measured in tokens, where a token corresponds to roughly three to four characters of English text. Modern models such as GPT-4o (128,000 tokens) or Gemini 1.5 Pro (a million tokens or more) offer windows large enough for a full contract or client file. Older or smaller models work with a fraction of that, which determines how much you can process in a single session.
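As a rough illustration of the characters-to-tokens rule of thumb, the sketch below estimates a token count from the length of a text. The function name `estimate_tokens` and the default of four characters per token are assumptions made for this example; real tokenizers differ per model and per language, so treat the result as a ballpark figure only.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (a heuristic, not a real tokenizer)."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so that a partial token still counts as one token.
    return math.ceil(len(text) / chars_per_token)


# Illustrative input: a repeated sentence standing in for a long contract.
contract = "This agreement is entered into by both parties. " * 500
print("Estimated tokens:", estimate_tokens(contract))
```

For an exact count, use the tokenizer that belongs to the model you actually call.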

Why it matters for SMEs

For SMEs, the size of the context window has a direct practical effect. Asking an AI to summarise a long contract, search a complete file, or assess a lengthy email thread quickly hits a limit if the window is too small. At that point you end up cutting and pasting manually, which undoes part of the benefit of automation.

  • More relevant context tends to produce better answers. The model does not have to guess at earlier details if they are still in the window, which makes outputs more consistent and accurate.
  • Less manual segmenting. A larger window means you can send an entire contract, report, or client history in one go without splitting it yourself.
  • Faster processing of complex files. Accountants, estate agents, and recruiters who work with large client dossiers benefit directly from models with a wider window.

The practical lesson: match the context window size to the volume of your documents. Anyone working with long or complex material should choose a model that can handle that load.
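One way to apply this lesson is to estimate the size of your documents up front and compare it with the windows on offer. The sketch below does that under stated assumptions: the model names and window sizes in `WINDOW_SIZES` are hypothetical placeholders, the four-characters-per-token estimate is a heuristic, and `reserve_for_answer` is an assumed allowance for the model's reply.

```python
import math

# Hypothetical window sizes in tokens; check your provider's documentation for real limits.
WINDOW_SIZES = {
    "small-model": 8_000,
    "medium-model": 128_000,
    "large-model": 1_000_000,
}


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate from character count (heuristic)."""
    return math.ceil(len(text) / chars_per_token)


def smallest_fitting_model(documents, reserve_for_answer=2_000, windows=WINDOW_SIZES):
    """Return the name of the smallest window that holds every document
    plus the reserved answer space, or None if nothing fits."""
    needed = sum(estimate_tokens(doc) for doc in documents) + reserve_for_answer
    # Try the windows from smallest to largest.
    for name, size in sorted(windows.items(), key=lambda item: item[1]):
        if needed <= size:
            return name
    return None


# Illustrative input: three placeholder documents of different lengths.
client_file = ["VAT return " * 2_000, "Annual accounts " * 5_000, "Email thread " * 1_000]
print("Suggested model:", smallest_fitting_model(client_file))
```

If the function returns None, the material has to be split, summarised, or filtered before it is sent.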

How it works

The context window works like a sliding pane over the conversation. Everything you put in, including instructions, history, and attached files, is converted to tokens. As long as the total stays below the limit, the model sees the whole picture. Once you cross the boundary, most chat applications drop the oldest material automatically.

  1. Every piece of text, instruction, or document is split into tokens.
  2. All of those tokens count towards the maximum limit of the chosen model.
  3. When generating the response, the model only sees the tokens that fit within the window.
  4. Once the limit is exceeded, the oldest tokens are silently cut off; starting a fresh conversation resets the window.
  5. In RAG applications, only the most relevant fragments are loaded into the window to conserve space.
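The steps above can be sketched as a small trimming routine. This is a minimal illustration rather than how any particular product works: `trim_to_window` is a hypothetical helper, the token count is the rough four-characters-per-token heuristic, and the choice to always keep the system instructions is an assumption.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate from character count (heuristic)."""
    return math.ceil(len(text) / chars_per_token)


def trim_to_window(system_prompt, messages, max_tokens):
    """Keep the system prompt and as many of the most recent messages as fit.

    Returns the kept messages in their original (oldest to newest) order.
    """
    budget = max_tokens - estimate_tokens(system_prompt)
    kept = []
    # Walk from the newest message to the oldest and stop at the first one
    # that no longer fits, so everything older than it is dropped as well.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()
    return kept


# Illustrative conversation: three messages of about 10 tokens each.
history = ["a" * 40, "b" * 40, "c" * 40]
visible = trim_to_window("s" * 8, history, max_tokens=25)
print("Messages still visible:", len(visible), "of", len(history))
```

Dropping whole messages rather than cutting one in half keeps what remains readable, at the cost of leaving a little of the budget unused.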

That cut-off is why long conversations can become inconsistent over time. To avoid it, summarise periodically or use a model with a larger window.

Example in practice

Picture an accounting firm that wants an AI assistant to answer questions about a full client file containing VAT returns, annual accounts, and correspondence from the past year. That file spans dozens of pages. With a small context window the assistant can only process part of the documents at once and misses connections between early and later material. With a larger window the complete file fits in one go, so the assistant can draw links across the whole year and respond without extra manual intervention.

Comparison and misconceptions

The context window determines how much text the model sees right now; an AI agent's memory determines what the model retains between sessions. The window is temporary and resets with each new session, while agent memory is deliberately saved and reloaded.
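To make the distinction concrete, the sketch below stores a few notes in a file at the end of one session and reloads them into the prompt of the next. It is only an illustration under stated assumptions: the file name `agent_memory.json`, the helper names, and the prompt layout are hypothetical, and real agent frameworks organise memory in their own ways.

```python
import json
from pathlib import Path


def save_memory(path, notes):
    """Write a list of notes to disk so it survives the end of the session."""
    Path(path).write_text(json.dumps(notes, ensure_ascii=False, indent=2), encoding="utf-8")


def load_memory(path):
    """Read saved notes back; return an empty list if nothing was saved yet."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))


def build_prompt(memory_notes, question):
    """Place the reloaded notes in the context window ahead of the new question."""
    lines = ["Known facts from earlier sessions:"]
    lines += [f"- {note}" for note in memory_notes]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Session 1: store something worth remembering.
    save_memory("agent_memory.json", ["Client prefers quarterly reports"])
    # Session 2: the window starts empty, so the notes are loaded back in.
    print(build_prompt(load_memory("agent_memory.json"), "When is the next report due?"))
```

The reloaded notes still take up tokens in the window, so memory works alongside the limit rather than removing it.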

Frequently asked questions

What is a context window and why is it limited?
The context window is the amount of text an AI model can process at once: the current question, earlier messages, and any documents passed along. It is limited because the model holds all of that text in active memory. The larger the window, the more computing power required.

What happens when you exceed the context window?
Older information falls outside the model's reach. It forgets what was said earlier in the conversation, or the start of a long document. In practice you solve this with summarisation, RAG, or context injection: you pass the model only the most relevant pieces.

How large does a context window need to be for business use?
That depends on the task. For short emails or simple questions a small window is enough. For tasks where the model needs to read a long contract, report, or conversation in one go, you need more space. Models like GPT-4o and Claude now support windows of 128,000 to over a million tokens.