Token

The unit of measurement that determines what an AI model can process in one go and what it costs.

Token, text unit

Definition

A token is the smallest unit of text that an AI model processes: typically a fragment of a word, a whole word, or a punctuation mark. AI models read, generate, and are billed by tokens.

What is it?

A token is the smallest unit of text that an AI model processes. For English text, a token averages around four characters, or about three-quarters of a word, in most models. A ten-word sentence therefore typically comes to thirteen to fifteen tokens.
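The rule of thumb above can be turned into a quick estimate. The sketch below is only an approximation based on the two averages mentioned in the text; it does not run a real tokeniser, and the function names and sample sentence are illustrative assumptions.

```python
import math

# Rules of thumb from the text; real counts depend on the model's tokeniser.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate tokens from the character count (about four characters per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Estimate tokens from the word count (about three-quarters of a word per token)."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


# Illustrative ten-word sentence.
sentence = "Please send me the invoice for last month before Friday"
print(estimate_tokens_from_chars(sentence))
print(estimate_tokens_from_words(sentence))
```

Both estimates land in the thirteen-to-fifteen range for this sentence. For exact numbers, use the token counter offered by the AI service in question.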

Tokens are the basic unit of everything a language model does: every word you type is split into tokens, and every word the model produces is built up from one or more tokens. The limit of the context window, the cost of API usage, and the speed of a model (tokens per second) are all expressed in tokens.

Why it matters for SMEs

For an SME owner using AI tools, tokens determine two practical things: how much fits into a conversation and what it costs.

  • The context window of a model, its maximum working memory per session, is capped in tokens. Long documents, extended email threads, or large datasets need to be split or summarised to stay within that limit.
  • API usage for AI services is billed per token: the more text in and out, the higher the cost. Concise, well-targeted instructions are therefore also economically sensible.
  • Speed and capacity planning depend on token volume: an agent processing thousands of invoices daily generates a high token volume that needs to be accounted for in the architecture.
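Splitting a long document to stay within a token limit could look roughly like this. It is a minimal sketch that assumes the four-characters-per-token rule of thumb and splits on blank lines between paragraphs; the function name `split_into_chunks` and its parameters are hypothetical, and a production setup would count tokens with the actual tokeniser of the chosen model.

```python
def split_into_chunks(text: str, max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Group paragraphs into chunks that stay under an estimated token budget."""
    max_chars = max_tokens * chars_per_token  # estimated budget in characters
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # Simplification: a single paragraph larger than the budget is kept whole.
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


# Illustrative input: five paragraphs of 100 characters, budget of 60 tokens per chunk.
document = "\n\n".join(["a" * 100] * 5)
for chunk in split_into_chunks(document, max_tokens=60):
    print(len(chunk))
```

Splitting on paragraph boundaries keeps related sentences together, which usually matters more for answer quality than filling every chunk to the limit.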

You do not need to count tokens to work effectively with AI, but the concept helps explain why a model cannot process a long document in one pass, or why a particular task costs more than expected.

How it works

Language models work not with letters or words but with tokens, which are determined by a tokeniser: an algorithm that splits text into pieces the model recognises from its training data.

  1. The text you enter is split by the tokeniser into tokens, each a chunk the model treats as a single unit.
  2. Each token is assigned a numerical ID that the model uses when processing the input.
  3. The model generates output token by token, with each newly generated token influencing the next.
  4. The generated token IDs are translated back into readable text by the tokeniser.
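Steps 1, 2, and 4 can be illustrated with a toy tokeniser. The vocabulary below is invented for this example and the greedy longest-match rule is a simplification; real tokenisers learn tens of thousands of pieces from data, for example with byte-pair encoding. Step 3, the generation itself, happens inside the model and is not shown.

```python
# Hypothetical toy vocabulary, for illustration only.
PIECES = ["the", "pay", "ment", "in", "voice", "s", " ", "due", "is"]
TOKEN_TO_ID = {piece: i for i, piece in enumerate(PIECES)}
ID_TO_TOKEN = {i: piece for piece, i in TOKEN_TO_ID.items()}


def encode(text: str) -> list[int]:
    """Steps 1 and 2: split text into known pieces and map each piece to its ID."""
    ids: list[int] = []
    pos = 0
    while pos < len(text):
        # Greedy choice: take the longest vocabulary piece that matches here.
        match = max((p for p in PIECES if text.startswith(p, pos)), key=len, default=None)
        if match is None:
            raise ValueError(f"no vocabulary piece matches at position {pos}")
        ids.append(TOKEN_TO_ID[match])
        pos += len(match)
    return ids


def decode(ids: list[int]) -> str:
    """Step 4: translate token IDs back into readable text."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


ids = encode("the invoices")
print(ids)          # [0, 6, 3, 4, 5]
print(decode(ids))  # the invoices
```

The two words become five tokens here ("the", " ", "in", "voice", "s"), which shows why word counts and token counts diverge.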

Because the tokeniser differs by model, the same piece of text can contain a different number of tokens in GPT-4 than in Claude or Gemini. That matters when comparing budgets or limits across different AI services.

Example in practice

Picture an accounting practice processing dozens of purchase invoices daily via an AI agent. The agent reads each invoice, extracts the supplier, amount, and due date, and posts that data to the accounting package. Each A4 invoice contains roughly 400 to 600 tokens. At fifty invoices a day, the agent reads roughly 20,000 to 30,000 tokens daily, before counting its instructions and output. That volume is useful to know when choosing an AI service and estimating monthly costs.
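As a worked estimate, the arithmetic for fifty invoices a day at 400 to 600 tokens each can be sketched as follows. The number of working days and the price per million tokens are placeholder assumptions, not figures from any real provider, and only the invoice text is counted, not the agent's instructions or output.

```python
INVOICES_PER_DAY = 50
TOKENS_PER_INVOICE_LOW = 400    # lower bound from the text
TOKENS_PER_INVOICE_HIGH = 600   # upper bound from the text
WORKING_DAYS_PER_MONTH = 21     # assumption
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # hypothetical price, any currency


def monthly_input_tokens(tokens_per_invoice: int) -> int:
    """Input tokens read per month for the invoice text alone."""
    return INVOICES_PER_DAY * tokens_per_invoice * WORKING_DAYS_PER_MONTH


def monthly_cost(tokens: int) -> float:
    """Cost of the given token volume at the hypothetical price."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS


for label, per_invoice in (("low", TOKENS_PER_INVOICE_LOW), ("high", TOKENS_PER_INVOICE_HIGH)):
    tokens = monthly_input_tokens(per_invoice)
    print(f"{label}: {tokens:,} tokens per month, about {monthly_cost(tokens):.2f}")
```

Replace the placeholder price with the current rates of the service being considered, and add the tokens used by the prompt and the extracted output for a fuller picture.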

Comparison and misconceptions

A token is not the same as a word, although treating the two as equal is the fastest mental model for everyday use. More precisely, a word averages around one and a third tokens in English text, which follows from a token covering about three-quarters of a word. Tokens are the unit of measurement for AI models; words are the unit of measurement for people.

Frequently asked questions

What is a token in the context of AI?
A token is the smallest unit of text a language model processes: on average four characters or three-quarters of a word. 'Hello world' is typically two tokens. Tokens matter because API usage is priced per token and the context window is measured in tokens.

How many tokens are enough in practice?
For a simple question and answer: a few hundred tokens. For a long document or extensive system instruction: tens of thousands. Modern models support context windows of 128,000 tokens and more, which is roughly 96,000 words, or a few hundred pages of text.

How do you reduce token usage?
Keep system prompts concise but complete. Remove unnecessary context. Use a smaller model for simple tasks, since smaller models usually cost less per token. Consider summarising when input is too long. At high volumes, token optimisation makes a noticeable difference in cost.