Inference

What is it?

Inference is what an AI model does the moment you use it. After the training phase, in which the model has learned, comes the inference phase: the model receives new data and generates output based on what it has learned.

Every time you ask a question to ChatGPT, have a document summarised, or let an AI agent execute a task, inference takes place. Inference is the active use of the model, as opposed to training, which happens once or periodically.

Why it matters for SMEs

For SMEs, inference is the phase that is directly visible in cost and speed. The more efficiently a model runs inference, the faster and cheaper your AI applications operate.

Every API call to a language model is an inference call: the cost per use, the latency, and the scalability of your AI solution are directly tied to how inference is structured.
The choice between models is partly about inference cost: a smaller model that infers quickly and cheaply can offer better economics for routine tasks than a large model.
At high volume, such as processing thousands of documents, inference speed determines whether a process is practically feasible or not.

Understanding what inference is helps when comparing AI services on price and speed, and when building scalable workflows.

How it works

During inference, the model processes the input through its learned parameters and generates output step by step. For language models, this means predicting the most likely text token by token. This process runs on the provider's servers or, for smaller models, locally.

Receive input: the prompt, document, or data is passed to the model.
Processing via parameters: the model processes the input through its layers of learned weights.
Token predictions: for language models, the model generates the answer token by token.
Return output: the result is sent back to the calling application.
Cost and latency: the size of the model and the number of tokens determine how fast and expensive the inference is.

Inference is essentially stateless: each request is handled independently. Memory and context for longer conversations are managed externally, not inside the model itself.

Example in practice

Picture a staffing agency processing hundreds of CVs each day through an AI system that automatically highlights relevant experience and skills. Each time the system processes a CV, the model runs inference: it reads the text, applies its learned knowledge, and generates a structured summary. At one hundred CVs per day, that is one hundred inference calls; at one thousand, it is ten times the cost and ten times the processing time, unless the system is built to handle that volume.

Comparison and misconceptions

Training is the learning process in which the model sets its parameters based on data: it happens once or periodically and requires significant compute. Inference is the use of the trained model on new data: it happens with every call and is considerably cheaper and faster than training.

Frequently asked questions

What is inference in the context of AI?

Inference is the moment when a trained AI model generates a prediction or answer based on new input. It is the opposite of training: instead of learning from data, the model applies its knowledge. Every time you ask ChatGPT a question or an AI step processes a document, that is inference.

Is inference expensive in terms of money and computing power?

That depends on the model and the volume of requests. Small, optimized models are cheap and fast. Larger models like GPT-4o cost more per request. With API usage you pay per token; at high volumes smaller models or local inference pay off. For most SME applications the costs are easy to manage.

What is the difference between inference and training?

Training is the process where a model learns patterns from large amounts of data; it happens once or periodically and is computationally expensive. Inference is applying that learned model to new input; it happens quickly and repeatedly with every use. When you use an AI tool, you are always doing inference, never training.

What is it?

Why it matters for SMEs

How it works

Example in practice

Comparison and misconceptions

Frequently asked questions

Curious what AI
can do for your processes?

Stay up to date with the latest news
and developments in Agentic AI

Inference

What is it?

Why it matters for SMEs

How it works

Example in practice

Comparison and misconceptions

Frequently asked questions

Explore related terms

Curious what AI can do for your processes?

Stay up to date with the latest news and developments in Agentic AI

Curious what AI
can do for your processes?

Stay up to date with the latest news
and developments in Agentic AI