Top-p (Nucleus Sampling)

A setting that controls how broadly the AI samples its word choices, balancing creativity with coherent output.

Top-p, nucleus sampling, p-sampling

Definition

Top-p, also known as nucleus sampling, is a setting that controls how many options an AI model considers when generating text: the model restricts itself to the most probable options until their cumulative probability reaches the set threshold.

What is it?

Top-p guides an AI model's word selection by establishing a dynamic threshold. Rather than using a fixed number of candidate words, the system recalculates for each position which words qualify: the smallest set of most probable words whose combined probability reaches the set value forms the nucleus.

A top-p of 0.9 means the model chooses from the most probable words that together account for 90 per cent of the probability mass. Everything else is excluded, producing a filtered but not overly restricted selection.

Why it matters for SMEs

Top-p is a technical setting that most SME users never adjust manually, but it does influence the tone and variation of AI output. Understanding what it does helps with diagnosing problems and choosing the right pre-configured AI tool.

  • A top-p that is too low makes the model repetitive: it cycles through the same phrasings, which is noticeable in texts that should sound lively, such as client communications or marketing copy.
  • A top-p that is too high allows the model to make broader choices, increasing creativity but also raising the chance of unusual or incoherent sentences.
  • In combination with temperature, both settings together define the character of the output: temperature sharpens or softens the probability distribution, top-p determines how many options are considered at all.

For most business applications, the default settings are sufficient and top-p should only be adjusted when temperature alone does not produce the desired result.

How it works

Top-p works by calculating a threshold for each token position based on the cumulative probability of the most likely tokens.

  1. The model calculates a probability distribution across all possible tokens for the next position.
  2. Tokens are ranked from most probable to least probable.
  3. The model adds up probabilities from the top down until the sum reaches the set top-p value.
  4. All tokens outside that boundary are excluded from selection.
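  5. The probabilities of the remaining tokens are rescaled so that they sum to 1, and the model samples the next token from them. Temperature still influences the choice, because it shapes the probabilities that top-p filters.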

The result is a dynamic window that is narrower when the most probable option is already dominant, and broader when probabilities are more evenly spread. This keeps output coherent where a choice is obvious, while giving the model more latitude where multiple options are genuinely plausible.
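The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular AI product implements it: the function names `nucleus` and `sample_top_p` and the two toy word distributions are made up for this example, and real models work with tens of thousands of tokens rather than a handful of words.

```python
import random

def nucleus(probs, top_p):
    """Keep the smallest set of top-ranked tokens whose cumulative
    probability reaches top_p, then rescale them to sum to 1."""
    # Step 2: rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    # Step 3: add up probabilities from the top down.
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # Step 4: everything after this point is excluded.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

def sample_top_p(probs, top_p, rng=random):
    """Step 5: draw the next token from the rescaled nucleus."""
    kept = nucleus(probs, top_p)
    return rng.choices(list(kept), weights=list(kept.values()), k=1)[0]

# Hypothetical distributions: one dominant option versus an even spread.
peaked = {"house": 0.92, "home": 0.05, "flat": 0.02, "villa": 0.01}
flat = {"bright": 0.30, "spacious": 0.28, "modern": 0.22,
        "charming": 0.15, "odd": 0.05}

print(list(nucleus(peaked, 0.9)))  # ['house']
print(list(nucleus(flat, 0.9)))    # ['bright', 'spacious', 'modern', 'charming']
print(sample_top_p(flat, 0.9))     # one of the four kept words
```

With the same top-p of 0.9, the window shrinks to a single word when one option dominates and widens to four words when the probabilities are spread out, which is the dynamic behaviour described above.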

Example in practice

Picture a real estate agent using an AI tool to write property descriptions. With a standard top-p of 0.9, the model produces readable, varied texts without generating incoherent sentences. If the agent lowers top-p to 0.5, the texts become safer but also more repetitive: the model keeps returning to the same descriptive phrases. Raising it to 0.99 can make the texts more creative but occasionally introduces an odd word choice that needs manual correction.
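To make the three settings in this example concrete, the sketch below counts how many candidate words survive at each top-p value. The probabilities are invented for illustration (ten candidate words for one position in a property description), and `nucleus_size` is a hypothetical helper, not part of any tool.

```python
def nucleus_size(probs, top_p):
    """Count how many of the most probable options are needed
    for their cumulative probability to reach top_p."""
    cumulative = 0.0
    for count, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= top_p:
            return count
    return len(probs)

# Assumed probabilities for ten candidate words at one position.
probs = [0.40, 0.25, 0.15, 0.08, 0.05, 0.03, 0.02, 0.012, 0.005, 0.003]

for top_p in (0.5, 0.9, 0.99):
    print(top_p, nucleus_size(probs, top_p))
# 0.5 2
# 0.9 5
# 0.99 8
```

At 0.5 the model alternates between just two words, which explains the repetition; at 0.99 it also admits words with around a 1 per cent chance, which is where the occasional odd choice comes from.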

Comparison and misconceptions

Temperature reshapes the whole probability distribution like a global dial: lower values concentrate probability on the most likely tokens, higher values spread it more evenly. Top-p trims away the least likely options like a selection threshold. Together they define the model's creative range. The recommended starting point: adjust temperature first and leave top-p at its default until that proves insufficient.
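The interaction between the two settings can be illustrated with a small sketch. It assumes that temperature is applied to the model's raw scores (logits) before top-p filters the result, which is a common ordering but may differ per tool; the logits and function names are made up for this example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens
    the distribution, a higher temperature flattens it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(probs, top_p):
    """Count how many top-ranked options are needed to reach top_p."""
    cumulative = 0.0
    for count, p in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += p
        if cumulative >= top_p:
            return count
    return len(probs)

# Assumed raw scores for five candidate tokens.
logits = [4.0, 3.0, 2.0, 1.0, 0.0]

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, nucleus_size(probs, 0.9))
# 0.5 2
# 1.0 3
# 2.0 4
```

With top-p fixed at 0.9, raising the temperature alone already widens the nucleus from two to four tokens. This is one reason to change only one setting at a time: each moves the other's effective range.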

Frequently asked questions

What is top-p or nucleus sampling?
Top-p is a method for bounding the randomness in AI output. The model selects only from the subset of tokens that together cover a certain probability (p). Set top-p to 0.9 and the model only considers the tokens that together represent 90% of the probability. This reduces unexpected outcomes without fully limiting creativity.
When do you adjust top-p?
When the output is too unpredictable even though temperature is already at a reasonable value. Top-p is useful when you want creative output but want to avoid the most extreme outcomes. For most business applications the default (commonly 0.9 or 1.0, depending on the tool) works well; adjusting makes sense when tuning the output for a specific generation task.
Should you set both temperature and top-p?
No, changing both at once is not recommended. Both affect how tokens are selected, so adjusting them together makes it hard to tell which change caused which effect. Pick one as your primary parameter and leave the other at its default.