Reference
Glossary
The vocabulary of modern AI, in plain English. Bookmark this page.
- Token
- The atomic unit an LLM reads and writes: typically a word or sub-word chunk produced by a tokenizer. API pricing and context limits are measured in tokens.
- Embedding
- A dense numeric vector that represents the meaning of a piece of text, image, or audio. Similar things have similar vectors.
- Transformer
- The neural network architecture, introduced in 2017 and based on self-attention, that powers nearly all modern LLMs.
- Attention
- A mechanism that lets each token in a sequence weigh how much every other token matters when computing its representation.
- Context window
- The maximum number of tokens an LLM can consider at once, counting both the prompt and the response it generates.
- Temperature
- A sampling parameter that controls randomness. At 0 the model always picks the most likely token (near-deterministic in practice); higher values make the output more varied and less predictable.
- Top-p / nucleus sampling
- A sampling method often used alongside temperature: sample only from the smallest set of tokens whose cumulative probability reaches p.
- Hallucination
- When an LLM produces fluent text that is factually wrong or fabricated.
- Fine-tuning
- Continuing to train a pretrained model on a smaller, task-specific dataset to specialize its behavior.
- RLHF
- Reinforcement Learning from Human Feedback: aligning a model with human preferences by training a reward model on human comparisons of outputs, then optimizing the model against that reward.
- RAG
- Retrieval-Augmented Generation — fetching relevant documents at query time and giving them to the LLM as context.
- Vector database
- A database optimized for similarity search over embeddings (e.g. pgvector, Pinecone, Qdrant, Weaviate).
- Diffusion model
- A generative model that creates images (and increasingly audio and video) by learning to reverse a gradual noising process, often guided by a text prompt.
- Multimodal
- A model that handles multiple input/output modalities — text, image, audio, video — in one system.
- Agent
- An LLM running in a loop with access to tools and a goal, capable of planning and taking actions.
- Function calling
- A model capability where the LLM outputs a structured tool invocation (name + JSON arguments) for your app to execute.
- Prompt injection
- An attack where untrusted input (a user message, web page, or document) contains instructions that the model follows instead of the developer's system prompt.
- Inference
- Running a trained model to produce an output. Far cheaper per call than training, though the cost adds up at scale.
- Parameter
- A learnable weight inside a neural network. The largest LLMs are reported to have hundreds of billions to trillions of parameters.
- Quantization
- Compressing model weights to lower precision (e.g. from 16-bit to 4-bit) so the model runs faster and on smaller hardware, usually with only a small quality loss.
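
To make the Temperature and Top-p entries concrete, here is a minimal sketch of how the two settings could shape a next-token choice. The function names and the toy logits are illustrative assumptions, not any provider's actual implementation.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding (pick the top score).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / total for i in kept}

def sample(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token id after applying temperature, then the top-p cutoff."""
    nucleus = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]
```

Calling `sample(logits, temperature=0)` always returns the highest-scoring token, while raising the temperature or top_p lets lower-ranked tokens through.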
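
The Embedding and Vector database entries both rest on one idea: similar items sit close together in vector space. The sketch below uses made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions) and a brute-force search. The names `cosine_similarity`, `nearest`, and `TOY_INDEX` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, index, k=1):
    """Return the k labels whose vectors are most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]

# Invented vectors for illustration only; not output from a real embedding model.
TOY_INDEX = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}
```

A vector database does the same ranking, but uses approximate indexes so it does not have to compare the query against every stored vector.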
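
For the Function calling entry, the application side might look roughly like this: the model emits a name plus JSON arguments, and your code looks up and runs the matching function. The JSON shape, the `get_weather` stub, and the `TOOLS` registry are assumptions for illustration; each provider defines its own format.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Sunny in {city}"

# Registry of tools the app is willing to run on the model's behalf.
TOOLS = {"get_weather": get_weather}

def execute_tool_call(raw: str) -> str:
    """Parse a model-emitted tool call and dispatch it to the matching function."""
    call = json.loads(raw)
    name = call["name"]
    if name not in TOOLS:
        # Never execute a tool the app did not explicitly register.
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])
```

In an agent, the return value is fed back to the model as context and the loop repeats until the goal is met.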
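
The Quantization entry can be sketched as a round trip: scale each weight onto a small integer grid, then scale it back. This is a simple symmetric scheme chosen for illustration; production methods are more elaborate, and the function names here are hypothetical.

```python
def quantize(weights, bits=4):
    """Map float weights to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]
```

Each recovered weight differs from the original by at most half a quantization step, which is where the quality loss comes from.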