An OpenAI open model vocabulary guide for developers and researchers
Open-Weight Models
Refers to models that publicly release their internal parameters (weights). Developers can freely download, modify, share, and use these models, fostering technological transparency and innovation.
Application in this Document:
gpt-oss-120b and gpt-oss-20b are two open-weight models; anyone can download and inspect their parameters.
Inference
The process of using a pre-trained model to process new, unseen data and make predictions or generate content. It's like a student who has finished learning and is now using that knowledge to answer exam questions.
Application in this Document:
The GPT-OSS models are designed to provide powerful inference capabilities, meaning they can efficiently understand and respond to new user requests.
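To make this concrete, here is a minimal inference sketch using the Hugging Face transformers library (the repo id "openai/gpt-oss-20b" and the device settings are assumptions, not taken from the document):

# Inference: a single forward pass through pre-trained weights; no training happens here.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")  # repo id is an assumption
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", device_map="auto")

prompt = "Explain mixture-of-experts in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)  # the "exam answer"
print(tokenizer.decode(outputs[0], skip_special_tokens=True))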
Model Card
A document that provides detailed information about a machine learning model, including its architecture, training data, performance evaluation, intended uses, and potential risks and limitations. It's like a product's "instruction manual."
Application in this Document:
The document you are reading is itself the model card for the GPT-OSS models.
Mixture of Experts
A neural network architecture. Instead of a single large model handling all tasks, it consists of multiple smaller "expert" networks and a "router." The router selects the most appropriate experts to handle an input, which significantly improves efficiency by activating only a fraction of the parameters.
Application in this Document:
The GPT-OSS models use an MoE architecture. For example, the 120b model has 128 experts, and only 4 are activated for each token.
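A toy implementation of the router idea, in the spirit of the 128-experts / 4-active configuration (the sizes and routing rule below are simplified assumptions, not the GPT-OSS code):

import torch
from torch import nn

d_model, num_experts, k = 16, 8, 2        # toy sizes; gpt-oss-120b uses 128 experts, 4 active
router = nn.Linear(d_model, num_experts)  # scores each expert for each token
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(num_experts))

def moe_forward(x):
    weights, idx = router(x).topk(k, dim=-1)   # keep only the k best experts per token
    weights = weights.softmax(dim=-1)
    out = torch.zeros_like(x)
    for t in range(x.size(0)):                 # only k of the experts run per token
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])
    return out

print(moe_forward(torch.randn(5, d_model)).shape)  # (5, 16)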
Quantization
A technique for model compression. It reduces the model's size and memory footprint by lowering the numerical precision of its parameters (weights). This is like representing a number with fewer digits, such as simplifying 3.1415926 to 3.14, which saves storage space.
Application in this Document:
The MoE weights in the GPT-OSS models are quantized, allowing them to run on consumer-grade GPUs.
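The digit analogy maps directly to code. A minimal sketch of symmetric 8-bit quantization (the document does not specify the format, and GPT-OSS's actual scheme may differ):

import numpy as np

def quantize_int8(w):
    # Store each weight as 1 byte instead of 4, plus one shared scale factor.
    scale = np.abs(w).max() / 127.0
    return np.round(w / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale       # approximate reconstruction

w = np.random.randn(256).astype(np.float32)
q, s = quantize_int8(w)
print(w.nbytes, q.nbytes)                     # 1024 bytes vs 256 bytes
print(np.abs(w - dequantize(q, s)).max())     # small rounding error, like 3.1415926 -> 3.14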
Grouped-Query Attention
An optimized version of the attention mechanism. In standard multi-head attention (MHA), each "query" head has its own "key" and "value" heads, which is computationally expensive. GQA allows multiple query heads to share a single key/value head, significantly reducing computation and memory requirements while retaining most of the performance.
Application in this Document:
The GPT-OSS models use GQA to improve the efficiency of attention calculations.
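The shape arithmetic below shows the sharing trick (the head counts are made-up assumptions, not the GPT-OSS configuration):

import torch

tokens, d_head = 10, 64
n_q_heads, n_kv_heads = 8, 2                 # 8 query heads share 2 key/value heads
group = n_q_heads // n_kv_heads              # 4 query heads per KV head

q = torch.randn(n_q_heads, tokens, d_head)
k = torch.randn(n_kv_heads, tokens, d_head)  # 4x smaller KV cache than MHA
v = torch.randn(n_kv_heads, tokens, d_head)

# Expand the KV heads at compute time so every query head has a partner.
k = k.repeat_interleave(group, dim=0)
v = v.repeat_interleave(group, dim=0)

attn = (q @ k.transpose(-2, -1) / d_head**0.5).softmax(dim=-1)
print((attn @ v).shape)                      # (8, 10, 64): same output shape as full MHA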
Fine-tuning
The process of taking a model that has been pre-trained on a large dataset and training it further on a smaller, task-specific dataset. It's like a generalist college graduate receiving specialized job training for a specific position.
Application in this Document:
The document mentions that attackers might fine-tune the model to bypass safety restrictions.
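Mechanically, fine-tuning is the ordinary training loop run on pre-trained weights with a small dataset. A toy sketch (the linear layer is a stand-in for a real pre-trained model):

import torch
from torch import nn

model = nn.Linear(4, 2)                      # pretend these weights are pre-trained
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)   # typically a small learning rate
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 4)                       # small, task-specific dataset
y = torch.randint(0, 2, (32,))

for _ in range(10):                          # a few passes is often enough
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

The same mechanics are what make the safety concern possible: nothing in the loop distinguishes benign task data from data chosen to undo safety training.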
Jailbreak
Refers to the act of designing clever, adversarial prompts to bypass an AI model's safety and content restrictions, causing it to generate content it is not supposed to (e.g., harmful advice).
Application in this Document:
OpenAI evaluated the GPT-OSS models to test their robustness against jailbreaking and found their performance to be comparable to OpenAI o4-mini.
Hallucination
Refers to when a language model generates information that seems plausible but is factually incorrect, unsubstantiated, or irrelevant to the context (for example, confidently answering "5" to "2+2=?"). It's like the model is "confidently spouting nonsense."
Application in this Document:
Due to their smaller scale, the GPT-OSS models are more prone to hallucination than larger, frontier models.
Chain-of-Thought
A technique that prompts an AI model to articulate its "thinking" or reasoning steps before providing a final answer. This makes the model's response more transparent and interpretable, and often leads to more accurate results.
Application in this Document:
The GPT-OSS models provide a complete chain-of-thought, but the document warns that these chains may themselves contain hallucinated content.
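A prompt-level illustration (the wording is a common generic pattern, not a GPT-OSS-specific format):

# A direct prompt vs. a chain-of-thought prompt (illustrative strings only).
direct = "Q: A train travels 60 km in 1.5 hours. What is its speed? A:"

cot = (
    "Q: A train travels 60 km in 1.5 hours. What is its speed?\n"
    "Let's think step by step:\n"
    "1. Speed = distance / time.\n"
    "2. 60 km / 1.5 h = 40 km/h.\n"
    "A: 40 km/h"
)
# Each intermediate step can be checked, which aids interpretability;
# it also means a chain-of-thought can itself contain hallucinated steps.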
Tool Use
Refers to the model's ability to not only generate text but also call external tools (like a code interpreter, search engine, or calculator) to complete tasks. This greatly expands the model's capabilities, allowing it to access real-time information or perform complex calculations.
Application in this Document:
The models are trained to use a browser tool and a Python tool to enhance their problem-solving abilities.
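A minimal sketch of the loop (the JSON message format here is a made-up simplification; real tool-calling protocols are richer):

import json

def calculator(expression):
    # Toy tool: arithmetic only, with builtins disabled for safety.
    return eval(expression, {"__builtins__": {}})

TOOLS = {"calculator": calculator}

# Pretend the model emitted this instead of plain text:
model_output = '{"tool": "calculator", "arguments": "1234 * 5678"}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](call["arguments"])   # the tool runs outside the model
print(result)                                     # 7006652, fed back to the model as context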
Benchmark
A standardized set of tests or tasks used to measure and compare the performance of different AI models. It's like using the same practice exam to evaluate the knowledge levels of different students.
Application in this Document:
GPT-OSS was evaluated on several industry-standard benchmarks (such as MMLU and SWE-bench), and its scores are published.
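Benchmark scoring in miniature (the questions are invented for illustration; real benchmarks like MMLU contain thousands of items):

benchmark = [("2+2=?", "4"), ("Capital of France?", "Paris")]

def evaluate(model_fn):
    # Same fixed test for every model, so the scores are directly comparable.
    correct = sum(model_fn(q).strip() == a for q, a in benchmark)
    return correct / len(benchmark)

print(evaluate(lambda q: "4"))   # a "model" that always answers "4" scores 0.5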