GPT-OSS Models

OpenAI proudly introduces gpt-oss-120b and gpt-oss-20b: two state-of-the-art open-weight models bringing unprecedented reasoning capabilities, efficiency, and flexibility to developers and researchers worldwide.

Built for the Future of AI Development

GPT-OSS is more than just open weights; it's a comprehensive commitment to performance, efficiency, and safety.

Permissive Apache 2.0 License

Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.

Configurable Reasoning Effort

Easily adjust reasoning effort (low, medium, high) based on your specific use case and latency needs.
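
As a minimal sketch of switching effort levels (this assumes the published convention of stating the level in the system message; the model choice and prompt are illustrative):

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

# "Reasoning: low" / "medium" / "high" trades response speed for depth.
messages = [
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]
print(pipe(messages, max_new_tokens=512)[0]["generated_text"][-1])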

Full Chain of Thought

Get full access to the model's reasoning process, enabling easier debugging and enhanced trust in outputs.

Fine-tunable

Fully customize the model for your specific use cases through parameter-efficient fine-tuning.

Agentic Capabilities

Leverage the model's native ability for function calling, web browsing, Python code execution, and structured outputs.
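
As an illustrative sketch of function calling with the Transformers chat template (get_weather is a hypothetical stand-in, not a built-in tool):

from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

# Passing tools makes the chat template advertise the function's schema,
# so the model can respond with a structured call to it.
prompt = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)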

Native MXFP4 Quantization

The MoE weights are quantized natively to MXFP4 during training, allowing the 120b model to run on a single H100 GPU.

Choose Your Model: gpt-oss-120b vs gpt-oss-20b

GPT-OSS offers two models to meet a wide range of needs, from local prototyping to enterprise-scale deployment.

gpt-oss-120b

Flagship performance, built for the most demanding tasks.

  • Ideal for: Complex scientific computing, enterprise-grade agents, and high-quality content creation.
  • Performance: On par with OpenAI's o4-mini on core reasoning benchmarks.
  • Hardware: Runs efficiently on a single 80GB GPU.

gpt-oss-20b

Ultra-efficient, designed for edge computing and rapid iteration.

  • Ideal for: On-device applications, local inference, rapid prototyping, and academic research.
  • Performance: Achieves OpenAI's o3-mini level on general benchmarks.
  • Hardware: Requires only 16GB of memory, compatible with various consumer devices.

Advanced Model Architecture

GPT-OSS is built on a Mixture-of-Experts (MoE) Transformer architecture, optimized for efficiency and performance.

Core Components

  • Mixture-of-Experts (MoE): The 120b model has 128 experts and the 20b has 32, with only 4 experts active per token, yielding large efficiency gains (see the routing sketch after this list).
  • Attention Mechanism: Alternates banded sliding-window and fully dense attention layers, and uses Grouped-Query Attention (GQA) to optimize memory use and inference speed.
  • MXFP4 Quantization: MoE weights are natively quantized to MXFP4 (about 4.25 bits per parameter), cutting their memory footprint to roughly a quarter of a 16-bit equivalent.
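
To make the routing concrete, here is a toy sketch of top-4 expert selection (illustrative only: the sizes mirror the 120b configuration, but this is not the production kernel):

import torch

# Toy router: score all experts for one token, keep only the top 4.
num_experts, hidden_size, top_k = 128, 2880, 4
router = torch.nn.Linear(hidden_size, num_experts)

token = torch.randn(1, hidden_size)              # one token's hidden state
weights, chosen = torch.topk(router(token).softmax(dim=-1), k=top_k)
print(chosen, weights)  # indices and mixing weights of the 4 active experts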

Component           120b       20b
Total Parameters    116.8B     20.9B
Active Parameters   5.1B       3.6B
Checkpoint Size     60.8 GiB   12.8 GiB
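
Those checkpoint sizes can be sanity-checked with a back-of-envelope estimate (assuming roughly 4.25 bits per MXFP4-quantized parameter; the real checkpoints also carry some higher-precision tensors, so they come out slightly larger):

# Rough size estimate: parameters x bits-per-parameter / bits-per-GiB.
for name, params in [("120b", 116.8e9), ("20b", 20.9e9)]:
    gib = params * 4.25 / 8 / 2**30
    print(f"gpt-oss-{name}: ~{gib:.1f} GiB")  # vs. 60.8 / 12.8 GiB actual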

Performance Benchmarks

Across multiple authoritative benchmarks, GPT-OSS demonstrates powerful capabilities comparable to top closed-source models.

Core Reasoning & Knowledge

gpt-oss-120b excels on benchmarks like AIME (competition mathematics) and MMLU (broad multi-subject academic knowledge).

Coding & Tool Use

Demonstrates strong agentic potential in tests like Codeforces (programming contests) and Tau-Bench (function calling).

Variable Reasoning Effort

Achieve a smooth trade-off between accuracy and response speed by adjusting reasoning modes (low, medium, high).

Quickstart

Get started easily with GPT-OSS using your favorite tools and libraries.

Transformers

Use the Transformers library for inference; its chat template applies the Harmony format automatically.

from transformers import pipeline

model_id = "openai/gpt-oss-120b"

# The pipeline's chat template applies the Harmony format automatically.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",   # load weights in the checkpoint's native precision
    device_map="auto",    # place the model across available devices
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])  # the final message is the assistant's reply
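
To prototype on smaller hardware, swap model_id for "openai/gpt-oss-20b"; the rest of the snippet is unchanged.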

Ollama

One of the easiest ways to run GPT-OSS locally on consumer hardware.

# gpt-oss-20b
ollama pull gpt-oss:20b
ollama run gpt-oss:20b

# gpt-oss-120b
ollama pull gpt-oss:120b
ollama run gpt-oss:120b
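
Once a model is running, Ollama serves an OpenAI-compatible endpoint on localhost, so the standard openai client can talk to it (a minimal sketch; the api_key value is a placeholder that Ollama ignores):

from openai import OpenAI

# Point the client at the local Ollama server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Explain MoE routing in one paragraph."}],
)
print(response.choices[0].message.content)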

Frequently Asked Questions (FAQ)

What is GPT-OSS?

GPT-OSS refers to two open-weight language models released by OpenAI: gpt-oss-120b and gpt-oss-20b. They are designed for powerful reasoning, agentic tasks, and versatile development, and can be run on your own infrastructure.

What license are the models under?

The GPT-OSS models are released under the permissive Apache 2.0 license, which allows for commercial use without copyleft restrictions or patent risk, making it ideal for experimentation, customization, and commercial deployment.

What hardware is required to run these models?

Thanks to native MXFP4 quantization, gpt-oss-120b can run efficiently on a single 80GB GPU like the NVIDIA H100. The gpt-oss-20b model is even more lightweight, requiring only 16GB of memory, making it suitable for high-end laptops or servers.

What is the Harmony chat format?

Harmony is a chat format designed specifically for the GPT-OSS models. It uses special tokens plus structured roles (system, developer, user, assistant) and channels to delineate message boundaries. Using this format is essential for the models to function correctly and achieve optimal performance.
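
A minimal sketch using the openai-harmony Python package (assuming its published API; install with pip install openai-harmony):

from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# Render a conversation into the exact token sequence the model expects.
encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
conversation = Conversation.from_messages(
    [Message.from_role_and_content(Role.USER, "Hello!")]
)
tokens = encoding.render_conversation_for_completion(conversation, Role.ASSISTANT)
print(tokens)  # token ids, ready to feed to the model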

Can I fine-tune these models?

Yes, the GPT-OSS models fully support fine-tuning. You can customize the models for your specific use cases through Parameter-Efficient Fine-Tuning (PEFT) to achieve the best results.
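
As a minimal sketch with the Hugging Face peft library (the hyperparameters here are illustrative defaults, not tuned recommendations):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b", torch_dtype="auto", device_map="auto"
)

# LoRA adds small trainable adapters; the base weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction is trainable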