Add support for gpt-oss model
## Motivation
Add support for the GPT-OSS model architecture in the MLX engine. This model requires a custom chat template that isn't included in the model repository, so we bundle it with exo.
Also adds support for DeepSeek-V32's custom encoding module.
## Changes
- Add `gpt_oss_template.jinja` chat template in `src/exo/shared/models/`
- Add `add_missing_chat_templates()` function in `utils_mlx.py` to inject chat templates for models that don't include them:
  - GPT-OSS: loads the bundled Jinja template
  - DeepSeek-V32: dynamically imports the model's custom `encoding_dsv32.py` module
- Add sync versions of the model path utilities (`resolve_model_path_for_repo_sync`, `ensure_models_dir_sync`) in `download_utils.py`
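The injection helper can be sketched roughly as follows. This is a simplified, hypothetical version: the real `add_missing_chat_templates()` in `utils_mlx.py` also handles the DeepSeek-V32 dynamic-import case, and the `BUNDLED_TEMPLATES` registry and the return value are illustrative assumptions, not the actual API.

```python
from pathlib import Path

# Assumed registry mapping model type -> bundled template filename
# (illustrative; the real code may hard-code the GPT-OSS case).
BUNDLED_TEMPLATES = {"gpt_oss": "gpt_oss_template.jinja"}


def add_missing_chat_templates(tokenizer, model_type: str, templates_dir: Path) -> bool:
    """Inject a chat template if the tokenizer lacks one.

    Returns True when a bundled template was injected, False otherwise.
    """
    if getattr(tokenizer, "chat_template", None):
        return False  # the repo already ships a template; leave it alone
    filename = BUNDLED_TEMPLATES.get(model_type)
    if filename is None:
        return False  # no bundled template for this model type
    tokenizer.chat_template = (templates_dir / filename).read_text(encoding="utf-8")
    return True
```

Checking `chat_template` first keeps the helper idempotent and ensures a template shipped in the model repo always takes precedence over the bundled one.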
## Why It Works
Some models (like GPT-OSS) don't ship with chat templates in their HuggingFace repos. By detecting the model type after loading and injecting the appropriate template, we ensure the tokenizer can properly format chat messages for inference.
The GPT-OSS template is bundled with exo and loaded at runtime by walking up from the module location to find shared/models/gpt_oss_template.jinja. This approach works across source installs, pip installs, and PyInstaller bundles.
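The walk-up lookup can be sketched like this. The function name and signature are hypothetical (the real loader presumably starts from `Path(__file__)` inside the module); only the `shared/models/gpt_oss_template.jinja` layout comes from the description above.

```python
from pathlib import Path


def find_bundled_template(name: str, start: Path) -> Path:
    """Walk up from `start` looking for shared/models/<name>.

    Checking each ancestor directory (rather than a fixed relative path)
    is what makes this robust to source installs, pip installs, and
    PyInstaller bundles, where the package root sits at different depths.
    """
    for parent in [start, *start.parents]:
        candidate = parent / "shared" / "models" / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"bundled template not found: {name}")
```

In the real code the search would start from the module's own location, e.g. `find_bundled_template("gpt_oss_template.jinja", Path(__file__).parent)`.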
## Test Plan
### Manual Testing
- Tested loading the GPT-OSS model and verified the chat template is applied
### Automated Testing
- Existing type checking (basedpyright) passes
- Existing lint checks (ruff) pass
- `nix flake check` passes
🤖 Generated with Claude Code