Add support for gpt-oss model
## Motivation
Add support for the GPT-OSS model architecture in the MLX engine. This model requires a custom chat template that isn't included in the model repository, so we bundle it with exo.
Also adds support for DeepSeek-V32's custom encoding module.
## Changes
- Add `gpt_oss_template.jinja` chat template in `src/exo/shared/models/`
- Add `add_missing_chat_templates()` function in `utils_mlx.py` to inject chat templates for models that don't include them:
  - GPT-OSS: loads the bundled Jinja template
  - DeepSeek-V32: dynamically imports the model's custom `encoding_dsv32.py` module
- Add sync versions of the model path utilities (`resolve_model_path_for_repo_sync`, `ensure_models_dir_sync`) in `download_utils.py`
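The injection helper can be sketched roughly as follows. This is a simplified, hypothetical version: the real `add_missing_chat_templates()` in `utils_mlx.py` also handles the DeepSeek-V32 dynamic-import case, and the `BUNDLED_TEMPLATES` registry and the return value are illustrative assumptions, not the actual API.

```python
from pathlib import Path

# Assumed registry mapping model type -> bundled template filename
# (illustrative; the real code may hard-code the GPT-OSS case).
BUNDLED_TEMPLATES = {"gpt_oss": "gpt_oss_template.jinja"}


def add_missing_chat_templates(tokenizer, model_type: str, templates_dir: Path) -> bool:
    """Inject a chat template if the tokenizer lacks one.

    Returns True when a bundled template was injected, False otherwise.
    """
    if getattr(tokenizer, "chat_template", None):
        return False  # the repo already ships a template; leave it alone
    filename = BUNDLED_TEMPLATES.get(model_type)
    if filename is None:
        return False  # no bundled template for this model type
    tokenizer.chat_template = (templates_dir / filename).read_text(encoding="utf-8")
    return True
```

Checking `chat_template` first keeps the helper idempotent and ensures a template shipped in the model repo always takes precedence over the bundled one.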
## Why It Works
Some models (like GPT-OSS) don't ship with chat templates in their HuggingFace repos. By detecting the model type after loading and injecting the appropriate template, we ensure the tokenizer can properly format chat messages for inference.
The GPT-OSS template is bundled with exo and loaded at runtime by walking up from the module location to find shared/models/gpt_oss_template.jinja. This approach works across source installs, pip installs, and PyInstaller bundles.
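The walk-up lookup can be sketched like this. The function name and signature are hypothetical (the real loader presumably starts from `Path(__file__)` inside the module); only the `shared/models/gpt_oss_template.jinja` layout comes from the description above.

```python
from pathlib import Path


def find_bundled_template(name: str, start: Path) -> Path:
    """Walk up from `start` looking for shared/models/<name>.

    Checking each ancestor directory (rather than a fixed relative path)
    is what makes this robust to source installs, pip installs, and
    PyInstaller bundles, where the package root sits at different depths.
    """
    for parent in [start, *start.parents]:
        candidate = parent / "shared" / "models" / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"bundled template not found: {name}")
```

In the real code the search would start from the module's own location, e.g. `find_bundled_template("gpt_oss_template.jinja", Path(__file__).parent)`.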
## Test Plan
### Manual Testing
- Tested loading the GPT-OSS model and verified the chat template is applied
### Automated Testing
- Existing type checking (basedpyright) passes
- Existing lint checks (ruff) pass
- `nix flake check` passes
🤖 Generated with Claude Code