# Integrating custom MLX models

Custom MLX Models Support - Issue #918
## Motivation

Fixes #918: enables users to run custom MLX models from `mlx-community` on Hugging Face without manual code updates.
## What Changed

### 1. Frontend UI for Custom Models

Commit: "Add custom models to dashboard"

- Added a "Custom Models" section to the downloads page with a Hugging Face model ID input
- Implemented a "Download and Run" button that triggers model placement into the pipeline
- Extended the store with a `placeInstance()` method for backend API communication (see the sketch below)
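For reference, a minimal sketch of the call `placeInstance()` makes, written in Python for consistency with the rest of this description; the endpoint path and payload fields are assumptions, not the exact API added in this PR:

```python
# Minimal sketch only: the endpoint path and payload fields below are
# assumed for illustration, not the exact API added in this PR.
import requests

def place_instance(base_url: str, model_id: str) -> dict:
    """Ask the master API to download and place a custom model instance."""
    resp = requests.post(
        f"{base_url}/place_instance",  # hypothetical endpoint
        json={"model_id": model_id},   # e.g. "mlx-community/Qwen2.5-0.5B-Instruct-4bit"
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```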
### 2. Automated Tests

Commit: "Add integration test"

- Created `src/exo/worker/tests/test_custom_model.py`, an integration test (sketched below) verifying that:
  - Custom model placement is triggered correctly
  - The model downloads and loads successfully
  - Chat inference works with the loaded model
- Added `.github/workflows/test_custom_models.yml`, a GitHub Actions workflow that runs the test in CI
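The test's overall shape is roughly as follows; the fixture names (`start_worker`, `place_model`, `chat`) are hypothetical stand-ins, not the actual helpers in the test file:

```python
# Sketch of the integration test's shape; fixture names are hypothetical.
import pytest

MODEL_ID = "mlx-community/Qwen2.5-0.5B-Instruct-4bit"  # small model assumed for CI

@pytest.mark.integration
def test_custom_model_placement_and_chat(start_worker, place_model, chat):
    worker = start_worker()

    # 1. Custom model placement is triggered correctly.
    instance = place_model(worker, MODEL_ID)
    assert instance.model_id == MODEL_ID

    # 2. The model downloads and loads successfully.
    instance.wait_until_loaded(timeout=600)

    # 3. Chat inference works with the loaded model.
    reply = chat(worker, MODEL_ID, "Hello")
    assert reply.strip()
```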
### 3. Persistent Storage & Model Registration

Commit: "Persist storage for custom models"

- Fixed `resolve_model_meta()` to check both short_id keys and full model_id values (sketched below)
- Custom models are now registered in `~/.exo/custom_models.json` during download
- Registered models reload automatically from persistent storage on EXO restart
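A minimal sketch of this resolution-and-registration logic, assuming a dict-shaped model card registry and a simple JSON layout for `~/.exo/custom_models.json`; the actual internals in this PR may differ:

```python
import json
from pathlib import Path

CUSTOM_MODELS_PATH = Path.home() / ".exo" / "custom_models.json"

def resolve_model_meta(model_cards: dict, wanted: str) -> dict | None:
    """Match either a short_id key or a full model_id value."""
    if wanted in model_cards:  # short_id key, e.g. "qwen2.5-0.5b-instruct-4bit"
        return model_cards[wanted]
    for meta in model_cards.values():  # full id, e.g. "mlx-community/..."
        if meta.get("model_id") == wanted:
            return meta
    return None

def register_custom_model(model_id: str) -> None:
    """Persist a downloaded custom model so it reloads on EXO restart."""
    CUSTOM_MODELS_PATH.parent.mkdir(parents=True, exist_ok=True)
    known = (
        json.loads(CUSTOM_MODELS_PATH.read_text())
        if CUSTOM_MODELS_PATH.exists()
        else {}
    )
    known[model_id.split("/")[-1].lower()] = {"model_id": model_id}
    CUSTOM_MODELS_PATH.write_text(json.dumps(known, indent=2))
```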
## Why It Works

This implementation enables dynamic custom model loading without requiring code modifications for each new model. Users can:

- Download any `mlx-community` model via the dashboard
- Have models persist across restarts
- Test the model as soon as it loads
## Known Issues

### 1. Missing chat_template.jinja for Some Models

Some mlx-community models don't include a chat template, causing the model to output its raw instructions and control tokens instead of formatted chat responses. This is a model-specific issue with mlx-community models, not a bug in our implementation.

Workaround: use models that include a proper chat template (e.g., `mlx-community/Qwen2.5-0.5B-Instruct-4bit`) or add a `chat_template.jinja` yourself, as sketched below.
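For the second option, the template can also be supplied at load time. A minimal sketch using the `transformers` tokenizer API; the repo id is hypothetical and the template is a generic example, not any model's official format:

```python
from transformers import AutoTokenizer

# Hypothetical repo id; substitute the model that lacks a template.
tokenizer = AutoTokenizer.from_pretrained("mlx-community/some-model-4bit")

# Assign a minimal generic Jinja chat template when the repo ships none.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>{{ message['content'] }}<|end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>{% endif %}"
)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # "<|user|>Hello<|end|>\n<|assistant|>"
```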
## Testing

### Manual Testing

- Hardware: MacBook Pro (M4 Pro)
- Tested with `mlx-community/gpt-oss-20b-MXFP4-Q8`
- Verified:
  - Model appears in the downloads list with the correct size
  - Download progress bar updates in real time
  - Model persists in `~/.exo/custom_models.json`
  - Model is available after restart
  - Chat inference works correctly
### Automated Testing

- Integration test: `src/exo/worker/tests/test_custom_model.py`
- CI workflow: `.github/workflows/test_custom_models.yml`
- Input validation: only `mlx-community` models are allowed in downloads (see the sketch below)
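The validation rule amounts to something like the following sketch; the function name and error type are illustrative, not the code in this PR:

```python
def validate_custom_model_id(model_id: str) -> str:
    """Accept only Hugging Face repo ids under the mlx-community org."""
    org, _, name = model_id.partition("/")
    if org != "mlx-community" or not name:
        raise ValueError(f"only mlx-community models are allowed, got {model_id!r}")
    return model_id

validate_custom_model_id("mlx-community/Qwen2.5-0.5B-Instruct-4bit")  # ok
```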
## Files Modified

- `src/exo/master/api.py` - model resolution & API response
- `src/exo/shared/models/model_cards.py` - persistence logic
- `src/exo/worker/download/impl_shard_downloader.py` - registration on download
- `dashboard/src/routes/downloads/+page.svelte` - custom models UI
- `dashboard/src/lib/stores/app.svelte.ts` - API integration
---

Hi, I have successfully added a new feature: testing custom MLX models.

Can someone please clone & run my fork to verify downloading a larger model like `mlx-community/gpt-oss-20b-MXFP4-Q8`? I don't have enough RAM :/

I hope this is something we wanted. Currently it's only for testing purposes.

Not sure why my VSCode Prettier auto-formatted all the files I've changed.

I will probably create a new, clean PR where I only change the required code blocks, if that's needed to approve this feature request.
Looks good! I wonder if we should directly add the model to the model cards instead of a separate `KNOWN_MODELS`, but there are wider questions to be answered there.

As for Prettier, I don't believe our current formatter extends to the dashboard, so I don't particularly mind atm.
> Looks good! I wonder if we should directly add the model to the model cards instead of a separate `KNOWN_MODELS`, but there are wider questions to be answered there.

My idea was that after users test and verify a model, we add it to `model_cards` as an officially supported model. But yeah, that can be skipped.
Ok, gpt-oss-20b-MXFP4-Q8 did not work, but the download was completely fine; seems like an upstream problem.
> Ok, gpt-oss-20b-MXFP4-Q8 did not work, but the download was completely fine; seems like an upstream problem.

Yes, I see the error. This might be more difficult than I thought:

`Runner 4e13d976-5262-43eb-b513-e9678e673e59 crashed with critical exception Quantized SDPA does not support attention sinks`
This isn't an issue for this PR - we need to bump mlx versions and test afaik.
Ok, it's working: the GPT-OSS model loaded. However, I had to add TEMPORARY overrides, as in my commit 2e446ab. Not ideal; we need to wait for an official mlx version with support.
gpt-oss-20b has no `chat_template.jinja`, resulting in artifacts and instructions appearing in chat:

> QUERY: Hello
>
> EXO (09:25:43, TTFT 555 ms, 70.7 tok/s):
> `<|channel|>analysis<|message|>We need to be helpful, concise, no reasoning inside answer. Respond "Hello". Maybe ask how to help.<|end|><|start|>assistant<|channel|>final<|message|>Hello! How can I help you today?`
Appreciate the enthusiasm, but can we keep this PR down to custom models? The gpt-oss fix is a separate issue.
Hi, I have removed the specific memory overrides for the gpt-oss-20b model. Can I now request a review from a developer/maintainer for this PR: #937?

Thank you
Please continue! I'm excited to get this feature in EXO. I appreciate your patience while we work out all the details.