
Integrating custom MLX models

nightguarder opened this issue 6 days ago • 14 comments

Custom MLX Models Support - Issue #918

Motivation

Fixes Issue #918: Enable users to run custom MLX models from mlx-community on Hugging Face without manual code updates.

What Changed

1. Frontend UI for Custom Models

Commit: "Add custom models to dashboard"

  • Added "Custom Models" section to downloads page with HuggingFace model ID input
  • Implemented a "Download and Run" button that triggers model placement into the pipeline
  • Extended store with placeInstance() method for backend API communication

2. Automated Tests

Commit: "Add integration test"

  • Created src/exo/worker/tests/test_custom_model.py - integration test verifying:
    • Custom model placement is triggered correctly
    • Model downloads and loads successfully
    • Chat inference works with loaded model
  • Added .github/workflows/test_custom_models.yml - GitHub Action for automated CI testing
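The real test lives at src/exo/worker/tests/test_custom_model.py; the sketch below only illustrates its shape, with a stubbed client standing in for the live worker API (all class, method, and status names here are hypothetical, not exo's actual API):

```python
class FakeExoClient:
    """Stand-in for the worker API; the real integration test talks to a live worker."""

    def __init__(self) -> None:
        self.placed: list[str] = []

    def place_instance(self, model_id: str) -> dict:
        # Record the placement request, as the dashboard's
        # "Download and Run" button would trigger it.
        self.placed.append(model_id)
        return {"status": "loaded", "model_id": model_id}

    def chat(self, model_id: str, prompt: str) -> str:
        # The real test runs inference; here we just echo.
        return f"{model_id}: {prompt}"


def test_custom_model_placement() -> None:
    client = FakeExoClient()
    result = client.place_instance("mlx-community/Qwen2.5-0.5B-Instruct-4bit")
    assert result["status"] == "loaded"            # model "downloaded" and loaded
    assert client.placed == [result["model_id"]]   # placement triggered exactly once
    assert client.chat(result["model_id"], "Hello")  # chat returns non-empty output
```

The three assertions mirror the three bullets above: placement is triggered, the model loads, and chat inference produces output.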

3. Persistent Storage & Model Registration

Commit: "Persist storage for custom models"

  • Fixed resolve_model_meta() to check both short_id keys and full model_id values
  • Enabled custom model registration to ~/.exo/custom_models.json during download
  • Models reload automatically on EXO restart from persistent storage

Why It Works

This implementation enables dynamic custom model loading without requiring code modifications for each new model. Users can:

  • Download any mlx-community model via the dashboard
  • Have models persist across restarts
  • Test out the model once it loads

Known Issues

1. Missing chat_template.jinja for Some Models

Some mlx-community models don't include a chat template, causing the model to emit its raw instructions and special tokens instead of formatted chat responses. This is a model-specific issue with mlx-community models, not a bug in our implementation.

Workaround: Use models that include proper chat templates (e.g., `mlx-community/Qwen2.5-0.5B-Instruct-4bit`) or add a `chat_template.jinja` yourself.

Testing

Manual Testing

  • Hardware: MacBook Pro (M4 Pro)
  • Tested with mlx-community/gpt-oss-20b-MXFP4-Q8
  • Verified:
    • Model appears in downloads list with correct size
    • Download progress bar updates in real-time
    • Model persists in ~/.exo/custom_models.json
    • Model is available after restart
    • Chat inference works correctly

Automated Testing

  • Integration test: src/exo/worker/tests/test_custom_model.py
  • CI workflow: .github/workflows/test_custom_models.yml
  • Input validation: Only allows mlx-community models in downloads
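The allow-list check mentioned above can be sketched as a single predicate (the function name and exact rules are assumptions; exo's real validation may differ):

```python
ALLOWED_ORG = "mlx-community"


def is_allowed_model_id(model_id: str) -> bool:
    """Accept only well-formed 'mlx-community/<name>' Hugging Face model ids."""
    org, sep, name = model_id.strip().partition("/")
    # Require exactly one slash, the right org, and a non-empty model name.
    return org == ALLOWED_ORG and sep == "/" and bool(name) and "/" not in name
```

Rejecting anything outside the org keeps the downloads page from becoming an arbitrary-repo fetcher.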

Files Modified

  • src/exo/master/api.py - Model resolution & API response
  • src/exo/shared/models/model_cards.py - Persistence logic
  • src/exo/worker/download/impl_shard_downloader.py - Registration on download
  • dashboard/src/routes/downloads/+page.svelte - Custom models UI
  • dashboard/src/lib/stores/app.svelte.ts - API integration

nightguarder avatar Dec 20 '25 13:12 nightguarder

Hi, I have successfully added a new Feature: Testing custom MLX models

Can someone please clone & run my fork to verify downloading a larger model like mlx-community/gpt-oss-20b-MXFP4-Q8? I don’t have enough RAM :/

nightguarder avatar Dec 20 '25 15:12 nightguarder

I hope this is something we wanted. Currently only for testing purposes. [Screenshot: 2025-12-20 at 16:02]

nightguarder avatar Dec 20 '25 15:12 nightguarder

Not sure why my VSCode Prettier auto prettified all the files I’ve changed.

I will probably create a new clean PR where I only change the required code blocks, to keep it clean, if that's needed to approve this feature request.

nightguarder avatar Dec 20 '25 15:12 nightguarder

Looks good! I wonder if we should directly add the model to the model cards instead of a separate KNOWN_MODELS but there's wider questions to be answered in there.

Evanev7 avatar Dec 20 '25 15:12 Evanev7

As for prettier, I don't believe our current formatter extends to the dashboard so I don't particularly mind atm

Evanev7 avatar Dec 20 '25 15:12 Evanev7

> Looks good! I wonder if we should directly add the model to the model cards instead of a separate KNOWN_MODELS but there's wider questions to be answered in there.

My idea was that after users test and verify a model, we add it to model_cards as an officially supported model. But yeah, that can be skipped.

nightguarder avatar Dec 20 '25 15:12 nightguarder

Ok - gpt-oss-20b-MXFP4-Q8 did not work, but the download was completely fine, seems like an upstream problem.

Evanev7 avatar Dec 20 '25 16:12 Evanev7

> Ok - gpt-oss-20b-MXFP4-Q8 did not work, but the download was completely fine, seems like an upstream problem.

Yes, I see the error. This might be more difficult than I thought: Runner 4e13d976-5262-43eb-b513-e9678e673e59 crashed with critical exception `Quantized SDPA does not support attention sinks`.

nightguarder avatar Dec 20 '25 18:12 nightguarder

This isn't an issue for this PR - we need to bump mlx versions and test afaik.

Evanev7 avatar Dec 20 '25 19:12 Evanev7

Ok, it’s working. The GPT-OSS model loaded. However, I had to add TEMPORARY overrides, as in my commit 2e446ab. Not ideal; we need to wait for an official mlx version with support.

nightguarder avatar Dec 21 '25 09:12 nightguarder

gpt-oss-20b has no chat_template.jinja, resulting in artifacts and raw instructions appearing in chat:

QUERY
Hello

EXO
09:25:43
TTFT 555ms•70.7 tok/s
<|channel|>analysis<|message|>We need to be helpful, concise, no reasoning inside answer. Respond "Hello". Maybe ask how to help.<|end|><|start|>assistant<|channel|>final<|message|>Hello! How can I help you today? 

nightguarder avatar Dec 21 '25 09:12 nightguarder

Appreciate the enthusiasm, but can we keep this PR down to custom models? The gpt-oss fix is a separate issue.

Evanev7 avatar Dec 21 '25 11:12 Evanev7

Hi, I have removed the specific memory overrides for the gpt-oss-20b model. Can I now request a review from a developer/maintainer for this PR (#937)? Thank you

nightguarder avatar Dec 23 '25 09:12 nightguarder

Please continue! I'm excited to get this feature in EXO. I appreciate your patience while we work out all the details.

Evanev7 avatar Dec 23 '25 17:12 Evanev7