Run custom MLX models
Is it possible to run custom MLX models from HuggingFace, for example https://huggingface.co/mlx-community/gpt-oss-20b-MXFP4-Q8? If not, I think this is a feature many people would be interested in, particularly now that Apple has released RDMA support.
Probably the single feature we want to ship ASAP, Christmas notwithstanding. It's totally possible right now through model_cards.py; some models require more custom support for tensor sharding, but gpt-oss-20b is (I believe) not one of them.
Hi, I’ve been sort of implementing this on my own fork of exo. How can I help with this?
Testing is perhaps the main thing: add a model card to src/exo/shared/model_cards/model_cards.py and see what breaks. As a heads up, @rltakashige is also working on this, but YMMV.
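For anyone picking this up, here's a rough sketch of what an entry might look like. The class and field names below are placeholders rather than exo's actual schema, so copy an existing card from model_cards.py for the real shape:

```python
# Rough sketch only: the real schema lives in
# src/exo/shared/model_cards/model_cards.py, and the class/field names here
# are placeholders. Copy an existing entry from that file for the real layout.
from dataclasses import dataclass

@dataclass
class ModelCard:  # stand-in for whatever structure the real file uses
    repo_id: str                      # HuggingFace repo to pull weights from
    n_layers: int                     # layer count, used to shard across nodes
    default_quant: str | None = None  # quantisation variant, if any

# Hypothetical entry for the model from the original question.
GPT_OSS_20B = ModelCard(
    repo_id="mlx-community/gpt-oss-20b-MXFP4-Q8",
    n_layers=24,  # check config.json on the HF repo for the real value
    default_quant="mxfp4-q8",
)
```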
We had a version incompatibility between the gpt-oss and Kimi K2 tokenizers; it seems the release candidate of transformers isn't quite there yet.
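If anyone wants to reproduce the tokenizer issue locally, something like this should surface it (the repo ids are guesses, not necessarily the ones exo resolves):

```python
# Quick repro sketch: load both tokenizers under the same transformers install
# and see which one breaks, printing the installed version for reference.
import transformers
from transformers import AutoTokenizer

print("transformers", transformers.__version__)

for repo in (
    "mlx-community/gpt-oss-20b-MXFP4-Q8",  # from the original question
    "moonshotai/Kimi-K2-Instruct",         # assumed Kimi K2 repo id
):
    try:
        tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
        print(repo, "->", type(tok).__name__)
    except Exception as exc:
        print(repo, "failed:", exc)
```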
OK, I will try my best to test the smaller models. I don't have powerful enough hardware to test the bigger ones you mentioned. #937
GPT-OSS and GLM sharding support is around the corner, along with a few more types of Qwen models.
There is a transformers version incompatibility with Ministral3 models, which @Evanev7 pointed out.