[BUG] mlx-community/gpt-oss-120b-MXFP4-Q8 stuck in loading/failed loop

Open nackerr opened this issue 2 months ago • 3 comments

Describe the bug

After updating to the latest macOS app, I can no longer create an instance of this model with Tensor and RDMA selected.

To Reproduce

Steps to reproduce the behavior:

  1. Select mlx-community/gpt-oss-120b-MXFP4-Q8
  2. Select Tensor and RDMA
  3. Click Launch

Expected behavior

The instance launches.

Actual behavior

The instance gets stuck cycling through the loading, failed, and unknown states.

Environment

  • macOS Version: 26.2
  • EXO Version: Latest
  • Hardware:
    • Device 1: M4 Max Mac Studio
    • Device 2: M4 Pro Mac mini
  • Interconnection:
    • TB5 between both Macs.

https://github.com/user-attachments/assets/a0b9fc0e-e0b6-447d-a54a-ffbb89af804f

nackerr, Jan 11 '26 02:01

I’m running exo on two Mac Studios (macOS Tahoe 26.2, Thunderbolt 5, RDMA enabled).

I noticed that:

Pipeline and MLX Ring modes allow selecting 2 nodes and work as expected.

But when I select Tensor or Tensor + MLX RDMA, the UI only allows 1 node (minimum nodes is locked to 1).

Both machines can run exo individually, models are synced, and RDMA is enabled via rdma_ctl enable.

Is this a current limitation of exo’s Tensor/RDMA implementation, or is there something missing in my setup? Has anyone been able to use Tensor + RDMA with multiple nodes?

aaronysl, Jan 12 '26 09:01

Sorry @nackerr @aaronysl, it looks like 1.0.61/1.0.62 introduced some serious regressions. For now I'd recommend using 1.0.60 which you can download here: https://assets.exolabs.net/EXO-1.0.60.dmg

Thank you for reporting the bug - it helps a lot. We are working on a fix in 1.0.63.

AlexCheema, Jan 12 '26 23:01

Heya - we don't have tensor OSS support in .60 but it should be supported once #1144 lands

Evanev7, Jan 13 '26 14:01

Should be fixed in the next build!

Evanev7, Jan 14 '26 16:01