[BUG] Qwen3 loads indefinitely with MLX RDMA selected

Open RWL-Dittrich opened this issue 2 months ago • 7 comments

Describe the bug

When I try to run Qwen3 30B (8-bit or 4-bit, it doesn't matter) with MLX RDMA selected, the model keeps loading indefinitely. Pipeline sharding with RDMA seems to hang on "warming up" while using 100% of the GPU. No errors seem to appear in ~/.exo/exo.log.

Other models, such as Llama 70B, do seem to work correctly with RDMA (with the expected speedup of almost 1.8x).
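Because the hang produces no obvious errors, it can help to check two things about ~/.exo/exo.log: whether it is still being written to, and whether any line looks suspicious. The following is a minimal sketch of such a check, not part of EXO. The keyword list and the helper names `suspicious_lines` and `seconds_since_last_write` are assumptions made for illustration; only the log path comes from the report above.

```python
from pathlib import Path
import re
import time

# The log path is the one named in the report.
LOG_PATH = Path.home() / ".exo" / "exo.log"

# Assumption: these keywords are a guess at what a failure might look like;
# they are not taken from EXO's actual log format.
ERROR_PATTERN = re.compile(r"error|exception|traceback|timeout", re.IGNORECASE)


def suspicious_lines(lines, pattern=ERROR_PATTERN):
    """Return (line_number, text) pairs for lines that match the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def seconds_since_last_write(path, now=None):
    """Seconds since the path was last modified, or None if it is missing."""
    if not path.exists():
        return None
    now = time.time() if now is None else now
    return now - path.stat().st_mtime


if __name__ == "__main__":
    age = seconds_since_last_write(LOG_PATH)
    if age is None:
        print(f"{LOG_PATH} not found")
    else:
        print(f"last write to log: {age:.0f}s ago")
        with LOG_PATH.open(errors="replace") as handle:
            for number, text in suspicious_lines(handle):
                print(f"{number}: {text}")
```

A log that stopped changing when the model entered "loading" would suggest a silent hang rather than a crash that failed to report itself.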

Tensor + MLX RDMA Image

Pipeline + MLX RDMA Image

To Reproduce

Steps to reproduce the behavior:

  1. I have two 64 GB M4 Pro Mac Minis
  2. Install the latest EXO DMG
  3. Try to run any of the Qwen models with MLX RDMA selected.

Expected behavior

I expect the model to load and be available as intended.

Actual behavior

Loading the model seems to hang.

Environment

  • macOS Version: 26.3 Public preview
  • EXO Version: 1.0.59
  • Hardware:
    • Device 1: Mac Mini M4 Pro, 64GB RAM
    • Device 2: Mac Mini M4 Pro, 64GB RAM
  • Interconnection:
    • Thunderbolt 4 cable between Device 1 and 2
    • 1GbE LAN between the two devices

RWL-Dittrich, Dec 24 '25 08:12

These are possibly the hardest issues to debug. One thing I would try is to quit EXO, run sudo purge, and retry. If that doesn't work, restart the machines and run Qwen first.

If neither of these fixes your issue, I'll try to replicate it on our own Mac Mini setup.
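The first suggestion above (quit the app, then run sudo purge) can be scripted so it is applied the same way on every machine in the cluster. This is only a sketch under stated assumptions: the application name "EXO" passed to osascript is a guess and may need adjusting, and `recovery_commands` and `run_recovery` are hypothetical helper names. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess

# Assumption: the macOS app is registered under the name "EXO".
APP_NAME = "EXO"


def recovery_commands(app_name=APP_NAME):
    """Commands for the first suggested fix: quit the app, then purge."""
    return [
        # Ask the app to quit via AppleScript.
        ["osascript", "-e", f'quit app "{app_name}"'],
        # Flush the macOS disk cache; requires administrator rights.
        ["sudo", "purge"],
    ]


def run_recovery(dry_run=True, app_name=APP_NAME):
    """Print each command and, unless dry_run is set, execute it."""
    commands = recovery_commands(app_name)
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_recovery(dry_run=True)
```

Calling `run_recovery(dry_run=False)` on macOS would actually run the commands and prompt for a password; relaunching the app and retrying the model is left as a manual step.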

I'm surprised a Thunderbolt 4 cable works, but that's quite promising for Thunderbolt 4 ports getting RDMA support in the future...

rltakashige, Dec 24 '25 16:12

Would definitely suggest trying with a TB5 cable.

AlexCheema, Dec 24 '25 17:12

I'm seeing the same behavior on our cluster; 4 M3 Ultra Mac Studios running on macOS 26.2 connected via TB5 with RDMA enabled. The Qwen3 models just stay stuck at Loading.

kshaffer-bf, Dec 24 '25 18:12

Is it only for Qwen3? Can you run other models with RDMA?

AlexCheema, Dec 24 '25 19:12

@RWL-Dittrich @kshaffer-bf I just pushed a hotfix that attempts to fix this. Can you go into your EXO macOS app, click "Check For Updates", update to 1.0.60, and restart on each machine? Let me know if that fixes the issue for you.

AlexCheema, Dec 24 '25 21:12

Hey there @AlexCheema,

Yes, this is only with Qwen3 (specifically, I am attempting to load Qwen3-235B-A22B 4-bit). I have updated my cluster to 1.0.60 and I am still seeing this behavior: it stays stuck at Loading and never gets to "Warming Up".

I am able to load Llama 3.3 70B FP16, for reference.

kshaffer-bf, Dec 26 '25 16:12

Hi @AlexCheema,

I hope you had good holidays! I just got back to work and saw the update on this issue, so I tried a few things to see if it works now.

I just updated to the latest EXO release through the popup that appeared.

When I tried running the Qwen 30B 8-bit model, it still didn't seem to work, unfortunately. I'm downloading the newly added 80B model as we speak, but that will take a little while to finish. I'll report back when I know more.

I also tried running the latest commit on main, but that didn't seem to help either. I get no logs when the model crashes out. It just stays in its frozen "loading" state.

RWL-Dittrich, Jan 02 '26 07:01