[BUG] mlx-community/Qwen3-30B-A3B-4bit with Tensor/RDMA gets stuck in LOADING
Describe the bug
Launching an instance of mlx-community/Qwen3-30B-A3B-4bit on 2 nodes with Tensor/RDMA leaves the instance stuck in LOADING.
To Reproduce
Steps to reproduce the behavior:
- Start exo on 2 nodes
- Launch an instance of mlx-community/Qwen3-30B-A3B-4bit with Tensor/RDMA on 2 nodes
- Instance gets stuck in LOADING
Expected behavior
Instance should load and warm up as normal.
Actual behavior
Instance gets stuck in LOADING.
Environment
- macOS Version: 26.3
- EXO Version: main:59e7594e3412a3164caa7de5d92416ec542fd67a
- Hardware:
- 2 x 512GB M3 Ultra Mac Studio
- Interconnect: TB5 and Ethernet switch (both all-to-all)
Additional context
There's a related issue with Pipeline/RDMA which generates only the warmup token then gets stuck in WARMUP with this model. I'll create a separate issue for that.
Me too.
Me too. I have to reboot to recover.
Same issue with GLM-4.7-8bit-gs32 and DeepSeek-V3.1-8bit. Environment: 2x 512GB M3 Ultra over TB5 with RDMA enabled, macOS 26.2, exo v1.0.63. It doesn't matter whether I run from the app or from the code; it's always reproducible.
Looking into it.
This definitely worked previously with DeepSeek-V3.1-8bit so there's been some regression.
Do any models work for you e.g. Llama or Kimi?
I'll have to free up some disk space to download them tomorrow. My hard drive is almost full.
I've been exploring ways to prevent this from happening. The latest release should shut down the instance if it fails to load.
The experimental branch below may be a fix. It at least works for GPT OSS 20B. If you're running from the terminal, please try it out! https://github.com/exo-explore/exo/pull/1195
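For anyone wanting to test the PR branch from the terminal, one way is GitHub's read-only `pull/<id>/head` refspec. This is a sketch, assuming your `origin` remote points at the exo-explore/exo repository; the local branch name `pr-1195` is arbitrary:

```shell
# Fetch the PR's head commit into a local branch, then switch to it.
# Assumes "origin" is the exo-explore/exo remote.
git fetch origin pull/1195/head:pr-1195
git checkout pr-1195
```

After checking out the branch, run exo from source as you normally would and retry the model that got stuck.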
Same problem here! Running exo from the terminal solved it for me.
After pulling the latest code today, GLM-4.7-8bit-gs32 and DeepSeek-V3.1-8bit both worked.
nice!