[BUG] mlx-community/Qwen3-30B-A3B-4bit with Tensor/RDMA gets stuck in LOADING

Open AlexCheema opened this issue 2 months ago • 6 comments

Describe the bug

Launching an instance of mlx-community/Qwen3-30B-A3B-4bit on 2 nodes with Tensor/RDMA gets stuck in LOADING.

To Reproduce

Steps to reproduce the behavior:

  1. Start exo on 2 nodes
  2. Launch instance of mlx-community/Qwen3-30B-A3B-4bit with Tensor/RDMA on 2 nodes
  3. Instance gets stuck in LOADING

Expected behavior

Instance should load and warm up as normal.

Actual behavior

Instance gets stuck in LOADING.

Environment

  • macOS Version: 26.3
  • EXO Version: main: 59e7594e3412a3164caa7de5d92416ec542fd67a
  • Hardware:
    • 2 x 512GB M3 Ultra Mac Studio
  • Interconnection:
    • TB5 and Ethernet switch (both all-to-all)

Additional context

There's a related issue with Pipeline/RDMA which generates only the warmup token then gets stuck in WARMUP with this model. I'll create a separate issue for that.

AlexCheema, Jan 09 '26 12:01

Me too.

adamli008, Jan 13 '26 03:01

Me too. I have to reboot.

aaronysl, Jan 15 '26 02:01

Same issue with GLM-4.7-8bit-gs32 and DeepSeek-V3.1-8bit. Environment: 2 x 512GB M3 Ultra with TB5, RDMA enabled, macOS 26.2, exo v1.0.63. It doesn't matter whether I run from the app or from the code; it is always reproducible.

imbible, Jan 18 '26 02:01

Looking into it.

This definitely worked previously with DeepSeek-V3.1-8bit so there's been some regression.

Do any models work for you, e.g. Llama or Kimi?

AlexCheema, Jan 18 '26 02:01

I'll have to free up some disk space to download them tomorrow. My hard drive is almost full.

imbible, Jan 18 '26 03:01

I've been exploring ways to prevent this from happening. The latest release should shut down the instance if it fails to load.

I have an experimental branch below which may be a fix. It at least works for GPT OSS 20B. If you're running from terminal, please try it out! https://github.com/exo-explore/exo/pull/1195
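
For anyone who wants to guard against a hung load in their own scripts, here is a rough sketch of the "shut down if it fails to load" idea. It is not exo's actual implementation: `get_state` and `shut_down` are hypothetical callables you would have to wire up yourself, the `TIMED_OUT` return value is invented for illustration, and the 300 second default is an arbitrary assumption.

```python
import time
from typing import Callable


def load_with_timeout(
    get_state: Callable[[], str],   # hypothetical: returns the instance state, e.g. "LOADING"
    shut_down: Callable[[], None],  # hypothetical: tears the instance down
    timeout_s: float = 300.0,       # arbitrary default, not taken from exo
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the instance leaves LOADING; shut it down if it never does."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state != "LOADING":
            # The instance moved on (for example to WARMUP), so report that state.
            return state
        if clock() >= deadline:
            # Still LOADING after the deadline: treat it as stuck and clean up.
            shut_down()
            return "TIMED_OUT"  # illustrative label only
        sleep(poll_s)
```

The clock and sleep functions are parameters only so the loop can be exercised without real waiting; in normal use the defaults are fine.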

rltakashige, Jan 18 '26 17:01

Same problem! I ran exo in the terminal, and the problem was solved.

Hydrogenion, Jan 19 '26 10:01

After getting the latest code today, GLM-4.7-8bit-gs32 and DeepSeek-V3.1-8bit worked.

imbible, Jan 20 '26 00:01

nice!

Evanev7, Jan 20 '26 10:01