
[Bug] Model hangs during warmup on macOS if MPICH is installed instead of Open MPI

Open rafipatel opened this issue 2 months ago • 2 comments

Description

On macOS (Sequoia 15.2, M3), models get stuck in the "WARMING UP" state and never transition to "READY". I believe this affects all models; I reproduced it with Llama 3.2 1B (4-bit) and 3.1 8B (8-bit), and with Qwen 3 0.6B (4-bit). This prevents any chat completions from executing.


Root Cause

The warmup_inference function calls mx_barrier() via MLX. If the system has MPICH installed (e.g., via Homebrew) instead of Open MPI, the MPI barrier hangs indefinitely.

This appears to be due to MLX's specific dependency on Open MPI for its distributed/synchronization features.


Steps to Reproduce

  1. Install MPICH on macOS:

    brew install mpich
    
  2. Run:

    uv run exo
    
  3. Download llama-3.2-1b and launch an instance of it

  4. Observe that the logs stop at "Generated ALL warmup tokens" and never reach "runner ready"
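
For anyone trying to confirm the same symptom from a saved log, here is a rough sketch of the check in step 4. The two marker strings are taken from the description above and may not match the real log lines exactly, and `warmup_stalled` is a hypothetical helper, not part of exo.

```python
from typing import Iterable


def warmup_stalled(
    log_lines: Iterable[str],
    warmup_marker: str = "Generated ALL warmup tokens",  # assumed wording
    ready_marker: str = "runner ready",  # assumed wording
) -> bool:
    """Return True if warmup finished generating but the runner never became ready."""
    seen_warmup = False
    for line in log_lines:
        lowered = line.lower()
        if warmup_marker.lower() in lowered:
            seen_warmup = True
        elif seen_warmup and ready_marker.lower() in lowered:
            # The ready marker appeared after warmup, so nothing is stuck.
            return False
    return seen_warmup


if __name__ == "__main__":
    sample = ["loading model", "Generated ALL warmup tokens"]
    print(warmup_stalled(sample))
```

A log that never reached warmup at all returns False, so this only flags the specific hang described here.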


Expected Behavior

The runner completes warmup and becomes ready for inference.


Actual Behavior

The process hangs at mx_barrier() inside warmup_inference.
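
A hang like this is easier to diagnose if the blocking call is wrapped in a watchdog. The sketch below is a generic pattern, not exo's or MLX's code: `completes_within` and `fake_hanging_barrier` are hypothetical names, and the stand-in just sleeps instead of calling a real barrier. Note that a thread cannot cancel a blocked native call; it can only report that the call did not return in time.

```python
import threading
import time
from typing import Callable


def completes_within(fn: Callable[[], None], timeout_s: float) -> bool:
    """Return True if fn() returns within timeout_s seconds, else False."""
    done = threading.Event()

    def target() -> None:
        fn()
        done.set()

    # Daemon thread: a call that never returns will not block interpreter exit.
    threading.Thread(target=target, daemon=True).start()
    return done.wait(timeout_s)


def fake_hanging_barrier() -> None:
    # Hypothetical stand-in for a barrier that never returns.
    time.sleep(60)


if __name__ == "__main__":
    if not completes_within(fake_hanging_barrier, timeout_s=1.0):
        print("barrier did not return; check which MPI implementation is installed")
```

If `fn` raises instead of returning, the event is never set and the helper also reports False after the timeout.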


Environment

  • Hardware: MacBook Air M3

  • OS: macOS 15.2

  • RAM/Storage: 16 GB / 512 GB

  • Python: 3.13

  • MPI:

    • MPICH 4.3.0 (Hanging)
    • Open MPI 5.0.9 (Working)

Fix (confirmed working)

Uninstalling MPICH and installing Open MPI resolved the issue immediately.

brew uninstall mpich
brew install open-mpi
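
To check which MPI implementation is active before and after the swap, something like the following may help. It is a minimal sketch that assumes the launcher is on PATH as `mpirun`, and that its `--version` output mentions "Open MPI" for Open MPI or "HYDRA"/"MPICH" for MPICH; the exact wording can vary between versions, so treat the string matching as a heuristic. `classify_mpi` and `detect_mpi` are hypothetical helpers.

```python
import shutil
import subprocess


def classify_mpi(version_text: str) -> str:
    """Guess the MPI implementation from launcher version output (heuristic)."""
    text = version_text.lower()
    if "open mpi" in text or "open-mpi" in text:
        return "openmpi"
    if "hydra" in text or "mpich" in text:
        return "mpich"
    return "unknown"


def detect_mpi(launcher: str = "mpirun") -> str:
    """Run `<launcher> --version` and classify whatever it prints."""
    path = shutil.which(launcher)
    if path is None:
        return "not installed"
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    return classify_mpi(result.stdout + result.stderr)


if __name__ == "__main__":
    print(detect_mpi())
```

Running it after the two brew commands above should report `openmpi` if the swap took effect.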

rafipatel avatar Dec 30 '25 17:12 rafipatel

Good catch, but definitely an upstream issue for us - have you reported this to MLX as well?

Evanev7 avatar Dec 31 '25 00:12 Evanev7

Good catch, but definitely an upstream issue for us - have you reported this to MLX as well?

Nope, not yet, will do.

Below is a screenshot of the log.

Image

rafipatel avatar Dec 31 '25 07:12 rafipatel