
macOS cluster only loads memory on the first machine

hotwa opened this issue 10 months ago • 22 comments

When running exo on eight Mac minis (IP addresses 10.25.0.1–10.25.0.8), only the first machine (10.25.0.1) loaded the weights into memory; the other machines showed no change. During setup, the environment was configured with the install.sh script, and a symbolic link to the shared weights directory was created with:

ln -s /Volumes/long990max/exo_data ~/.cache/exo

The path /Volumes/long990max/exo_data is shared across the whole Mac mini cluster over the Thunderbolt network bridge via the Samba protocol.
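For reference, a minimal check (not part of exo) that can be run on each node to confirm the symlink actually resolves to the SMB mount and that the mount is readable; the paths are the ones from this issue:

```python
# Verify that ~/.cache/exo resolves to the shared SMB volume and that
# the volume is actually readable from this node.
import os

cache = os.path.expanduser("~/.cache/exo")
target = "/Volumes/long990max/exo_data"

print("resolves to:", os.path.realpath(cache))
assert os.path.realpath(cache) == os.path.realpath(target), "symlink points elsewhere"

# A listing that fails or hangs here points at the SMB mount, not exo.
print("sample entries:", os.listdir(target)[:5])
```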

hotwa avatar Feb 25 '25 06:02 hotwa

Hi, please correct me if I've got this wrong: have you already tested the existing SMB connection with your Mac cluster? Or could you pick one small model and test a dual-Mac cluster rather than all 8 Macs at once?

xuanzhec avatar Feb 25 '25 07:02 xuanzhec

I'm using 8 Macs.

hotwa avatar Feb 25 '25 07:02 hotwa

It shows like this:

[screenshot]

hotwa avatar Feb 25 '25 07:02 hotwa

The Mac minis (10.25.0.2–10.25.0.7) cannot access the internet, so they show:

"~/project
/exo/.venv/lib/python3.12/
site-packages/aiohttp/conn
ector.py", line 1341, in 
_create_direct_connection
    raise 
ClientConnectorDNSError(re
q.connection_key, exc) 
from exc
aiohttp.client_exceptions.
ClientConnectorDNSError: 
Cannot connect to host 
huggingface.co:443 
ssl:default [nodename nor 
servname provided, or not 
known]
Download error on attempt 
12/30 for 
repo_id='mlx-community/Dee
pSeek-R1-4bit' 
revision='main' 
path='model.safetensors.in
dex.json' 
target_dir=PosixPath('/var
/folders/r_/tyjhn3z554dbdz
sllqj69kyh0000gn/T/exo/mlx
-community--DeepSeek-R1-4b
it')
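The ClientConnectorDNSError above is a plain name-resolution failure, not an exo bug; a minimal, exo-independent check that could be run on one of the offline minis:

```python
# Check whether this node can resolve huggingface.co at all; the
# download in the traceback above fails before any HTTP request is made.
import socket

try:
    infos = socket.getaddrinfo("huggingface.co", 443)
    print("resolved:", sorted({info[4][0] for info in infos}))
except socket.gaierror as exc:
    # "[nodename nor servname provided, or not known]" surfaces here
    print("DNS failure:", exc)
```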

hotwa avatar Feb 25 '25 07:02 hotwa

I do believe your Mac cluster has enough capability to run the q4 MLX version (by the way, I just saw a real instance where a single Mac with an M4 Max and 128 GB RAM ran DeepSeek R1 Dynamic 1.58-bit with llama.cpp). It's just a connection issue between your first Mac and the other 7, and I'm not sure your Thunderbolt 5 configuration using the Samba protocol is set up correctly. Maybe test just two of your 8 Macs over a Thunderbolt 5 bridge running 4-bit DeepSeek 32B, or test them directly over the wireless network (which means installing exo and the model separately on each Mac) to verify the connection.

xuanzhec avatar Feb 25 '25 07:02 xuanzhec

libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout)
zsh: abort exo --node-id=$NODE_ID --node-host=$CURRENT_HOST --discovery-module=udp

After enabling networking (Wi-Fi) on the remaining Mac minis and using the shared storage, I found that some machines successfully loaded the weights into memory. Unfortunately, nodes 4 and 6 hit the error above.

hotwa avatar Feb 25 '25 08:02 hotwa

> I do believe your Mac cluster has enough capability to run the q4 MLX version … Maybe test just two of your 8 Macs over a Thunderbolt 5 bridge running 4-bit DeepSeek 32B, or test them directly over the wireless network to verify the connection.

I have successfully run a distributed inference model on exo over the Thunderbolt 5 bridge, although it wasn't the 4-bit DeepSeek 32B model; I can't quite remember which model it was. However, I do have some doubts about my Thunderbolt cables: the ones I bought are from three different brands, and I'm not sure whether that could cause issues, even though they are all Thunderbolt 5 cables.

hotwa avatar Feb 25 '25 08:02 hotwa

Sounds good! What speed are you getting now? Running the 4-bit MLX version of R1 on 8 machines feels a bit wasteful; I think you could run the native 671B.

xuanzhec avatar Feb 25 '25 08:02 xuanzhec

> Sounds good! What speed are you getting now? Running the 4-bit MLX version of R1 on 8 machines feels a bit wasteful; I think you could run the native 671B.

MLX doesn't support fp8, so there's no speedup.

hotwa avatar Feb 25 '25 08:02 hotwa

There isn't enough memory for the native model, once you account for the KV cache during conversations.

hotwa avatar Feb 25 '25 08:02 hotwa

libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout) zsh: abort exo --node-id=$NODE_ID --node-host=$CURRENT_HOST --discovery-module=udp —————— I don't know why it exits with this error.

hotwa avatar Feb 25 '25 08:02 hotwa

> libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Caused GPU Timeout Error … I don't know why it exits with this error.

Maybe the mlx library versions are inconsistent across machines?
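For reference, a minimal way to compare mlx versions across nodes, assuming mlx is importable in the venv exo uses on each machine:

```python
# Print the mlx version on this node; run it on every machine and compare.
import mlx.core as mx

print(mx.__version__)
```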

xuanzhec avatar Feb 25 '25 08:02 xuanzhec

All the machines are on the latest code, so the mlx versions should be identical. In any case, two machines hit this error and exit on every run. It also seems exo does a hash check at runtime: the weights have to be downloaded from Hugging Face; downloads from other mirrors won't run, failing with an MD5 checksum mismatch.

hotwa avatar Feb 25 '25 08:02 hotwa

Launching with mlx.launch doesn't produce this error, but there are other errors; mpirun also works normally. Everything loads into memory fine and the shared storage has no problems at all. It's only inference that throws other errors.

hotwa avatar Feb 25 '25 08:02 hotwa

The mlx framework has a lot of bugs.

hotwa avatar Feb 25 '25 08:02 hotwa

mlx.launch \
  --hostfile /Volumes/long990max/hosts.json \
  --backend mpi \
  --mpi-arg "--mca btl tcp,self --mca btl_tcp_if_include 10.25.0.0/24 --mca oob_tcp_if_include 10.25.0.0/24 --mca oob_tcp_disable_family ipv6 --mca btl_tcp_links 2 --mca plm_base_verbose 100 --mca btl_base_verbose 100" \
  /Volumes/long990max/pipeline_generate.py \
  --prompt "What number is larger 6.9 or 6.11?" \
  --max-tokens 64 \
  --model /Volumes/long990max/exo_data/downloads/mlx-community--DeepSeek-R1-3bit \
  --verbose

—————— Launching this way works without problems, but it's extremely slow. This follows the MLX framework's distributed-run approach; mpirun has a lot of strange parsing issues.
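The hosts.json referenced here isn't shown in the thread; one plausible shape, following the hostfile format described in the MLX distributed docs (hostnames and IPs below are placeholders, and the exact schema should be checked against the docs for your mlx version):

```json
[
    {"ssh": "mac-mini-1", "ips": ["10.25.0.1"]},
    {"ssh": "mac-mini-2", "ips": ["10.25.0.2"]}
]
```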

hotwa avatar Feb 25 '25 08:02 hotwa

> All the machines are on the latest code, so the mlx versions should be identical. In any case, two machines hit this error and exit on every run. It also seems exo does a hash check at runtime: the weights have to be downloaded from Hugging Face; downloads from other mirrors won't run, failing with an MD5 checksum mismatch.

By the way, you downloaded the model from Hugging Face and then put it under exo's directory, right?

xuanzhec avatar Feb 25 '25 08:02 xuanzhec

It's best to let the script download it itself; don't use a mirror source.

hotwa avatar Feb 25 '25 08:02 hotwa

> mlx.launch --hostfile /Volumes/long990max/hosts.json --backend mpi … —————— Launching this way works without problems, but it's extremely slow.

This way the weights load into memory smoothly.

hotwa avatar Feb 25 '25 08:02 hotwa

Note: disable IPv6 and use only the Thunderbolt bridge.

hotwa avatar Feb 25 '25 08:02 hotwa

The thing is, exo seems to occasionally add cache files to the model download directory, so the model size never matches the mirror; once the sizes differ, it starts intermittently erroring and removing files.
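For reference, a minimal way (not exo functionality) to list every file and its size under a download directory, so any stray cache files can be diffed against the mirror copy; the path is the one from this thread:

```python
# List every file and its size under the model download directory,
# to compare against the mirror and spot files exo has added.
from pathlib import Path

root = Path("/Volumes/long990max/exo_data/downloads/mlx-community--DeepSeek-R1-3bit")
for p in sorted(root.rglob("*")):
    if p.is_file():
        print(f"{p.stat().st_size:>14}  {p.relative_to(root)}")
```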

xuanzhec avatar Feb 25 '25 08:02 xuanzhec

> The thing is, exo seems to occasionally add cache files to the model download directory, so the model size never matches the mirror; once the sizes differ, it starts intermittently erroring and removing files.

exo downloads the model in shards to each machine.

hotwa avatar Feb 25 '25 11:02 hotwa