[BOUNTY - $300] Support Multi-GPU
Currently you can only run one exo instance on each device.
There are some design decisions here:
- Should we support running multiple exo instances on the same device, with one per GPU
- Or should we support running one exo instance that uses multiple GPUs
I'm in favour of one exo instance that uses multiple GPUs. This approach would let us shard further when only one model is running inference. What do you think?
Chiming in as a new user with a multi-GPU setup. One instance is easiest. Users can simply control GPU selection with the CUDA_VISIBLE_DEVICES environment variable.
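For anyone unfamiliar with the mechanism: CUDA_VISIBLE_DEVICES remaps physical GPU indices to the logical indices a process sees, so each exo instance can be pinned to one card. A minimal stdlib-only sketch of that remapping (the function name is mine, not exo's, and it only handles numeric indices, not GPU UUIDs):

```python
import os

def visible_gpus(physical_gpu_count: int) -> list[int]:
    """Return the physical GPU indices this process would see, in logical
    order, mimicking how CUDA honours CUDA_VISIBLE_DEVICES."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Unset: all physical devices are visible.
        return list(range(physical_gpu_count))
    visible = []
    for token in raw.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_gpu_count:
            break  # CUDA stops enumerating at the first invalid entry
        visible.append(int(token))
    return visible

# With CUDA_VISIBLE_DEVICES=1 on a 2-GPU box, logical device 0 is physical GPU 1.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
print(visible_gpus(2))  # → [1]
```

So launching one exo instance with `CUDA_VISIBLE_DEVICES=0` and another with `CUDA_VISIBLE_DEVICES=1` gives each process its own card without any code changes.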
I was able to do this. I forked the repo on GitHub and added configurations to integration.py and integration_engine.py.
The only remaining issue is getting the exo console page to show 2 GPUs instead of one; still testing.
Multiple instances per device, one GPU assigned to each (approach 1):
pros:
- leverages all existing orchestration and model-splitting functionality; ideally you figure out how to parallelize layers once, and only once
- aesthetics: reuses the primitive exo functionality at a different scale
- this approach seems to scale to different kinds of topologies (even ones not yet known)
cons:
- node communication overhead?
Single instance per device, multiple GPUs assigned (approach 2):
aspects:
- nodes have to broadcast the combined RAM of their multi-GPU setup, and each node has to handle multi-GPU internally
pros:
- don't have to deal with overlapping system resources (ports, file locks, etc.)
- inference engines already support multi-GPU? (though exo does too, across devices)
cons:
- two different ways of doing multi-GPU
- composability?
Case 1: multi-GPU within one instance (approach 2) is very easy to do. In that case one might go with approach 2 for now and keep approach 1 in mind for later.
Case 2: multi-GPU is not easy to do, and approaches 1 and 2 take roughly the same effort. In that case, I would go with approach 1.
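To make the approach-2 "aspects" item above concrete, here is a minimal sketch of a node summing its per-GPU memory before broadcasting a single capability. The class and field names are illustrative stand-ins, not exo's actual DeviceCapabilities from exo/topology/device_capabilities.py:

```python
from dataclasses import dataclass

# Illustrative stand-ins; exo's real capability classes have different fields.
@dataclass
class GpuCapabilities:
    model: str
    memory_mb: int

@dataclass
class NodeCapabilities:
    memory_mb: int
    gpu_count: int

def aggregate(gpus: list[GpuCapabilities]) -> NodeCapabilities:
    # Approach 2: advertise the sum of all local GPU memory as one node,
    # and handle the per-GPU split internally during inference.
    return NodeCapabilities(
        memory_mb=sum(g.memory_mb for g in gpus),
        gpu_count=len(gpus),
    )

caps = aggregate([GpuCapabilities("RTX 4060 Ti", 16384),
                  GpuCapabilities("RTX 4060 Ti", 16384)])
print(caps)  # NodeCapabilities(memory_mb=32768, gpu_count=2)
```

The upside is that peers see one large node; the downside, as noted in the cons, is that the node now needs its own internal layer-splitting logic on top of exo's cross-device sharding.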
I implemented a temporary workaround using approach 2 in #656.
I suppose this isn't a full solution for multi-GPU; it's just a wrapper around CUDA_VISIBLE_DEVICES. This will be supported fully in the rearchitecture I'm working on.
can this be done for ComfyUI?
@AlexCheema I am tinkering with a solution for this, is your rearchitect in a branch or somewhere I could view for reference? I don't want to duplicate work but I have 2 GPUs and I would love to use them :). Otherwise I will probably have a working solution fairly soon
This would be great!
What the heck?
Line 186 of `exo/topology/device_capabilities.py`:
`handle = pynvml.nvmlDeviceGetHandleByIndex(0)`
Hard-coded to use device 0?!
I couldn't even do an `export CUDA_ALLOWED_DEVICES=1` to try switching GPUs.
I suppose you could have multiple exo environments and manually edit the device ID in each?
But just looking at that alone, it seems like no one thought about multi-GPU when writing this so far?
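One possible fix for the hard-coded index would be to enumerate every NVML device instead of only index 0. A sketch using pynvml calls from the nvidia-ml-py package (the fallback-to-empty-list behaviour is my addition for machines without an NVIDIA driver, not anything exo does):

```python
def all_gpu_memory_bytes() -> list:
    """Return total memory in bytes for every NVML-visible GPU,
    instead of querying only device index 0."""
    try:
        import pynvml
        pynvml.nvmlInit()
    except Exception:
        # pynvml missing or no NVIDIA driver; caller can fall back
        # to the single-device / non-NVIDIA code path.
        return []
    try:
        return [
            pynvml.nvmlDeviceGetMemoryInfo(
                pynvml.nvmlDeviceGetHandleByIndex(i)
            ).total
            for i in range(pynvml.nvmlDeviceGetCount())
        ]
    finally:
        pynvml.nvmlShutdown()

print(all_gpu_memory_bytes())
```

Note this still respects CUDA_VISIBLE_DEVICES only if NVML is configured to honour it; NVML normally enumerates physical devices regardless of that variable, which is another wrinkle a real fix would need to handle.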
@ForbiddenEra I think it's `export CUDA_VISIBLE_DEVICES=1`, which I think works to select one GPU or the other, but I wasn't able to get 2 instances running on the same machine to talk to each other correctly. Maybe something like 2 different LXC or Docker containers on the same host could work, but I haven't tried that.
I made quite a bit of progress here but eventually realized that my multi-GPUs (2 x 4060Ti) don't work with tinygrad even on their multi-GPU mnist example. They don't seem to support P2P with each other (I do not have them in SLI mode and I don't think I can on my motherboard) and tinygrad seems to need that even though it can properly identify 2 different devices.
My approach was to upgrade exo/orchestration/node.py to make self.device_capabilities a List[DeviceCapabilities] and continue to refactor out from there. It was going well for a while until it didn't lol 🤷🏼
I also am not able to get two instances on the same machine to talk with each other using CUDA_VISIBLE_DEVICES.
Which is a shame, as I have several GPU on this system and my intention was to leverage two other computers over thunderbolt for additional resources.
See #656, this should work.
Forgive my ignorance if this has been discussed elsewhere. While I'm able to get multiple GPUs on a single machine working together, I haven't been able to get both exposed to a sister machine that is also running multiple exo instances exposing multiple GPUs. Each machine in the cluster shows both of its local GPUs but only one GPU from the other machine. I did notice that with only 3 instances running, both sides show the correct configuration, including both remote GPUs on the machine running them.
Is this a user error I can fix, maybe by adjusting the node-port on node 2 in addition to the listen-port? Or is it a functional limitation of how we're implementing multiple GPUs using multiple exo instances?