exo icon indicating copy to clipboard operation
exo copied to clipboard

Request: primary node or mix of manual/udp

Open mr-deamon opened this issue 1 year ago • 1 comments

Hi Great project, i really like how fast things are moving here!

We have the following scenario: Multiple servers with GPUs in the same network, but they are "unstable". Some are used for other tasks while others are idling, i changes from time to time.

Creating a VM (or a physical machine, if necessary) outside this network is easy, but it won't have GPUs. Also, they can't directly reach each other.

So the primary/secondary architecture other projects use is quite usefull: primary runns all the time, GPUs join and leave. It would be great to have the possibility to specify a primary- or non-gpu-node which stays in charge (or stays online, for the matter) and have the other hosts find each other automatically as they already do with UDP.

Do you think something like this could be feasable?

Chris

mr-deamon avatar Nov 26 '24 15:11 mr-deamon

Hi Great project, i really like how fast things are moving here!

We have the following scenario: Multiple servers with GPUs in the same network, but they are "unstable". Some are used for other tasks while others are idling, i changes from time to time.

Creating a VM (or a physical machine, if necessary) outside this network is easy, but it won't have GPUs. Also, they can't directly reach each other.

So the primary/secondary architecture other projects use is quite usefull: primary runns all the time, GPUs join and leave. It would be great to have the possibility to specify a primary- or non-gpu-node which stays in charge (or stays online, for the matter) and have the other hosts find each other automatically as they already do with UDP.

Do you think something like this could be feasable?

Chris

If I understood your use case correctly, this should already work out of the box.

There's no "master" node in exo. You would just use the endpoint from the "primary" node you described. Any other nodes that join/leave will automatically have an effect on the topology.

AlexCheema avatar Nov 26 '24 20:11 AlexCheema