Running on SLURM
Having to hard-code IP addresses makes it very hard to run Petals on a SLURM cluster. There, I submit batch jobs that run on whichever node of the specified partition the scheduler assigns, so I do not know in advance the IP of any node that will host a Petals server instance.
So one thing that would be helpful is "self-discovery" of Petals server instances within a specified network.
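
To illustrate the kind of discovery I mean, here is a minimal sketch of a workaround, assuming the compute nodes share a filesystem (common on SLURM clusters, but an assumption here). Each job resolves its own IP at runtime and writes it to a shared directory, and other jobs read that directory to find peers. The helper names `local_ip`, `announce`, and `discover`, the registry path, and the port are hypothetical and not part of Petals; only the `SLURMD_NODENAME` environment variable comes from SLURM itself.

```python
import os
import socket
from pathlib import Path
from typing import List


def local_ip() -> str:
    """Best-effort IP of the node this job landed on, resolved at runtime."""
    # SLURMD_NODENAME is set by SLURM on the compute node; fall back to the hostname.
    node = os.environ.get("SLURMD_NODENAME", socket.gethostname())
    try:
        return socket.gethostbyname(node)
    except OSError:
        # Name resolution failed; loopback is only useful for single-node tests.
        return "127.0.0.1"


def announce(registry: Path, port: int) -> Path:
    """Write this instance's address into the shared registry directory."""
    registry.mkdir(parents=True, exist_ok=True)
    # One file per process so concurrent jobs never overwrite each other.
    entry = registry / f"{socket.gethostname()}-{os.getpid()}.addr"
    entry.write_text(f"{local_ip()}:{port}\n")
    return entry


def discover(registry: Path) -> List[str]:
    """Return the addresses of all instances that have announced themselves."""
    return sorted(p.read_text().strip() for p in registry.glob("*.addr"))
```

A batch job would call `announce(Path("/shared/petals-peers"), 31337)` once its server is up, and later jobs would call `discover(...)` to build their peer list (both the path and the port are placeholders). Stale entries from finished jobs would still need cleaning up, which is one reason built-in discovery would be nicer than this.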