
Running on SLURM

ghost opened this issue 2 years ago · 4 comments

Having to hard-code IP addresses makes it very hard to run Petals on a SLURM cluster. There, I submit batch jobs that are then run on some node of the partition I specified, so I do not know beforehand the IP of that node, or of any node that runs a Petals server instance.

So one thing that would be helpful is some form of self-discovery of Petals server instances inside a specified network.

ghost · Dec 24 '22 19:12

#include sorry_for_slow_response.h

Hi!

Can you please explain how you specify the node's address? Unless there's some special networking wizardry on that cluster, you should be able to specify 0.0.0.0 instead of your IP address in --host_maddrs, and it should work normally.
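For instance, something like this (a sketch; the run_server entry point and model name follow the Petals README and are placeholders, adapt them to your setup):

python -m petals.cli.run_server bigscience/bloom-petals --host_maddrs /ip4/0.0.0.0/tcp/1337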

While we figure this out, here's a quick workaround that should work on most machines:

export IPV4=$(dig -4 TXT +short o-o.myaddr.l.google.com @ns1.google.com | tr -d '"')
# or: export IPV6=$(dig -6 TXT +short o-o.myaddr.l.google.com @ns1.google.com | tr -d '"')

# if you do not have a public IPv4 / IPv6 address, the variable will be empty

# test
echo "run_stuff --host_maddrs /ip4/$IPV4/tcp/1337"

@Vahe1994 is also working on an automatic relaying script to make this even easier to set up; we'll keep you updated in this issue.

justheuristic · Dec 26 '22 19:12

Hello, thanks for the reply! I was wondering, what would be the advantage of running a private Petals network instead of a torch.distributed or Hugging Face Accelerate run? Sorry if the question seems very basic to you.

ghost · Dec 29 '22 08:12

Hi! If you have a swarm where all nodes have the same GPU / network specs and are 100% reliable, you should prefer torch.distributed, or even deepspeed.inference.

If your GPUs are preemptible (e.g., sometimes other people want to use them and you need to shut down some of the nodes), Petals can handle that, while torch.distributed would require a lot of extra effort.
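For contrast, a static torch.distributed launch looks roughly like this (a sketch using torchrun, which ships with recent PyTorch; train.py, the node counts, and the rendezvous endpoint are placeholders). Every rank must be known up front, and losing a node kills the run unless you configure elastic restarts yourself:

# run on every node; MASTER_ADDR must point to one fixed, known host
torchrun --nnodes=2 --nproc_per_node=8 \
    --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:29500 \
    train.py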

justheuristic · Dec 30 '22 15:12

One small addition to @justheuristic's response: as far as I know, neither torch.distributed nor DS-Inference provides a full-fledged setup for running a model inference server, only the building blocks for parallelism and various inference optimizations. That's fine if you want to implement the actual server yourself, but if you need a complete solution for exposing models to external requests, you'd be better off with something like Triton (or Petals!).

mryab · Jan 03 '23 09:01