Specify ports used by zeromq

Open dankessler opened this issue 1 year ago • 0 comments

I'm trying to use clustermq on a slurm-based HPC system. It's configured so that worker nodes can communicate freely with one another, but the login node's firewall is more locked down. As a result, if I launch Q from the login node, it fails as described in the FAQ here.

If I instead start a job (using sbatch) and call Q from there, then things work fine, because now my head node happens to be a worker node, and so the other worker nodes can reach it freely.

This is a decent workaround, but I'd really like to use the SSH-based workflow: I call Q on my personal computer, it SSHes to the login node, and the login node handles things from there. That won't work here, because the workers can't communicate results back to the head node when the head node is the login node.

I've spoken with the HPC sysadmins, and they naturally wanted to know which ports I needed open, and understandably they'd like this to be a narrow range.

I went digging in the clustermq source code, and it looks like the port gets chosen here based on the value of addr passed in by Pool. That value in turn comes from the output of host, which by default randomly samples 100 ports from the range 6000:9999. To confirm my reading, I forked clustermq, changed the port range specified in the definition of host, and confirmed that when I install from my fork, clustermq uses those ports.

Forking and editing the util.R file is a clumsy way to obtain control over which ports are used, and I'm curious if there's another way to control this that is exposed to the user already. If not, perhaps this could be exposed as an additional option? It would also be nice to use this to specify what the default is.

I know that the SSH forwarding port can be specified in the options, but as I understand it, that's separate from telling zeromq which ports to consider when starting up a new process.

If options feels like the best way to handle this, I'd be happy to try to tackle it in a PR, which would essentially involve the following changes:

  1. Document a new option like clustermq.zmq.ports which should be a vector of integers that are considered as ports and that defaults to 6000:9999
  2. Tweak host to respect this option when available and fall back onto the default otherwise
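
A minimal sketch of those two changes, for illustration only. The function name candidate_addrs is hypothetical and stands in for host; it is not clustermq's actual implementation. The option name clustermq.zmq.ports is the one proposed above, and the tcp://node:port address format is an assumption.

```r
# Hypothetical stand-in for clustermq's host(); not the package's real code.
# Reads the proposed `clustermq.zmq.ports` option and falls back to 6000:9999.
candidate_addrs <- function(node = Sys.info()[["nodename"]],
                            ports = getOption("clustermq.zmq.ports", 6000:9999),
                            n = 100) {
    ports <- as.integer(ports)
    stopifnot(length(ports) > 0, !anyNA(ports),
              all(ports >= 1L & ports <= 65535L))
    # Index via sample.int() rather than sample(ports): sample() on a
    # length-1 vector x would draw from 1:x instead of returning x.
    picked <- ports[sample.int(length(ports), min(n, length(ports)))]
    # Assumed address format: one candidate ZeroMQ endpoint per port
    sprintf("tcp://%s:%d", node, picked)
}
```

With something like this in place, a user behind a restrictive firewall could set options(clustermq.zmq.ports = 50000:50010) in their .Rprofile on the login node, and only ports in that range would be tried.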

dankessler, Mar 09 '24 02:03