How to use a fixed port for Gloo's TCP connections

Open leye7755 opened this issue 7 years ago • 10 comments

I found that Gloo uses a random port every time it makes a TCP connection, but this does not suit my environment: I need to open specific ports for the machines to communicate. @pietern helped me by suggesting that Gloo could pick from a predefined set of ports, but I am still confused about how to do this. Is there a way to solve it, or can you give more detail? By the way, I am using Gloo for distributed training with Caffe2. Many thanks!

@pietern @zpao @yfeldblum @achao @gfosco

leye7755 avatar Aug 01 '17 13:08 leye7755

Hey, thanks for opening the issue.

I mentioned in our conversation earlier that we would have to make Gloo use predefined ports so that you can whitelist them in your environment. This is not a current feature. As I mentioned, the only way to make this work today is to remove any firewalling between the machines you intend to use for distributed training. A possible solution would be for Gloo to use consecutive ports for its peers (e.g. 5000, 5001, etc., one for every peer). Then you would only have to whitelist a number of ports equal to the number of machines you intend to use. But this is not available today.
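
To make the whitelisting arithmetic concrete, here is a small Python sketch of the consecutive-port scheme described above. It is only an illustration of the idea, not something Gloo provides: `peer_ports` is a hypothetical helper, and the base port 5000 is taken from the example in the text.

```python
def peer_ports(base_port, num_peers):
    """Return one consecutive port per peer, starting at `base_port`.

    Hypothetical helper for illustration; Gloo has no such function.
    """
    if num_peers < 1:
        raise ValueError("num_peers must be at least 1")
    last_port = base_port + num_peers - 1
    if base_port < 1 or last_port > 65535:
        raise ValueError("ports must lie within 1-65535")
    return list(range(base_port, last_port + 1))


# Four machines starting at port 5000 need four ports whitelisted.
print(peer_ports(5000, 4))  # [5000, 5001, 5002, 5003]
```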

pietern avatar Aug 01 '17 20:08 pietern

Thank you for your help. Did you mean that I should modify the Gloo code? Can you tell me where I should modify it? It would be very useful if you could describe the process or architecture of Gloo. Thank you @pietern

leye7755 avatar Aug 02 '17 15:08 leye7755

You could start in Pair::listen; that's where the bind(2) function is called (link). You could choose to use, for example, 8 ports round-robin if your context is no larger than 8 machines.
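
The general technique of binding a listening socket to one of a small, predefined set of ports, instead of letting the OS choose, might look roughly like the Python sketch below. This illustrates the approach only and is not Gloo code (Gloo's Pair::listen is C++); the name `bind_from_pool` and the loopback default are assumptions made for the example.

```python
import errno
import socket


def bind_from_pool(ports, host="127.0.0.1"):
    """Return a listening TCP socket bound to the first free port in `ports`.

    Hypothetical helper: tries each candidate port in order and skips
    the ones that are already in use.
    """
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError as exc:
            sock.close()
            if exc.errno == errno.EADDRINUSE:
                continue  # port taken, try the next one in the pool
            raise
        sock.listen()
        return sock
    raise OSError(errno.EADDRINUSE, "no free port in the predefined pool")
```

With a pool of 8 ports, for instance `bind_from_pool(range(5000, 5008))`, only those 8 ports would need to be opened in the firewall.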

pietern avatar Aug 10 '17 23:08 pietern

I mentioned in our conversation earlier that we would have to make Gloo use predefined ports so that you can whitelist them in your environment.

Hi @pietern, what is the range of ports Gloo uses for the tcp transport layer?

erikwijmans avatar May 04 '19 00:05 erikwijmans

@erikwijmans Currently it still lets the OS pick a port to bind to. It is technically possible to force a range of ports on the listening side, as long as the number of ports in the range is equal to the number of participants in the context. This is not implemented today though.

Out of curiosity: are you trying to use Gloo in a firewalled environment?

pietern avatar May 06 '19 16:05 pietern

Yes, the firewalls on our cluster are fairly restrictive, but we'd like to be able to open up enough ports to use Gloo. Any suggestions? Or is there a range that our OS (Ubuntu 16.04) will tend to use, so that we can open that range?

erikwijmans avatar May 06 '19 18:05 erikwijmans

It lets the operating system decide which port to use. AFAIK this means it picks an unused port from the ephemeral port range. You can get/set this range with sysctl or by editing procfs values directly. By default the range is rather large:

$ sysctl net.ipv4.ip_local_port_range 
net.ipv4.ip_local_port_range = 32768    65534
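
The same values can be read from procfs in a script. A minimal Python sketch, assuming Linux; the helper names `parse_port_range` and `read_port_range` are made up for illustration, and the path is the procfs counterpart of the sysctl key shown above.

```python
from pathlib import Path

# procfs file backing the net.ipv4.ip_local_port_range sysctl (Linux only).
PROC_PATH = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text):
    """Parse the two whitespace-separated integers into (low, high)."""
    low, high = (int(field) for field in text.split())
    return low, high


def read_port_range(path=PROC_PATH):
    """Read the current ephemeral port range from procfs."""
    return parse_port_range(path.read_text())
```

Calling `read_port_range()` on the machine above would return the pair `(32768, 65534)`.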

pietern avatar May 09 '19 17:05 pietern

Also stumbled on this: https://github.com/pytorch/pytorch/issues/44544

vadimkantorov avatar Sep 17 '20 19:09 vadimkantorov

The comment on this made me realize I never followed up! Using sysctl to set net.ipv4.ip_local_port_range to a small(-ish) port range (we used a range of 3,000 ports, which seems to be more than adequate for our cluster size) and then opening that range in the firewall worked perfectly.
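
As a rough illustration of that workaround, the Python sketch below formats the sysctl setting for a narrowed range. The starting port 50000 is an arbitrary assumption (the comment above only gives the size, 3,000), `port_range_setting` is a hypothetical helper, and the 1024 lower bound is just a sanity check chosen for this example.

```python
def port_range_setting(first_port, count):
    """Format a sysctl setting that limits the ephemeral range to `count` ports."""
    last_port = first_port + count - 1
    # Sanity check chosen for this sketch: stay above the privileged ports.
    if not (1024 <= first_port <= last_port <= 65535):
        raise ValueError("range must lie within 1024-65535")
    return f"net.ipv4.ip_local_port_range = {first_port} {last_port}"


print(port_range_setting(50000, 3000))
# net.ipv4.ip_local_port_range = 50000 52999
```

A line like this could be placed in a file under /etc/sysctl.d/ or applied as root with `sysctl -w`, with the same range then opened in the firewall. Note that narrowing this range affects every outgoing connection on the machine, not only Gloo's.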

erikwijmans avatar Sep 18 '20 23:09 erikwijmans

Is this feature implemented in Gloo? Can we specify the TCP ports now?

Tiiiger avatar Apr 19 '21 18:04 Tiiiger