Communicating with Devices Behind NAT

Open akmohtashami opened this issue 4 years ago • 5 comments

We have to look into how to allow devices behind NAT to send and receive p2p messages.

Options:

  • LibP2P Built-in Mechanisms
  • https://en.wikipedia.org/wiki/TCP_hole_punching

akmohtashami avatar Sep 18 '20 13:09 akmohtashami

You might already know this, but I was browsing StackOverflow on the topic of TCP hole punching and found this paper by Prof. Bryan Ford: https://bford.info/pub/net/p2pnat/index.html.

giorgiosav avatar Sep 21 '20 15:09 giorgiosav

I found another method that could be useful and doesn't require an intermediate server: https://github.com/samyk/pwnat. It only requires that one peer knows the address of the other.

From what I understand it works like this:

  1. The "server" peer sends ICMP echo request packets to a fixed address (3.3.3.3) which won't be returned.
  2. The "client" peer sends an ICMP Time Exceeded packet to the "server" peer, containing the original packet sent by the "server" peer. The NAT lets this pass because it sees the original packet information inside the Time Exceeded packet.
  3. The "server" peer at this point knows the "client" peer's IP address from the Time Exceeded packet and they can connect using hole punching, without the need for an intermediary (they have already "discovered" each other).
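To make steps 1 and 2 more concrete, here is a minimal Python sketch of the byte layout of the two ICMP messages, following the standard ICMP format (RFC 792). It is not taken from pwnat's source: the function names, the zero identifier/sequence values and the placeholder address `203.0.113.5` (standing in for the "server" peer's public IP) are all assumptions made for illustration.

```python
import socket
import struct

FIXED_ADDR = "3.3.3.3"  # the fixed, unanswering address from step 1


def checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): ones' complement of the 16-bit sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(ident: int = 0, seq: int = 0) -> bytes:
    """ICMP echo request (type 8, code 0) with no payload."""
    unsummed = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    return struct.pack("!BBHHH", 8, 0, checksum(unsummed), ident, seq)


def ipv4_header(src: str, dst: str, payload_len: int, ttl: int = 64) -> bytes:
    """Minimal 20-byte IPv4 header carrying ICMP (protocol 1)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5, 0, 20 + payload_len, 0, 0, ttl, 1, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return header[:10] + struct.pack("!H", checksum(header)) + header[12:]


def time_exceeded(original_datagram: bytes) -> bytes:
    """ICMP Time Exceeded (type 11, code 0) embedding the original datagram."""
    unsummed = struct.pack("!BBHI", 11, 0, 0, 0) + original_datagram
    return struct.pack("!BBHI", 11, 0, checksum(unsummed), 0) + original_datagram


# Step 1: what the "server" peer keeps sending towards 3.3.3.3.
probe = echo_request()

# Step 2: the "client" peer rebuilds that datagram (it only needs the server's
# public address, a placeholder here) and wraps it in a Time Exceeded message.
assumed_server_ip = "203.0.113.5"  # hypothetical documentation address
original = ipv4_header(assumed_server_ip, FIXED_ADDR, len(probe)) + probe
reply = time_exceeded(original)
```

Actually sending these packets requires raw sockets and usually elevated privileges, which this sketch leaves out.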

giorgiosav avatar Sep 30 '20 10:09 giorgiosav

Is this still an issue now with peerjs #16 #17? (Or libp2p.js?)

martinjaggi avatar Oct 09 '20 14:10 martinjaggi

Yes, this is still an issue with PeerJS. However, it can be resolved using STUN/TURN servers. Blagoj and I are working on solving this. So far, we have tested the functionality with our own PeerJS server on AWS and a public STUN/TURN server, but the public server is not very reliable, so many messages get dropped. Next, we will try to host our own STUN/TURN server, also on AWS.
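For context on what a STUN server does for a peer behind NAT, here is a rough Python sketch of the STUN Binding exchange as specified in RFC 5389: the peer sends a 20-byte Binding request and reads its public address back from the XOR-MAPPED-ADDRESS attribute of the response. This is not the PeerJS setup described above; the function names are made up for illustration, and only IPv4 is handled.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def binding_request(txid: Optional[bytes] = None) -> bytes:
    """Build a STUN Binding request: type, length 0, cookie, 12-byte transaction ID."""
    txid = txid or os.urandom(12)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, txid)


def parse_xor_mapped_address(response: bytes) -> Tuple[str, int]:
    """Extract the public (address, port) from a Binding success response."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # 0x01 = IPv4
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return addr, port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")
```

In practice the request would be sent over UDP to a STUN server (3478 is the standard port) and the reply passed to `parse_xor_mapped_address`. STUN only reveals the public mapping; when direct connectivity still fails, a TURN server relays the traffic instead.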

mmilenkoski avatar Oct 14 '20 21:10 mmilenkoski

adding the following info collected by @tvogels

I spent some time today exploring the space of code available related to messaging and distributed/decentralized systems. I thought I should share my notes here.

  • ZeroMQ is a very universal low-level messaging library. It has bindings for Python and JavaScript. Good for low-latency small messages. You can get very creative with protocols. Sending large messages is not easy: you need to chunk the messages and provide flow control (if a sender is faster than a receiver, it should stop sending until the receiver is free again). UDP beacons allow peers to find each other on a network without a coordinator, but they do not work across routers or firewalls.
  • PyTorch RPC allows quite flexible communication patterns between nodes, and seems like a nice abstraction, but assumes a constant world size and reliable peers. Non-fixed ‘worlds’ are on the roadmap.
  • PyTorch/TensorPipe is the communication backend used by PyTorch RPC. It sends TCP messages between workers for coordination, and then finds the best protocol between them to stream large tensors (including NVLink, GPUDirect, shared memory, …). There are no good Python bindings yet, so it would require some C++ coding to get a nice ‘send tensor primitive’ available in Python. If anyone wants to help with this, that would be awesome!
  • Hivemind is a decentralized learning framework in Python. It uses a DHT for coordination between workers. It includes all-reduce and group all-reduce, which seems a bit of an anti-pattern to me (?). It’s used in Moshpit SGD. No solution for NAT traversal.
  • gRPC is a remote procedure call library from Google, and could be an alternative to ZeroMQ (both Python and JavaScript). It’s used by Hivemind.

Chunking

To send large data such as files over TCP, you need to manually chunk them, and send them at a good speed. This probably applies to PeerJS, ZeroMQ, gRPC, and libp2p. I found this article informative: https://zguide.zeromq.org/docs/chapter7/

NAT Traversal

I haven’t found any good-looking solutions to NAT traversal in Python. Has anyone found any? I wonder if this can be done separately by a different library than the rest of the communication?

Asynchronous Python code

To me, asynchronous JS code looks much cleaner than asynchronous Python code. For my personal project, I’m not targeting mobile devices. Still, I am considering using JS instead of Python. TensorFlow.js on Node.js seems to be as fast as TensorFlow in Python (it’s just bindings for C++ code).
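As an illustration of the chunking and flow-control point in these notes, here is a small Python sketch. It is only a toy: a bounded `asyncio.Queue` stands in for the network connection, so the sender is suspended whenever the receiver falls behind. The chunk size, the `max_in_flight` limit and all function names are assumptions, and a real transport (ZeroMQ, gRPC, PeerJS, ...) would need the receiver to signal readiness over the wire instead.

```python
import asyncio
from typing import Iterator, List

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for the actual transport


def chunks(payload: bytes, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Split a large payload into fixed-size pieces (the last may be shorter)."""
    for start in range(0, len(payload), size):
        yield payload[start:start + size]


async def sender(payload: bytes, channel: asyncio.Queue, size: int) -> None:
    for chunk in chunks(payload, size):
        # put() suspends while the queue is full, so a fast sender
        # cannot run ahead of a slow receiver (backpressure).
        await channel.put(chunk)
    await channel.put(None)  # sentinel marking the end of the message


async def receiver(channel: asyncio.Queue) -> bytes:
    parts: List[bytes] = []
    while (chunk := await channel.get()) is not None:
        parts.append(chunk)
        await asyncio.sleep(0)  # stand-in for slow processing of each chunk
    return b"".join(parts)


async def transfer(payload: bytes, max_in_flight: int = 4,
                   size: int = CHUNK_SIZE) -> bytes:
    """Move payload from sender to receiver with at most max_in_flight queued chunks."""
    channel: asyncio.Queue = asyncio.Queue(maxsize=max_in_flight)
    _, received = await asyncio.gather(
        sender(payload, channel, size), receiver(channel)
    )
    return received
```

For example, `asyncio.run(transfer(big_blob))` reassembles `big_blob` on the receiving side while never holding more than `max_in_flight` chunks in the queue.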

martinjaggi avatar Apr 01 '21 11:04 martinjaggi

closing as currently handled relatively ok with websockets and simplepeer for p2p #255

martinjaggi avatar Oct 25 '22 15:10 martinjaggi