pvxs icon indicating copy to clipboard operation
pvxs copied to clipboard

UDP search traverse NAT

Open mdavidsaver opened this issue 8 months ago • 17 comments

Change the CMD_SEARCH message, adding a reply-to-sender-port flag (ReplyPort) to allow for replies to traverse a NAT.

The meaning of that flag is that the recipient should ignore the replyPort field, and instead send a reply to the apparent port which sent the request.

A forwarder should overwrite the replyPort field, ~~and clear this new ReplyPort flag~~.

This behavior is enabled by PVA protocol header version 3.

Potential TODO items for the whole change process:

  • [ ] Update protocol spec. doc https://github.com/epics-docs/epics-docs/pull/134
    • [ ] Remove pvAccessCPP protocol docs from wiki
  • [ ] Update cashark dissector: https://github.com/mdavidsaver/cashark/pull/16
  • [ ] Update core.pva
  • [ ] Update pvAccessCPP, forwarder only
  • [ ] ~~Update pvAccessJava, forwarder only~~ Kay thinks not worthwhile

mdavidsaver avatar Apr 20 '25 21:04 mdavidsaver

A possible resolution to https://github.com/epics-base/pvAccessCPP/issues/197

This protocol change would involve a protocol version increment (3). I believe that behavior added is backwards compatible since the replyPort field will always have a non-zero value, which older servers will continue to use verbatim.

Because the forwarder behavior changes, NAT traversal can only work reliable if all peers on the server/IOC host understand the new ReplyPort flag.

@gilesknap This PR implements that idea I mentioned. This should allow NAT traversal (into a container w/ slirp4netns) if both client (outside) and server (inside) are updated.

@kasemir Since this would be a protocol change, which I would want core.pva to implement as well, I would like to hear from you. As you may see from this PR, the actual code change can be small.

mdavidsaver avatar Apr 20 '25 21:04 mdavidsaver

As discussed on https://github.com/epics-base/pvAccessCPP/issues/197 the CMD_SEARCH message replyAddr and especially replyPort do not allow NAT traversal since the recipient would send a reply to the wrong port.

For the replyAddr field, the sender may set a special value (0.0.0.0 or ::0) to indicate reply to sender address. However, there is no analogous special value for replyPort. It is not possible to begin using 0 now as existing servers would blindly try to send to 0, which would break in non-NAT situations.

Instead this PR make use of one of the unused bits in the search flags field. If this new ReplyPort bit is set, then a recipient should ignore the replyPort field, and instead use to (apparent) sender/source port.

To work correctly with forwarding (aka. the local multicast hack) it is necessary that the forwarder should test ReplyPort. If set, then the forwarder should mangle the forwarded packet to clear this flag and overwrite replyPort with the apparent sender/source port.

By extension, ReplyPort must be ignored in a forwarded message as older peers will blindly forward.

mdavidsaver avatar Apr 20 '25 22:04 mdavidsaver

Looks good, but I'm unsure why only the client version is incremented to 3:

struct pva_version {
     enum {
         client = 3,
         server = 2,

Don't both server and client need to be aware of the new ReplyPort flag?

kasemir avatar Apr 21 '25 13:04 kasemir

To work correctly with forwarding (aka. the local multicast hack) it is necessary that the forwarder should test ReplyPort. If set, then the forwarder should mangle the forwarded packet to clear this flag and overwrite replyPort with the apparent sender/source port.

I don't claim to understand the protocol, will this change permit search packets to pass through multiple forwarders? The "forwarder should … clear the flag" wording above makes it seem like it would only work through one, which if true could be short-sighted — won't searches from a client in a container on one host have to pass through 2 forwarders to find PVs on a server inside a container on a different host?

anjohnson avatar Apr 21 '25 15:04 anjohnson

The forwarding is local. Within one host, assume you have N servers. The one started last happens to receive the unicast UDP traffic. It forwards it to all the other N-1 servers on the host, and if one of them recognizes the searched channel, it can then reply. For that it should be sufficient for the forwarder to once put the correct reply port into the message.

kasemir avatar Apr 21 '25 15:04 kasemir

The forwarding is local.

Ah, so the forwarder is only used when copying the packet to the localhost interface for other servers running on the same host (which presumably means running in the same container when containers are involved). Thanks!

anjohnson avatar Apr 21 '25 16:04 anjohnson

The forwarding is local.

What @kasemir says is entirely correct so far. @anjohnson you do bring up a point which might become relevant in future.

In the context of a container with user-mode networking, there are actually two loopback interfaces for a SEARCH to potentially pass through. The host loopback, and the container/guest loopback. At present this is not relevant since slirp4netns and its ilk are not aware of SO_REUSEPORT, which permits multiple UDP sockets to bind to one port.

Thus, so far running a single container with user-mode networking bound to UDP/5076 prevents any other container, and any PVA peer (eg. pvget), from running on that host.

So what would happen if some clever person figures out a way around this limitation?

"forwarder should … clear the flag"

As I think about it, clearing this bit is not necessary. And allowing it to pass through could enable this future possibility.

mdavidsaver avatar Apr 22 '25 00:04 mdavidsaver

two loopback interfaces for a SEARCH to potentially pass through.

I say "potentially" here because of some oddities with how slirp4netns works at present. For example, a broadcast received by the host socket magically becomes a unicast inside ...without setting the PVA SEARCH unicast flag. Another example of why this flag can not be trusted. (cf. https://github.com/epics-base/pvAccessCPP/issues/200)

As I think about it, this PR may be better combined with #101, which avoids testing the unicast flag.

@gilesknap tells me that podman 5 replaces slirp4netns with something else, which does not appear to NAT. It remains to be seen if/how it mangles the destination IP address.

mdavidsaver avatar Apr 22 '25 00:04 mdavidsaver

why only the client version is incremented to 3:

Because I can? Which is not the same as should...

Don't both server and client need to be aware of the new ReplyPort flag?

The loopback UDP forwarding behavior blurs the client/server distinction since either type of peer may do the forwarding. To achieve forward compatibility it is necessary for a peer to forward messages with later versions it does not understand.

fyi. With PVXS each process gets only one forwarder per port. For example, a process with one client and one server bound to the same port will only have one forwarder.

Also, so far (I think) no implementation tries to forward unicast BEACON, which I suppose is a surprise waiting for some site.

mdavidsaver avatar Apr 22 '25 00:04 mdavidsaver

@mdavidsaver I have tried this version of PVXS in the same test environment we used at the end of the EPICS meeting.

It does not appear to be working for me, it just times out as before. If I run the IOC container with network host it does work, but just binding 5075/tcp 5076/udp is still timing out.

I enclose minimal steps to reproduce my result so you can tell me what I am doing wrong or potentially do some diagnosis.

testing pvxs in containers

Server Machine

  • clone example IOC instances next to pvxs-dev folder
  • launch an ioc-generic container and mount in the PR version of pvxs over the version that was compiled at build time
$ podman run --security-opt label=disable \
    -p5064:5064/udp -p5064:5064/tcp -p5075:5075/tcp -p5076:5076/udp \
    -itv $(pwd)/pvxs-dev:/epics/support/pvxs \
    -v $(pwd)/example-services/services/bl01t-ea-test-01/config/:/epics/ioc/config \
    --rm ghcr.io/epics-containers/ioc-generic-developer:2025.4.2b1
  • Launch the example ioc
/epics/ioc/start.sh

Client Machine

  • launch ioc-generic with PR version of pvxs and network host
$ podman run --security-opt label=disable \
    --network host \
    -itv $(pwd)/pvxs-dev:/epics/support/pvxs \
    --rm ghcr.io/epics-containers/ioc-generic-developer:2025.4.2b1
  • try to get the pv
/epics/support/pvxs/bin/linux-x86_64/pvxget bl01t:SUM
  • note that caget works - also starting the Server container with network host works.

Manual edits to the example IOC. 

(To get the epics-containers framework out of the way)

After the first execution of start.sh  a completely standard startup script can be found and edited using vim at /epics/runtime/st.cmd and executed using:

/epics/ioc/bin/linux-x86_64/ioc /epics/runtime/st.cmd 

gilesknap avatar Apr 22 '25 12:04 gilesknap

@EmilioPeJu @coretl FYI you may be interested in this PR.

gilesknap avatar Apr 22 '25 13:04 gilesknap

This PR now includes the contents of #101 as well as the ReplySrcPort flag (renamed from ReplyPort).

I also found a way to test with a single host. With softIocPVX running in a container with user mode networking and -p5075:5075/tcp -p5076:5076/udp, then run on the host:

EPICS_PVA_AUTO_ADDR_LIST=NO \
EPICS_PVA_ADDR_LIST=localhost:5076 \
EPICS_PVA_BROADCAST_PORT=15076 \
./bin/linux-x86_64/pvxget pv:name

Note that localhost:5076 can be replaced with 127.255.255.255:5076 or some <iface-addr>:5076 as #101 is able to distinguish a localhost multicast from a unicast or bcast.

mdavidsaver avatar Apr 26 '25 02:04 mdavidsaver

@mdavidsaver I've tried the latest changes against:

  • podman 4.9.4
  • podman 5.0.3 with bridge network

and it is working.

gilesknap avatar Apr 26 '25 10:04 gilesknap

@mdavidsaver I've tried the latest changes against:

  • podman 4.9.4
  • podman 5.0.3 with bridge network

and it is working.

Also @EmilioPeJu confirms that it is working fully for docker too (I had some issues with docker not related to this PR)

gilesknap avatar Apr 28 '25 09:04 gilesknap

@mdavidsaver is this ready to release?

gilesknap avatar Jun 02 '25 10:06 gilesknap

@mdavidsaver is this ready to release?

The code is, but the protocol change needs to be communicated properly. Since @kasemir is onboard, the next step is to reconcile the spec. documents (yes, plural).

@kasemir @ttkorhonen Do you have thoughts on what to do with the copies in the pvAccessCPP wiki and/or the epics-docs repository? Are there any other copies I am not thinking of?

My current thinking is to make https://github.com/epics-docs/epics-docs the authoritative copy going forward, and replace the wiki content with stub links to it.

mdavidsaver avatar Jun 09 '25 00:06 mdavidsaver

@mdavidsaver is this ready to release?

The code is, but the protocol change needs to be communicated properly. Since @kasemir is onboard, the next step is to reconcile the spec. documents (yes, plural).

@kasemir @ttkorhonen Do you have thoughts on what to do with the copies in the pvAccessCPP wiki and/or the epics-docs repository? Are there any other copies I am not thinking of?

My current thinking is to make https://github.com/epics-docs/epics-docs the authoritative copy going forward, and replace the wiki content with stub links to it.

@mdavidsaver , I agree with your proposal. I am obviously biased because I imported the document to epics-docs, after some discussions. I think that making it the authoritative copy would be the right decision. Version control is easier than in wiki. I am not aware of more copies, or at least do not remember any on top of my head.

ttkorhonen avatar Jun 09 '25 07:06 ttkorhonen