Container with podman network not receiving UDP traffic
Issue Description
When running a simple Python server container that listens on a UDP socket and is attached to a podman network, UDP traffic sent to the published port never arrives.
Versions 5.2.0-dev-5d10f77da and 4.9.4-rhel were both tried, with the same results.
This is an MRE of an issue we are having in production. Docker is fine, podman+CNI is fine, only podman+netavark exhibits this behavior. Note that restarting our UDP devices or changing their source port is very cumbersome and we wish to avoid it.
Steps to reproduce the issue
- Create Dockerfile
FROM python:latest
WORKDIR /usr/local/bin
COPY server.py .
CMD ["chmod", "+x", "server.py"]
CMD ["server.py"]
- The corresponding server script
#!/usr/bin/env python3
import socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server_socket.bind(('', 17000))
while True:
    message, address = server_socket.recvfrom(1024)
    print(f"received from {address}: {message}", flush=True)
- Build the image:
podman build . -t podman_udp_test
- Create the network:
podman network create podman_udp
- Start sending UDP traffic to port 17000 with nping:
nping -g 17580 -p 17000 -c 1000000 --udp 127.0.0.1
- Start the container:
podman run -p 17000:17000/udp --net podman_udp podman_udp_test
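A quick way to confirm that the packets actually reach the host while reproducing (a sketch; tcpdump may need to be installed separately, and lo is the right interface only because the reproducer sends to 127.0.0.1):
# on the host: the nping packets should be visible here even though the server prints nothing
tcpdump -ni lo udp port 17000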
Describe the results you received
No output
Describe the results you expected
Output from the server after receiving packets
podman info output
host:
arch: amd64
buildahVersion: 1.33.8
cgroupControllers:
- cpuset
- cpu
- io
- memory
- hugetlb
- pids
- rdma
- misc
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon-2.1.10-1.el9.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.1.10, commit: fb8c4bf50dbc044a338137871b096eea8041a1fa'
cpuUtilization:
idlePercent: 99.38
systemPercent: 0.28
userPercent: 0.35
cpus: 4
databaseBackend: sqlite
distribution:
distribution: rhel
version: "9.4"
eventLogger: journald
freeLocks: 2032
hostname: ccms-pod
idMappings:
gidmap: null
uidmap: null
kernel: 5.14.0-427.18.1.el9_4.x86_64
linkmode: dynamic
logDriver: journald
memFree: 640634880
memTotal: 8058433536
networkBackend: netavark
networkBackendInfo:
backend: netavark
dns:
package: aardvark-dns-1.10.0-3.el9_4.x86_64
path: /usr/libexec/podman/aardvark-dns
version: aardvark-dns 1.10.0
package: netavark-1.10.3-1.el9.x86_64
path: /usr/libexec/podman/netavark
version: netavark 1.10.3
ociRuntime:
name: crun
package: crun-1.14.3-1.el9.x86_64
path: /usr/bin/crun
version: |-
crun version 1.14.3
commit: 1961d211ba98f532ea52d2e80f4c20359f241a98
rundir: /run/user/0/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
os: linux
pasta:
executable: ""
package: ""
version: ""
remoteSocket:
exists: false
path: /run/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: false
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: true
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.2.3-1.el9.x86_64
version: |-
slirp4netns version 1.2.3
commit: c22fde291bb35b354e6ca44d13be181c76a0a432
libslirp: 4.4.0
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.5.2
swapFree: 5367644160
swapTotal: 5368705024
uptime: 583h 32m 27.00s (Approximately 24.29 days)
variant: ""
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
search:
- registry.access.redhat.com
- registry.redhat.io
- docker.io
store:
configFile: /etc/containers/storage.conf
containerStore:
number: 5
paused: 0
running: 5
stopped: 0
graphDriverName: overlay
graphOptions:
overlay.mountopt: nodev,metacopy=on
graphRoot: /var/lib/containers/storage
graphRootAllocated: 47173337088
graphRootUsed: 20552769536
graphStatus:
Backing Filesystem: xfs
Native Overlay Diff: "false"
Supports d_type: "true"
Supports shifting: "false"
Supports volatile: "true"
Using metacopy: "true"
imageCopyTmpDir: /var/tmp
imageStore:
number: 34
runRoot: /run/containers/storage
transientStore: false
volumePath: /var/lib/containers/storage/volumes
version:
APIVersion: 4.9.4-rhel
Built: 1719829634
BuiltTime: Mon Jul 1 18:27:14 2024
GitCommit: ""
GoVersion: go1.21.11 (Red Hat 1.21.11-1.el9_4)
Os: linux
OsArch: linux/amd64
Version: 4.9.4-rhel
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
Yes
Additional environment details
There was no difference between running nping on localhost and running it from a different machine that can reach the podman container.
Additional information
Starting the Python server first and then starting the UDP sender works as expected, but that doesn't help our use case.
Stopping and restarting the UDP sender program while the container is running doesn't help. Only by changing the source port of the UDP sender does traffic start being received, but we cannot easily change the source port of the UDP traffic.
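Concretely, "changing the source port" here just means re-running the reproducer's nping command with a different -g value (a sketch; any otherwise unused source port works):
# new source port 17581 instead of 17580; the server then starts receiving packets
nping -g 17581 -p 17000 -c 1000000 --udp 127.0.0.1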
This is likely because we do not touch any conntrack entries in netavark. We would have to call into the kernel netlink API to drop the stale entries, and last I checked our netlink crate did not have any support for conntrack types, so we would need to implement the types from scratch, which is a lot of work. In any case this is a netavark issue, so I am moving it there.
Note: if you are a RHEL user it is best to report this through the Red Hat support channels so it can be prioritized better.
Is there a workaround possible?
Manually clear the conntrack entries (assuming that is actually what is causing the issue you are having).
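For the reproducer above, that would look roughly like this (a sketch; conntrack comes from the conntrack-tools package, is run on the host, and 17000 is the port from the MRE):
# list any stale UDP conntrack entries for the forwarded port
conntrack -L --proto udp --dport 17000
# delete them so the next packet is evaluated by the netavark port-forwarding rules again
conntrack -D --proto udp --dport 17000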
I am having this same issue after restarting a pod that uses quadlets (systemctl restart app-pod.service).
I was able to work around it by manually clearing the conntrack entries as suggested.
conntrack -L conntrack | grep 514
conntrack -D conntrack --proto udp --orig-src 192.168.20.1 --orig-dst 192.168.20.2 --sport 514 --dport 5141
Any way I can help troubleshoot why this is happening, and help fix it, so this work around isn't required?
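Until netavark can handle this itself, one way to automate the cleanup for the quadlet case above is a systemd drop-in that deletes the stale entries every time the pod service (re)starts. This is only a sketch under the assumptions in that comment (unit name app-pod.service, syslog traffic on UDP port 5141, conntrack installed at /usr/sbin/conntrack):
# /etc/systemd/system/app-pod.service.d/flush-conntrack.conf
[Service]
# the leading '-' means the unit does not fail when no matching entry exists
ExecStartPost=-/usr/sbin/conntrack -D --proto udp --dport 5141
Then run systemctl daemon-reload and systemctl restart app-pod.service to pick up the drop-in.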
I am having this same issue after restarting a pod that uses quadlets (systemctl restart app-pod.service). I was able to work around it by manually clearing the conntrack entries as suggested.
Any way I can help troubleshoot why this is happening, and help fix it, so this work around isn't required?
Manually clearing the conntrack entries was an acceptable workaround for us in production. Otherwise I cannot offer any further information about this. Sorry
Any way I can help troubleshoot why this is happening, and help fix it, so this work around isn't required?
It happens because the kernel keeps the conntrack entry around for a while. Not sure about the exact timeout, but it is not important.
What needs to happen is for netavark to learn how to flush these entries on setup/teardown, and this requires us to talk to the proper kernel APIs like the conntrack tool does. Calling the conntrack command from netavark does not seem acceptable to me.
Does the netavark team have any desire to actually fix this issue? This is plaguing my environment, as I rely heavily on SNMP collection (traps) as well as syslog via containers. With the push to use Podman on RHEL, and CNI being dropped in favor of Netavark, but UDP not working... this seems oddly low priority for y'all.
This has never made it to the top of our priority list, so at least in the near future we have no plans to work on this.
That doesn't mean I don't want to see this fixed, but so far other work items have been considered more important. Anyone can take on the work to fix this if they want; we happily accept contributions.
And in case you are a RHEL customer, file a support request asking for this feature/fix. The more customers ask for it, the more likely it is to be ranked higher on our list.
Hi @booleanvariable! I am interested in contributing to this project through LFX. Can you recommend some good first issues to work on so I can contribute to the code base and understand the project better?
Hey @Luap99 & @mheon, I would like to be part of this project under LFX term 3. Can you please help me start contributing so that I can get a deep knowledge of the codebase? I have good knowledge of C and have been contributing to and connecting with the CNCF @cilium networking project, so I am familiar with IP and networking concepts. As for Rust, I have only explored the basics and have not deep-dived yet, but I am very eager for some guidance to improve my Rust skills. Please help me get started with code contributions. Thanks, and is there a channel like Slack where we can have a conversation when needed?
Hi @booleanvariable , I’m interested in contributing to this project through the LFX mentorship program. I’ve gone through the repository and would love to start contributing. Could you kindly suggest some good first issues or beginner-friendly tasks that I can work on to get started?
Hi,
Is implementing BPF (Berkeley Packet Filter) as an in-kernel filtering mechanism for conntrack events in scope here? My reading of the issue makes me think the goal is just to directly delete entries via netlink.
Why does netavark use netlink-sys instead of netlink-proto, which is an asynchronous implementation of the netlink protocol (ref)? Is there a specific design reason or history there?
For context, I'm asking because I'm very interested in implementing this for the LFX mentorship program. I've sent a more detailed plan to @mheon and @Luap99 by email for their review.
Thanks
Sorry for delay all, please apply through the official portal if you are interested, I think the applications should open later today. https://mentorship.lfx.linuxfoundation.org/project/07efb861-3c5b-4bc2-9986-593656750ffc
As for easy issues, it is hard for me to judge; https://github.com/containers/netavark/issues/1258 might be one.
Is implementing BPF (Berkeley Packet Filter) as in-kernel filtering mechanism for conntrack events in the scope here? My reading of the issue makes me think the goal is just to directly delete entries via netlink.
We don't want a persistent process; we really only aim to flush/delete the entries for the given container port so the UDP traffic gets redirected accordingly.
Why does netavark use netlink-sys instead of netlink-proto which is an asynchronous implementation of the netlink protocol? ref. Is there a specific design reason or history there?
Async is not a good fit for us because what we do is synchronous anyway for the most part, so the async code would just make an async call and wait for it to finish, which in itself only adds the overhead of the event loop the async runtime uses. In fact we used the async parts in the past: https://github.com/containers/netavark/commit/96993f4f94a04b079f083650a8c5767c8f5c5fb3