DHCP proxy - no available capacity / crash when using external DHCP service
Hi, I have a few standalone containers running under podman, using a macvlan network to make them available on an internal LAN. I'm observing the following:
1: Every time the external router assigns a DHCP host configuration to a container, netavark logs this message:
dhcp-proxy: [ERROR netavark::commands::dhcp_proxy] no available capacity
2: Every week or so netavark crashes hard with the following logs, and the container is no longer reachable at its static DHCP lease address:
Jul 11 21:29:36 build-00 user.notice dhcp-proxy: thread '<unnamed>' panicked at library/std/src/sys/pal/unix/stack_overflow.rs:158:13:
Jul 11 21:29:36 build-00 user.notice dhcp-proxy: failed to set up alternative stack guard page: Out of memory (os error 12)
Jul 11 21:29:36 build-00 user.notice dhcp-proxy: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Jul 11 21:29:36 build-00 user.notice dhcp-proxy: thread '<unnamed>' panicked at library/std/src/sys/pal/unix/stack_overflow.rs:154:13:
Jul 11 21:29:36 build-00 user.notice dhcp-proxy: failed to allocate an alternative stack: Out of memory (os error 12)
Jul 11 21:29:36 build-00 daemon.err /etc/init.d/netavark-dhcp-proxy[38874]: start-stop-daemon: failed to start `/usr/libexec/podman/netavark'
Jul 11 21:29:36 build-00 user.notice dhcp-proxy: * start-stop-daemon: failed to start `/usr/libexec/podman/netavark'
I just tried building and upgrading to the latest upstream version of netavark, but I'm seeing the same log messages. I'll wait to see if I get a crash with this version.
# /opt/netavark version
{
  "version": "1.12.0-dev",
  "commit": "e182147b6aea964f572a4ca981bc000698d59539",
  "build_time": "2024-07-12T03:39:23.741229481+00:00",
  "target": "x86_64-alpine-linux-musl",
  "default_fw_driver": "iptables"
}
# podman network inspect podman30
[
  {
    "name": "podman30",
    "id": "71d03f55fc0de8a074d5e5de88759269e32da004c568f99bb94a420f2e7f31a2",
    "driver": "macvlan",
    "network_interface": "br30",
    "created": "2024-06-23T04:40:07.724065801Z",
    "ipv6_enabled": false,
    "internal": false,
    "dns_enabled": false,
    "ipam_options": {
      "driver": "dhcp"
    },
    "containers": {
      "2852850d2afb74e1e27aa46c2ab25a887f3abb9989d41c8e9699b4fea60d3f51": {
        "name": "excalidraw",
        "interfaces": {
          "eth0": {
            "subnets": [
              {
                "ipnet": "172.16.30.115/24",
                "gateway": "172.16.30.1"
              }
            ],
            "mac_address": "ee:b7:eb:0b:b1:2b"
          }
        }
      }
    }
  }
]
/home/gopher # podman info
host:
  arch: amd64
  buildahVersion: 1.35.4
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-r0
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: unknown'
  cpuUtilization:
    idlePercent: 97.52
    systemPercent: 2.08
    userPercent: 0.4
  cpus: 24
  databaseBackend: sqlite
  distribution:
    distribution: alpine
    version: 3.20.1
  eventLogger: file
  freeLocks: 2031
  hostname: build-00
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.6.34-1-lts
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 29579984896
  memTotal: 33634459648
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-r0
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-r0
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.15-r0
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/crun
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-2024.06.07-r0
    version: |
      pasta unknown version
      Copyright Red Hat
      GNU General Public License, version 2 or later
      <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 191h 35m 22.00s (Approximately 7.96 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 11
    paused: 0
    running: 11
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /media/md1/containers/storage
  graphRootAllocated: 146415128576
  graphRootUsed: 4874047488
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 11
  runRoot: /media/md1/containers-runroot/storage
  transientStore: false
  volumePath: /media/md1/containers/storage/volumes
version:
  APIVersion: 5.0.3
  Built: 1717594599
  BuiltTime: Wed Jun 5 13:36:39 2024
  GitCommit: ""
  GoVersion: go1.22.4
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.3
Let me know if more info is needed. Thanks!
It is likely the same cause as #811, but given that your error is different we cannot be sure. Once #811 is fixed you should definitely retest.
One more crash occurrence today with the latest netavark build, in case it helps:
Jul 15 22:16:10 build-00 user.notice dhcp-proxy: thread 'thread 'tokio-runtime-worker<unnamed>' panicked at ' panicked at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rslibrary/std/src/sys/pal/unix/stack_overflow.rs::683168::2913:
Jul 15 22:16:10 build-00 user.notice dhcp-proxy: :
Jul 15 22:16:10 build-00 user.notice dhcp-proxy: failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }failed to set up alternative stack guard page: Out of memory (os error 12)
Jul 15 22:16:10 build-00 user.notice dhcp-proxy: stack backtrace:
Jul 15 22:16:10 build-00 user.notice dhcp-proxy: memory allocation of 3072 bytes failed
Jul 15 22:16:11 build-00 daemon.err /etc/init.d/netavark-dhcp-proxy[17725]: start-stop-daemon: failed to start `/opt/netavark'
Jul 15 22:16:11 build-00 user.notice dhcp-proxy: * start-stop-daemon: failed to start `/opt/netavark'
Thanks!
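For anyone trying to narrow this down before a fix lands: the panics above report ENOMEM (os error 12) when setting up a thread stack and EAGAIN (code 11) when spawning a thread, while `podman info` shows plenty of free host memory. That pattern is consistent with the dhcp-proxy process slowly exhausting a per-process resource, although the logs alone do not prove a leak. Below is a minimal sketch of a watcher that records the thread count and resident memory of a process from `/proc/<pid>/status`, so growth over the week before a crash would become visible. The function names (`parse_status`, `sample`, `watch`) are made up for this illustration and are not part of netavark or podman.

```python
#!/usr/bin/env python3
"""Log thread count and resident memory of one process over time (Linux only)."""
import sys
import time
from pathlib import Path


def parse_status(text):
    """Return (threads, vm_rss_kib) parsed from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    threads = int(fields.get("Threads", "0"))
    # VmRSS looks like "5120 kB"; kernel threads have no VmRSS line at all.
    rss_kib = int(fields.get("VmRSS", "0 kB").split()[0])
    return threads, rss_kib


def sample(pid):
    """Read the current thread count and VmRSS (KiB) for the given PID."""
    return parse_status(Path(f"/proc/{pid}/status").read_text())


def watch(pid, interval=60.0):
    """Print one sample per interval until the process disappears."""
    while True:
        try:
            threads, rss_kib = sample(pid)
        except (FileNotFoundError, ProcessLookupError):
            print(f"process {pid} is gone", file=sys.stderr)
            return
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        print(f"{stamp} pid={pid} threads={threads} rss_kib={rss_kib}", flush=True)
        time.sleep(interval)


# Only start watching when a PID is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    watch(int(sys.argv[1]), float(sys.argv[2]) if len(sys.argv) > 2 else 60.0)
```

Run it with the PID of the running dhcp-proxy process as the first argument and, optionally, a sampling interval in seconds as the second, redirecting the output to a file. A thread count that keeps climbing with each DHCP lease or renewal would point toward leaked threads or tasks; a flat line would suggest looking elsewhere.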
Should be closed as part of https://github.com/containers/netavark/pull/1261 and https://github.com/containers/netavark/pull/1263