Podman network interfaces not properly deleted during containerlab destroy

Open FloSch62 opened this issue 10 months ago • 1 comments

When using containerlab with the podman runtime, network interfaces aren't being properly cleaned up after destroying a lab. This causes subnet conflicts when attempting to redeploy the same topology using the default network.

clab@FloSch:~$ containerlab -r podman deploy -c -t .clab/srl01/srl01.clab.yml 
19:21:21 INFO Containerlab started version=0.68.0
19:21:21 INFO Parsing & checking topology file=srl01.clab.yml
19:21:21 INFO Removing directory path=/home/clab/.clab/srl01/clab-srl01
19:21:21 INFO Creating lab directory path=/home/clab/.clab/srl01/clab-srl01
19:21:21 INFO Running postdeploy actions kind=nokia_srlinux node=srl2
19:21:21 INFO Created link: srl1:e1-1 ▪┄┄▪ srl2:e1-1
19:21:21 INFO Running postdeploy actions kind=nokia_srlinux node=srl1
19:21:31 INFO Adding host entries path=/etc/hosts
19:21:31 INFO Adding SSH config for nodes path=/etc/ssh/ssh_config.d/clab-srl01.conf
╭─────────────────┬──────────────────────────────┬─────────┬───────────────────╮
│       Name      │          Kind/Image          │  State  │   IPv4/6 Address  │
├─────────────────┼──────────────────────────────┼─────────┼───────────────────┤
│ clab-srl01-srl1 │ nokia_srlinux                │ running │ 172.20.20.23      │
│                 │ ghcr.io/nokia/srlinux:latest │         │ 3fff:172:20:20::f │
├─────────────────┼──────────────────────────────┼─────────┼───────────────────┤
│ clab-srl01-srl2 │ nokia_srlinux                │ running │ 172.20.20.22      │
│                 │ ghcr.io/nokia/srlinux:latest │         │ 3fff:172:20:20::e │
╰─────────────────┴──────────────────────────────┴─────────┴───────────────────╯
clab@FloSch:~$ containerlab -r podman destroy -c -t .clab/srl01/srl01.clab.yml 
19:21:55 INFO Parsing & checking topology file=srl01.clab.yml
19:21:55 INFO Parsing & checking topology file=srl01.clab.yml
19:21:55 INFO Destroying lab name=srl01
19:21:56 INFO Removing host entries path=/etc/hosts
19:21:56 INFO Removing SSH config path=/etc/ssh/ssh_config.d/clab-srl01.conf
clab@FloSch:~$ containerlab -r podman deploy -c -t .clab/srl01/srl01.clab.yml 
19:21:58 INFO Containerlab started version=0.68.0
19:21:58 INFO Parsing & checking topology file=srl01.clab.yml
19:21:58 INFO Removing directory path=/home/clab/.clab/srl01/clab-srl01
Error: subnet 172.20.20.0/24 is already used on the host or by another config

When attempting to redeploy, containerlab fails with:

Error: subnet 172.20.20.0/24 is already used on the host or by another config

The debug logs show containerlab attempting to delete the management network:

20:26:41 DEBU Calling DeleteNet method. *CLab.Config.Mgmt value is: &{Network:clab Bridge:podman1 IPv4Subnet:172.20.20.0/24 IPv4Gw: IPv4Range: IPv6Subnet:3fff:172:20:20::/64 IPv6Gw: IPv6Range: MTU:0 ExternalAccess:0xc00057c4cf}
20:26:41 DEBU Method DeleteNet was called with runtime inputs &{config:0xc0001169d8 mgmt:0xc00015de60} and net settings &{Network:clab Bridge:podman1 IPv4Subnet:172.20.20.0/24 IPv4Gw: IPv4Range: IPv6Subnet:3fff:172:20:20::/64 IPv6Gw: IPv6Range: MTU:0 ExternalAccess:0xc00057c4cf}
20:26:41 DEBU trying to delete mgmt network clab

However, despite this attempt, the network interfaces remain:

61: br-017aa95cb641: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    inet 172.20.20.1/24 brd 172.20.20.255 scope global br-017aa95cb641
       valid_lft forever preferred_lft forever
... or
78: podman1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    inet 172.20.20.1/24 brd 172.20.20.255 scope global podman1
       valid_lft forever preferred_lft forever

However, the clab network is gone:

clab@FloSch:~/.clab/srl01$ sudo podman network ls
NETWORK ID    NAME        DRIVER
2f259bab93aa  podman      bridge

The DeleteNet method in the podman runtime implementation doesn't appear to fully clean up network interfaces. Containerlab correctly identifies the management network and bridge interface, and the network itself disappears from podman network ls, but the bridge interface stays on the host with 172.20.20.1/24 still assigned.
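To confirm which leftover interface is still holding the management subnet, the `ip addr` output can be checked programmatically. The following is only a minimal sketch, not part of containerlab: the function name `interfaces_in_subnet` is hypothetical, and it assumes the text passed in has the same layout as the `ip addr` snippets shown in this issue.

```python
import ipaddress
import re

# Header line of an `ip addr` entry, e.g. "78: podman1: <NO-CARRIER,...>"
_IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)[:@]")
# IPv4 address line, e.g. "    inet 172.20.20.1/24 brd ..." (inet6 lines do not match)
_INET_RE = re.compile(r"^\s+inet\s+(\S+)")


def interfaces_in_subnet(ip_addr_output: str, subnet: str) -> list[str]:
    """Return names of interfaces whose IPv4 network overlaps `subnet`.

    Hypothetical helper: `ip_addr_output` is assumed to be text in the
    format printed by `ip addr`.
    """
    target = ipaddress.ip_network(subnet)
    current = None
    found = []
    for line in ip_addr_output.splitlines():
        header = _IFACE_RE.match(line)
        if header:
            # Remember which interface the following address lines belong to.
            current = header.group(1)
            continue
        inet = _INET_RE.match(line)
        if inet and current:
            iface_net = ipaddress.ip_interface(inet.group(1)).network
            if iface_net.overlaps(target) and current not in found:
                found.append(current)
    return found
```

Feeding it the output of `ip addr` together with `172.20.20.0/24` would list the interfaces that block the redeploy, such as the stale bridge shown above.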

May 07 '25 18:05 FloSch62

The problem didn't occur when I deployed Nokia SROS using the Podman runtime. On the other hand, I can reproduce this issue when I deploy Nokia SRL using the Podman runtime.

Moreover, the same problem occurs when I deploy Nokia SRL using podman compose. I used the following compose.yaml:

services:
  node1:
    image: ghcr.io/nokia/srlinux:24.10.4
    privileged: true
    tty: true
    networks:
      - internal
networks:
  internal:

Below are the warning and error that appeared when I ran sudo podman compose down.

>>>> Executing external compose provider "/bin/podman-compose". Please see podman-compose(1) for how to disable this message. <<<<

podman-compose version: 1.0.6
['podman', '--version', '']
using podman version: 5.2.2
** excluding:  set()
podman stop -t 10 compose-test_node1_1
WARN[0010] StopSignal SIGTERM failed to stop container compose-test_node1_1 in 10 seconds, resorting to SIGKILL
ERRO[0010] Unable to clean up network for container 14dc3bc2ea240e87d3f9b96d24ec971efa14a09e1a012f100a2f836510acc468: "netavark: failed to delete container veth eth0: Netlink error: No such device (os error 19)"
compose-test_node1_1
exit code: 0
podman rm compose-test_node1_1
compose-test_node1_1
exit code: 0

It seems Podman cannot clean up the network of the SRLinux instance. The virtual bridge interface was still lingering after I ran sudo podman compose down and sudo podman network rm.

111: podman1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 8a:49:14:7f:e3:04 brd ff:ff:ff:ff:ff:ff
    inet 10.89.0.1/24 brd 10.89.0.255 scope global podman1
       valid_lft forever preferred_lft forever
    inet6 fe80::8849:14ff:fe7f:e304/64 scope link
       valid_lft forever preferred_lft forever
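
As a stopgap, a lingering bridge like this one can be removed by hand with `ip link`. The sketch below is a workaround only, not a fix for the underlying cleanup problem: the helper name `remove_stale_bridge` is hypothetical, it assumes no container or network still uses the bridge, and running it for real requires root. By default it only returns the commands it would run.

```python
import subprocess


def remove_stale_bridge(name: str, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) the commands that delete a leftover bridge.

    Hypothetical helper: assumes `name` is a bridge that nothing uses anymore.
    """
    commands = [
        ["ip", "link", "set", "dev", name, "down"],  # bring the bridge down first
        ["ip", "link", "delete", "dev", name],       # then remove the interface
    ]
    if not dry_run:
        for cmd in commands:
            # Needs root privileges; raises CalledProcessError if a command fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `remove_stale_bridge("podman1")` returns the two command lists without touching the host; passing `dry_run=False` executes them.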

May 18 '25 08:05 gusman