
High pasta and rootlessport CPU load

pwige opened this issue 1 year ago • 81 comments

Issue Description

I'm encountering unexpectedly high CPU load from pasta and rootlessport when running certain network operations.

Scenario 1 – Downloading large files: Downloading Hetzner test files (although any large file download should do) from within a rootless container.

Scenario 2 – Wireguard VPN server: Hosting a Wireguard server in a rootless container using the docker.io/linuxserver/wireguard image.

Steps to reproduce the issue

Scenario 1:

  1. Run rootless container and download the test file to tmpfs:
    podman container run \
        --interactive \
        --tty \
        --rm \
        --tmpfs /mnt \
        --workdir /mnt \
        debian /bin/bash -c 'apt update && apt install -y wget && wget https://ash-speed.hetzner.com/10GB.bin'
    
  2. Monitor download speed reported by wget and CPU usage reported by htop. If Hetzner's Ashburn datacenter is far from you, check their other locations and modify the URL's subdomain as needed.
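
While the download runs, something like this can watch pasta's CPU usage alongside htop (a sketch):

    watch -n 1 'ps -C pasta -o pid,pcpu,args'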

Scenario 2:

  1. Create a custom network.
    podman network create testnet
    
  2. Spawn a Wireguard server in a rootless container.
    podman container run \
        --detach \
        --rm \
        --cap-add NET_RAW \
        --cap-add NET_ADMIN \
        --sysctl net.ipv4.conf.all.forwarding=1 \
        --env PEERS=1 \
        --network=testnet \
        --publish 51820:51820/udp \
        docker.io/linuxserver/wireguard
    
  3. Install the client profile on a separate device using either the QR code provided in the server container's log or the peer1.conf file stored within the container itself (see the example after this list).
  4. Run a speedtest first with multiple connections and then with only a single connection enabled.
  5. Observe CPU load via htop.
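
For step 3, pulling the profile out of the container might look like this (a sketch; the /config/peer1/peer1.conf path follows the linuxserver image's usual layout, assumed here):

    podman container logs <container>                                # shows the QR code
    podman container exec <container> cat /config/peer1/peer1.conf   # prints the profile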

Describe the results you received

Scenario 1: When downloading Hetzner test files with wget in a rootless container I reach consistent download speeds of ~43MiB/s, but htop reports the pasta process using ~40-43% CPU.

Scenario 2: Running a network speedtest on the client results in extremely slow network speeds reported on the client and very high CPU load on the server. htop shows 90-100% CPU load from the pasta process and 10-17% CPU load from several rootlessport processes.

Wireguard's poor performance renders it essentially unusable. I was told the CPU load is unexpectedly high and that I should raise a bug report here. Please let me know if what I am encountering is actually to be expected from rootless containers.

Describe the results you expected

I expected less severe CPU load from the rootless networking backend and better performance/network speeds via the Wireguard VPN tunnel.

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.1
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-1:2.1.12-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: e8896631295ccb0bfdda4284f1751be19b483264'
  cpuUtilization:
    idlePercent: 96.76
    systemPercent: 2.25
    userPercent: 0.99
  cpus: 4
  databaseBackend: sqlite
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  freeLocks: 2046
  hostname: fern
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1003
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    - container_id: 65537
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    - container_id: 65537
      host_id: 165536
      size: 65536
  kernel: 6.10.5-arch1-1
  linkmode: dynamic
  logDriver: journald
  memFree: 15708397568
  memTotal: 16651931648
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.1-1
      path: /usr/lib/podman/aardvark-dns
      version: aardvark-dns 1.12.1
    package: netavark-1.12.2-1
    path: /usr/lib/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.16.1-1
    path: /usr/bin/crun
    version: |-
      crun version 1.16.1
      commit: afa829ca0122bd5e1d67f1f38e6cc348027e3c32
      rundir: /run/user/1001/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-2024_08_14.61c0b0d-1
    version: |
      pasta 2024_08_14.61c0b0d
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1001/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 6442446848
  swapTotal: 6442446848
  uptime: 0h 31m 3.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/containers/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /storage/containers/storage
  graphRootAllocated: 6001156685824
  graphRootUsed: 94199808
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/user/1001/containers
  transientStore: false
  volumePath: /storage/containers/storage/volumes
version:
  APIVersion: 5.2.1
  Built: 1723672228
  BuiltTime: Wed Aug 14 17:50:28 2024
  GitCommit: d0582c9e1e6c80cc08c3f042b91993c853ddcbc6
  GoVersion: go1.23.0
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Running a fresh Arch installation on an Intel NUC5i7RYH, which is connected to my home's router via ethernet. No other services or containers running during testing.

Additional information

The experience I described above is consistent.

pwige avatar Aug 20 '24 19:08 pwige

Thanks for reporting this!

Scenario 2: Running a network speedtest on the client results in extremely slow network speeds reported on the client and very high CPU load on the server. htop shows 90-100% CPU load from the pasta process and 10-17% CPU load from several rootlessport processes.

I wouldn't expect rootlessport processes to be around in this case. How many are running? Can you have a look at their command line? I'm wondering if there's some unintended loop between port mappings or suchlike.
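
For reference, something like this would list them with their full command lines (a sketch):

    pgrep -a rootlessport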

sbrivio-rh avatar Aug 20 '24 22:08 sbrivio-rh

About Scenario 1: I can reproduce something similar, in my case downloading from Hetzner's fsn1 location. I download at approximately 1 Gbps (110 MB/s) and pasta uses approximately 70% of a CPU thread (wget is at ~50%).

The good news is that at higher transfer rates, on the same system, I get approximately 14 Gbps at 100% load on one CPU thread, and if I CPU-starve pasta during the test file download, I get approximately the same transfer rates. That is, when there are bigger chunks of data to transfer (because we have less CPU time), the CPU load doesn't increase linearly, so I don't see a practical issue with it.

Anyway, here is the perf session:

$ perf record -g ./pasta --config-net
# wget https://fsn1-speed.hetzner.com/1GB.bin -O /dev/null
--2024-08-21 10:07:11--  https://fsn1-speed.hetzner.com/1GB.bin
Resolving fsn1-speed.hetzner.com (fsn1-speed.hetzner.com)... 2a01:4f8:0:a232::2, 78.46.170.2
Connecting to fsn1-speed.hetzner.com (fsn1-speed.hetzner.com)|2a01:4f8:0:a232::2|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1073741824 (1.0G) [application/octet-stream]
Saving to: ‘/dev/null’

/dev/null           100%[===================>]   1.00G   109MB/s    in 9.4s    

2024-08-21 10:07:21 (109 MB/s) - ‘/dev/null’ saved [1073741824/1073741824]

# 
logout
[ perf record: Woken up 18 times to write data ]
[ perf record: Captured and wrote 5.080 MB perf.data (37680 samples) ]
$ perf report

and this is what perf report says about it:

Samples: 37K of event 'cycles', Event count (approx.): 30751209003
  Children      Self  Command     Shared Object         Symbol
-   80.23%     0.56%  passt.avx2  [kernel.kallsyms]     [k] entry_SYSCALL_64_after_hwframe
     79.67% entry_SYSCALL_64_after_hwframe
      - do_syscall_64
         + 20.90% do_writev
         + 19.63% __sys_recvmsg
         + 13.53% __x64_sys_epoll_wait
         + 5.35% ksys_read
         + 4.93% __x64_sys_epoll_ctl
         + 4.01% __x64_sys_timerfd_settime
         + 3.86% __x64_sys_recvfrom
         + 3.29% syscall_trace_enter.constprop.0
         + 2.75% syscall_exit_to_user_mode

do_writev() is the system call writing to the tap interface of the container (we have to write one frame at a time, because tap gives us a file descriptor, not a socket), and __sys_recvmsg() are the reads from the socket.
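
To illustrate the constraint, here is a minimal sketch (not pasta's actual code; it assumes a tap fd already opened on /dev/net/tun with IFF_TAP | IFF_NO_PI):

#include <sys/uio.h>
#include <unistd.h>

ssize_t tap_send_frame(int tap_fd, const void *eth_hdr, size_t hdr_len,
                       const void *payload, size_t payload_len)
{
	struct iovec iov[2] = {
		{ .iov_base = (void *)eth_hdr, .iov_len = hdr_len },
		{ .iov_base = (void *)payload, .iov_len = payload_len },
	};

	/* One syscall per frame: the tap device is a plain file descriptor,
	 * not a socket, so frames can't be batched the way sendmmsg()
	 * batches datagrams on a socket. */
	return writev(tap_fd, iov, 2);
}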

There isn't much we can do about that, except for a planned VDUSE ¹ ² back-end. At that point, we'll get rid of those system calls and just "move" data between a shared memory ring and sockets.

The rest is something we could probably try to improve on in the shorter term, for example by trying to get bigger chunks of data at a time and reducing the wakeup frequency (__x64_sys_epoll_wait() and friends). I tried this:

diff --git a/tcp_buf.c b/tcp_buf.c
index c31e9f3..c15de64 100644
--- a/tcp_buf.c
+++ b/tcp_buf.c
@@ -477,6 +477,11 @@ int tcp_buf_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn)
 		len = recvmsg(s, &mh_sock, MSG_PEEK);
 	while (len < 0 && errno == EINTR);
 
+	if (len > (1 << 16)) {
+		int lowat = 1 << 17;
+		setsockopt(s, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat));
+	}
+
 	if (len < 0)
 		goto err;
 

CPU load is 15-18% at the same transfer rate. The transfer hangs at the end (this would need the same adaptive logic we have in tcp_splice.c), and we substantially decrease overhead from wakeups and bookkeeping:

  Children      Self  Command     Shared Object         Symbol
-   91.64%     0.12%  passt.avx2  [kernel.kallsyms]     [k] entry_SYSCALL_64_after_hwframe
     91.53% entry_SYSCALL_64_after_hwframe
      - do_syscall_64
         + 36.24% __sys_recvmsg
         + 24.94% ksys_read
         + 20.90% do_writev
         + 3.35% __x64_sys_epoll_wait
         + 1.93% __x64_sys_recvfrom
         + 1.08% __x64_sys_timerfd_settime
           0.73% syscall_exit_to_user_mode
           0.62% syscall_trace_enter.constprop.0
           0.51% __x64_sys_setsockopt

So this is something we could consider, even though, again, that CPU load shouldn't look that scary, because it's quite adaptive.
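
For the record, the adaptive step mentioned above could look roughly like this (a sketch loosely modeled on the idea in tcp_splice.c, not actual pasta code):

#include <sys/types.h>
#include <sys/socket.h>

static void rcvlowat_adapt(int s, ssize_t last_len, int *lowat_raised)
{
	int lowat = 1;

	/* Once SO_RCVLOWAT has been raised to batch wakeups, drop it back
	 * to 1 as soon as a read comes back smaller than the threshold, so
	 * the tail of a transfer doesn't wait for bytes that never arrive. */
	if (*lowat_raised && last_len < (1 << 17)) {
		setsockopt(s, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat));
		*lowat_raised = 0;
	}
}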

I didn't look into Scenario 2 yet.

sbrivio-rh avatar Aug 21 '24 08:08 sbrivio-rh

I wouldn't expect rootlessport processes to be around in this case. How many are running? Can you have a look at their command line? I'm wondering if there's some unintended loop between port mappings or suchlike.

After rebooting my system to start with a clean slate, I started my Wireguard container and observed eight rootlessport and six rootlessport-child processes spawn. I spent some time on the client using the network: Opening webpages, downloading Hetzner test files, etc. After several minutes I noticed that a couple more rootlessport processes had appeared, bringing the total number to ten. All of the rootlessport and rootlessport-child processes were reaped once I stopped the container.

As an additional piece of info, I am using Quadlet (systemctl --user start wireguard) to manage this container.

# wireguard.container
[Unit]
Description=Wireguard server quadlet

[Container]
Image=docker.io/linuxserver/wireguard:latest
AddCapability=NET_ADMIN
AddCapability=NET_RAW
Sysctl=net.ipv4.conf.all.forwarding=1
Environment=PEERS=1
Environment=INTERNAL_SUBNET=10.30.0.0/24
# Volume=wireguard-config.volume:/config
Network=wireguard.network
PublishPort=51820:51820/udp

[Service]
Restart=always

[Install]
WantedBy=multi-user.target default.target

No options aside from the heading ([Network]) are provided in the wireguard.network file.
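
For completeness, the whole wireguard.network file is then just:

    # wireguard.network
    [Network]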

pwige avatar Aug 21 '24 19:08 pwige

Jumping in to say that I'm experiencing the exact same issue, running an almost identical configuration to OP's (I'm also using Quadlet to run the VPN container). Feels good not to be alone; this thing was driving me crazy. As a piece of additional information that could help zero in on this, I'll say that I've been running this configuration for more than 4 months now and the issue started only recently (the last month or so). Here's my podman info:

host:
  arch: amd64
  buildahVersion: 1.37.1
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-1.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 98.31
    systemPercent: 1.14
    userPercent: 0.55
  cpus: 4
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: cloud
    version: "40"
  eventLogger: journald
  freeLocks: 2040
  hostname: hekate
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 589824
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 589824
      size: 65536
  kernel: 6.10.6-200.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 933859328
  memTotal: 6203207680
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.1-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.1
    package: netavark-1.12.2-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.15-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/1001/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240814.g61c0b0d-1.fc40.x86_64
    version: |
      pasta 0^20240814.g61c0b0d-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1001/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 6202322944
  swapTotal: 6202322944
  uptime: 24h 59m 31.00s (Approximately 1.00 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /home/podman/.config/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 4
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/podman/.local/share/containers/storage
  graphRootAllocated: 428340129792
  graphRootUsed: 24915517440
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 6
  runRoot: /run/user/1001/containers
  transientStore: false
  volumePath: /home/podman/.local/share/containers/storage/volumes
version:
  APIVersion: 5.2.1
  Built: 1723593600
  BuiltTime: Wed Aug 14 02:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.5
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.1

please let me know if there's anything else I can provide to help with this.

kalelkenobi avatar Aug 22 '24 10:08 kalelkenobi

I'll say that I've been running this configuration for more than 4 months now and the issue started only recently (the last month or so)

Do you happen to know starting from which version of Podman and pasta (the package is named passt) you are seeing this?

sbrivio-rh avatar Aug 22 '24 10:08 sbrivio-rh

I'll say that I've been running this configuration for more than 4 months now and the issue started only recently (the last month or so)

Do you happen to know starting from which version of Podman and pasta (the package is named passt) you are seeing this?

I started noticing the issue around the beginning of August. Looking at my DNF history I can make some educated guesses. It could be either this update (applied on July 25):

Packages Altered:
    Upgrade  podman-5:5.2.0~rc2-1.fc40.x86_64            @updates-testing
    Upgraded podman-5:5.1.2-1.fc40.x86_64                @@System

or this one (August 4th):

Packages Altered:
    Upgrade  aardvark-dns-2:1.12.1-1.fc40.x86_64        @updates-testing
    Upgraded aardvark-dns-2:1.11.0-3.fc40.x86_64        @@System
    Upgrade  glibc-2.39-22.fc40.x86_64                  @updates-testing
    Upgraded glibc-2.39-17.fc40.x86_64                  @@System
    Upgrade  glibc-common-2.39-22.fc40.x86_64           @updates-testing
    Upgraded glibc-common-2.39-17.fc40.x86_64           @@System
    Upgrade  glibc-gconv-extra-2.39-22.fc40.x86_64      @updates-testing
    Upgraded glibc-gconv-extra-2.39-17.fc40.x86_64      @@System
    Upgrade  glibc-langpack-en-2.39-22.fc40.x86_64      @updates-testing
    Upgraded glibc-langpack-en-2.39-17.fc40.x86_64      @@System
    Upgrade  glibc-locale-source-2.39-22.fc40.x86_64    @updates-testing
    Upgraded glibc-locale-source-2.39-17.fc40.x86_64    @@System
    Upgrade  glibc-minimal-langpack-2.39-22.fc40.x86_64 @updates-testing
    Upgraded glibc-minimal-langpack-2.39-17.fc40.x86_64 @@System
    Upgrade  libgcc-14.2.1-1.fc40.x86_64                @updates-testing
    Upgraded libgcc-14.1.1-7.fc40.x86_64                @@System
    Upgrade  libgomp-14.2.1-1.fc40.x86_64               @updates-testing
    Upgraded libgomp-14.1.1-7.fc40.x86_64               @@System
    Upgrade  libstdc++-14.2.1-1.fc40.x86_64             @updates-testing
    Upgraded libstdc++-14.1.1-7.fc40.x86_64             @@System
    Upgrade  netavark-2:1.12.1-1.fc40.x86_64            @updates-testing
    Upgraded netavark-2:1.11.0-3.fc40.x86_64            @@System
    Upgrade  podman-5:5.2.0-1.fc40.x86_64               @updates-testing
    Upgraded podman-5:5.2.0~rc2-1.fc40.x86_64           @@System

The only update I see for passt seems too recent (August 16).

kalelkenobi avatar Aug 22 '24 10:08 kalelkenobi

My bad, there was another update on July 28. Sorry I missed it.

Packages Altered:
    Upgrade       passt-0^20240726.g57a21d2-1.fc40.x86_64                   @updates-testing
    Upgraded      passt-0^20240624.g1ee2eca-1.fc40.x86_64                   @@System
    Upgrade       passt-selinux-0^20240726.g57a21d2-1.fc40.noarch           @updates-testing
    Upgraded      passt-selinux-0^20240624.g1ee2eca-1.fc40.noarch           @@System

The one before that was on June 25, but it feels too early, and the one after was on August 9 (too late). Hope this helps.

kalelkenobi avatar Aug 22 '24 10:08 kalelkenobi

Any chance you could try downgrading to passt-0^20240624.g1ee2eca-1.fc40.x86_64 just for a quick test and see if that helps with CPU load?
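
If it helps, a downgrade attempt might look like this (a sketch; it assumes the older build is still available in the repositories or the local DNF cache):

    sudo dnf downgrade passt
    # or pin the exact version, if still available:
    sudo dnf downgrade passt-0^20240624.g1ee2eca-1.fc40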

sbrivio-rh avatar Aug 22 '24 11:08 sbrivio-rh

Any chance you could try downgrading to passt-0^20240624.g1ee2eca-1.fc40.x86_64 just for a quick test and see if that helps with CPU load?

I can't seem to find the old package, not easily at least. Do you know where I can get it?

kalelkenobi avatar Aug 22 '24 12:08 kalelkenobi

I managed to downgrade to passt 0^20240326.g4988e2b-1.fc40.x86_64 and the issue seems in fact mostly resolved. I still see a lot of rootlessport processes (see below), but CPU usage has improved greatly and the server is back to being usable.

podman      1138  0.5  0.4  72424 27420 ?        Ss   14:16   0:01 /usr/bin/pasta --config-net --address 10.0.2.0 --netmask 24 --gateway 10.0.2.2 --dns-forward 10.0.2.3 --pid /run/user/1001/containers/networks/rootless-netns/rootless-netns-conn.pid -t none -u none -T none -U none --no-map-gw --quiet --netns /run/user/1001/containers/networks/rootless-netns/rootless-netns
podman      1263  0.0  0.0 1745796 5268 ?        Sl   14:16   0:00 rootlessport
podman      1273  0.0  0.0 1598332 4836 ?        Sl   14:16   0:00 rootlessport-child
podman      1377  1.0  0.0 1819784 5696 ?        Sl   14:16   0:02 rootlessport
podman      1382  0.0  0.0 1524344 4988 ?        Sl   14:16   0:00 rootlessport-child

kalelkenobi avatar Aug 22 '24 12:08 kalelkenobi

It is notably better, but I can't say for certain that it is back to "normal". I will have to invest some more time checking and testing.

kalelkenobi avatar Aug 22 '24 12:08 kalelkenobi

I can't seem to find the old package. Not easily at least. do you know where I can get it?

I'm not sure how you solved this, and for how long Fedora packages are retained, but we keep a permanent mirror of Copr builds (including Fedora packages) at https://passt.top/builds/copr/.

I managed to dowgrade to passt 0^20240326.g4988e2b-1.fc40.x86_64 and the issue seems in fact mostly resolved.

It is notably better, but I can't say for certain that it is back to "normal". I will have to invest some more time checking and testing.

Thanks, this is extremely helpful.

sbrivio-rh avatar Aug 22 '24 12:08 sbrivio-rh

Thank you! I used the mirror and was able to downgrade to passt-0^20240624.g1ee2eca-1.fc40.x86_64. I can confirm that this version is working fine (as far as I can tell). I also did a quick test and updated back to the latest version available on Fedora 40 (passt-0^20240814.g61c0b0d-1.fc40.x86_64), and CPU usage is back to being abnormally high (60-70% with only wireguard running). pasta (see below) is also back to being the process with the highest CPU usage.

/usr/bin/pasta --config-net --address 10.0.2.0 --netmask 24 --gateway 10.0.2.2 --dns-forward 10.0.2.3 --pid /run/user/1001/containers/networks/rootless-netns/rootless-netns-conn.pid -t none -u none -T none -U none --no-map-gw --quiet --netns /run/user/1001/containers/networks/rootless-netns/rootless-netns

kalelkenobi avatar Aug 22 '24 13:08 kalelkenobi

I tested passt-0^20240726.g57a21d2-1.fc40.x86_64 and it looks like this is the first problematic version. I hope this helps narrow down the problem.

kalelkenobi avatar Aug 22 '24 13:08 kalelkenobi

I wouldn't expect rootlessport processes to be around in this case. How many are running? Can you have a look at their command line? I'm wondering if there's some unintended loop between port mappings or suchlike.

After rebooting my system to start with a clean slate, I started my Wireguard container and observed eight rootlessport and six rootlessport-child processes spawn. I spent some time on the client using the network: Opening webpages, downloading Hetzner test files, etc. After several minutes I noticed that a couple more rootlessport processes had appeared, bringing the total number to ten. All of the rootlessport and rootlessport-child processes were reaped once I stopped the container.

Your original reproducer didn't show you using a custom network; custom networks use a different architecture from the default pasta network mode, see https://github.com/containers/podman/discussions/22943#discussioncomment-9795883

So using rootlessport is normal when using custom networks. And 10-17% CPU doesn't sound particularly high to me. The process must proxy all data, which is not exactly cheap.

But I do agree that the pasta numbers look way too high for the amount of throughput.

Luap99 avatar Aug 22 '24 17:08 Luap99

@pwige Hi. I'm looking specifically at Scenario 2. As @sbrivio-rh notes, that looks like a regression caused by the flow table, which I implemented.

Unfortunately, I haven't yet been able to reproduce the problem. I ran the wireguard container as described and ran a speedtest through it. While the transfer speeds are significantly slower than without the tunnel, they were still respectable (6-18 Mbps depending on the exact variant). pasta CPU usage topped out at 10-20%. I'm not seeing any rootlessport processes. There are a lot of possible differences between my setup and yours, so I don't have a quick guess as to which one is making the difference.

So, here's a grab bag of questions, hoping that something will give a clue as to what the triggering circumstances are:

  1. Can you roughly quantify the "extremely slow" speeds you're seeing?
     1.1. Roughly what throughput does speedtest report through wireguard+pasta?
     1.2. Roughly what throughput does it report from the same client without the wireguard tunnel?
     1.3. Roughly what throughput is reported using wireguard+pasta, but with the older pasta version (passt-0^20240624.g1ee2eca-1.fc40.x86_64 or earlier)?
  2. What's the connection between the client machine and the wireguard server? Are they different containers (or VMs) on the same host? Is the client on a physically different machine on the same physical network? On an entirely different network?
  3. Do the rootlessport processes appear immediately after starting the wireguard container? Or are they only created once traffic starts?
  4. What sort of host is the wireguard container running on? In particular how many cores & threads does it have?

dgibson avatar Aug 27 '24 02:08 dgibson

@pwige Oh, sorry, one more question (5):

Run a speedtest first with multiple connections and then with only a single connection enabled.

I'm not entirely sure what you mean here. I couldn't see any obvious options for multiple vs. single connections.

dgibson avatar Aug 27 '24 04:08 dgibson

@pwige Oh, sorry, one more question (5):

Run a speedtest first with multiple connections and then with only a single connection enabled.

I'm not entirely sure what you mean here. I couldn't see any obvious options for multiple vs. single connections.

Sorry, I somehow missed that option on the speedtest page. Found it now.

Did some further testing, and I sometimes see pasta CPU load up to 45-50%, but throughput still seems respectable.

dgibson avatar Aug 27 '24 06:08 dgibson

Do the rootlessport processes appear immediately after starting the wireguard container? Or are they only created once traffic starts?

Rootlessport is used when using bridge mode (i.e. custom networks) and is started before the container. It handles all port-forwarding traffic in this case (so no pasta port forwarding option is involved). The traffic that goes through pasta in this case is only the connections initiated from the container (i.e. the ones that are not a direct reply to the ip/port from rootlessport). So in scenario 2 it would very much depend on how wireguard binds/connects to the udp sockets to know how the connection is flowing.

Luap99 avatar Aug 27 '24 09:08 Luap99

I'm using version 2024_08_21.1d6142f and also experiencing very high load, my use case is a bit different.

I'm running qbittorrentofficial/qbittorrent-nox rootless and I'm observing 100% CPU load for the passt process despite setting the network concurrency settings of the program in question to lower values than default.

passt floats at about 35% CPU load for some time and then after a couple of hours hits the 100% mark.

Pausing the seeding torrents does not have an effect on the load of the process, so despite receiving no substantial traffic, the process continues to saturate its thread.

EDIT: I should've mentioned that this problem was happening using the default podman bridge network. I say this because @kalelkenobi reportedly (see later comments) only had problems using a custom network.

grtcdr avatar Aug 27 '24 22:08 grtcdr

Do the rootlessport processes appear immediately after starting the wireguard container? Or are they only created once traffic starts?

Rootlessport is used when using bridge mode (i.e. custom networks) and is started before the container. It handles all port-forwarding traffic in this case (so no pasta port forwarding option is involved). The traffic that goes through pasta in this case is only the connections initiated from the container (i.e. the ones that are not a direct reply to the ip/port from rootlessport). So in scenario 2 it would very much depend on how wireguard binds/connects to the udp sockets to know how the connection is flowing.

Ok, so I'm a bit confused at this point. The original instructions for reproducing don't seem to be setting up a custom network - just starting a regular container. However the presence of rootlessport seems to indicate that there is a custom network in play. @pwige can you shed some light on what network configuration you're using?

dgibson avatar Aug 28 '24 04:08 dgibson

I'm using version 2024_08_21.1d6142f and also experiencing very high load, my use case is a bit different.

I'm running qbittorrentofficial/qbittorrent-nox rootless and I'm observing 100% CPU load for the passt process despite setting the network concurrency settings of the program in question to lower values than default.

passt floats at about 35% CPU load for some time and then after a couple of hours hits the 100% mark.

Pausing the seeding torrents does not have an effect on the load of the process, so despite receiving no substantial traffic, the process continues to saturate its thread.

Huh, interesting. The good news is this strongly suggests we're doing something Just Plain Wrong (as opposed to a more subtle inefficiency), and once we find it, it should be a relatively easy fix. The bad news is that I don't yet have any good theories as to what's going on.

dgibson avatar Aug 28 '24 04:08 dgibson

Did some further testing thanks to you guys pointing out that rootlessport is only needed in cases where a custom network is used, and I can confirm that the problem is NOT present when using the default network (or the host network). Sorry if I've not mentioned this before, but this was out of sheer ignorance on my part. Please let me know if there's anything else I can test or information I can provide about this.

P.S. This whole thing has been hugely educational for me and has made me realize that my setup (where I have multiple containers using a custom network) is not very efficient CPU-wise. Could you guys give me a pointer as to what would be a better way to have multiple containers communicate with each other? If that needs to be a longer conversation I'd be happy to post a question in Discussions or do some further research myself. Thanks anyway :)

kalelkenobi avatar Aug 28 '24 08:08 kalelkenobi

@dgibson Hi!

1. Can you roughly quantify the "extremely slow" speeds you're seeing.
   1.1. Roughly what throughput does speedtest report through wireguard+pasta?
   1.2. Roughly what throughput does it report from the same client without the wireguard tunnel?

The extremely slow speeds I had previously seen were around 0.5MB/s down and 0.1MB/s up. A "clean" speedtest where I'm not connected to the vpn gives me ~450MB/s down and ~165MB/s up.

Here are two videos showing the speedtest on the client and the CPU usage on the server as they happened in real-time. The client was my laptop, which exists on the same physical network as the server.

https://github.com/user-attachments/assets/9160b14a-54cf-4ecd-8aa9-cbf06eba2531

https://github.com/user-attachments/assets/20268a45-ebc0-4b72-9b70-57aa509819a4

Despite the speeds shown here not being nearly as bad as they were previously (no complaints here), the CPU usage isn't noticeably different. I'm not sure what could account for the change in throughput other than general instability from my ISP, as I haven't changed anything on my end. It makes me a bit embarrassed, tbh. I haven't had the opportunity today to perform this test when connected to a different physical network, but may have the chance to tomorrow.

   1.3. Roughly what throughput is reported using wireguard+pasta, but with the older pasta version (`passt-0^20240624.g1ee2eca-1.fc40.x86_64` or earlier)?

I'm on Arch, so I can't use any of the Copr builds. I can build from source from tags 2024_06_24.1ee2eca or 2024_06_07.8a83b53. Please let me know if simply copying the resulting passt binary to /usr/local/bin is enough for testing. Thanks.
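
For reference, the build-from-source route might look roughly like this (a sketch; it assumes the upstream git repository at https://passt.top/passt, and whether podman picks up /usr/local/bin/pasta over /usr/bin/pasta depends on PATH ordering or helper_binaries_dir in containers.conf):

    git clone https://passt.top/passt
    cd passt
    git checkout 2024_06_24.1ee2eca
    make
    sudo cp pasta /usr/local/bin/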

2. What's the connection between the client machine and the wireguard server?  Are they different containers (or VMs) on the same host?  Is the client on a physically different machine on the same physical network?  On an entirely different network?

The server is running in a rootless container directly on the host. The client(s) are other devices connected to the same physical network. As mentioned above, I might have the chance to test from different physical networks tomorrow.

3. Do the rootlessport processes appear immediately after starting the wireguard container?  Or are they only created once traffic starts?

The first eight appear immediately after starting the container. I can get an additional rootlessport process to spawn when I first connect a client to the wireguard vpn. The maximum I've seen is ten.

4. What sort of host is the wireguard container running on?  In particular how many cores & threads does it have?

I am using an Intel NUC5i7RYH, with the containers running directly on the host. I haven't imposed any resource limits on the container, so it has unimpeded access to both cores (4 threads) and all 16GB of RAM. The device itself is connected to my home LAN via ethernet.

pwige avatar Aug 28 '24 13:08 pwige

Do the rootlessport processes appear immediately after starting the wireguard container? Or are they only created once traffic starts?

Rootlessport is used when using bridge mode (i.e. custom networks) and is started before the container. It handles all port-forwarding traffic in this case (so no pasta port forwarding option is involved). The traffic that goes through pasta in this case is only the connections initiated from the container (i.e. the ones that are not a direct reply to the ip/port from rootlessport). So in scenario 2 it would very much depend on how wireguard binds/connects to the udp sockets to know how the connection is flowing.

Ok, so I'm a bit confused at this point. The original instructions for reproducing don't seem to be setting up a custom network - just starting a regular container. However the presence of rootlessport seems to indicate that there is a custom network in play. @pwige can you shed some light on what network configuration you're using?

The steps I provided to reproduce the issue don't use a custom network, something which I am doing in "production." That's an oversight on my part. Sorry! I've updated the issue's original text to reflect this.

My network configuration is defined in a Quadlet .network file containing nothing but an empty [Network] section. Here is the output of podman network inspect:

[
     {
          "name": "systemd-wireguard",
          "id": "41f1cabb2b1a194ec63a730798a6d972cba65b22699be76714cf6259558c207c",
          "driver": "bridge",
          "network_interface": "podman1",
          "created": "2024-08-27T17:36:04.729533184-04:00",
          "subnets": [
               {
                    "subnet": "10.89.0.0/24",
                    "gateway": "10.89.0.1"
               }
          ],
          "ipv6_enabled": false,
          "internal": false,
          "dns_enabled": true,
          "ipam_options": {
               "driver": "host-local"
          },
          "containers": {
               "f9bacc5a3db909b3362249fa39088241d3a842cb3d3868d04d410c6cf3fbe53d": {
                    "name": "systemd-wireguard",
                    "interfaces": {
                         "eth0": {
                              "subnets": [
                                   {
                                        "ipnet": "10.89.0.7/24",
                                        "gateway": "10.89.0.1"
                                   }
                              ],
                              "mac_address": "3a:4a:98:4d:6e:7f"
                         }
                    }
               }
          }
     }
]

pwige avatar Aug 28 '24 13:08 pwige

By the way:

I'm running qbittorrentofficial/qbittorrent-nox rootless and I'm observing 100% CPU load for the passt process despite setting the network concurrency settings of the program in question to lower values than default.

passt floats at about 35% CPU load for some time and then after a couple of hours hits the 100% mark.

Pausing the seeding torrents does not have an effect on the load of the process, so despite receiving no substantial traffic, the process continues to saturate its thread.

I tried to reproduce this case, but I couldn't, at least not yet. Perhaps it's because of the torrent content I chose (Linux distribution images): I download very fast (50-100 MB/s, and pasta reaches 40-50% load on a CPU thread) but seeding is not constantly using so much bandwidth. I see spikes in upload rates from time to time, but otherwise I'm constantly seeding at some dozen KB/s to a handful of peers.

It would be interesting if you could share the output of one second or less of strace -f -p <pasta's PID> when the CPU load is unexpectedly high (that is, when you don't expect any substantial traffic). You'll need to run strace as root, because pasta makes itself un-ptrace()able by regular users.
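
Something along these lines captures about a second of output (a sketch; assumes pgrep and timeout are available):

    sudo timeout 1 strace -f -p "$(pgrep -x pasta | head -n 1)" 2> pasta.strace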

sbrivio-rh avatar Aug 28 '24 18:08 sbrivio-rh

@pwige, thanks for the additional information. Between your comment and the one from @kalelkenobi it certainly seems like this is triggered by the custom network setup. I'm suspecting maybe some sort of forwarding loop between pasta and rootlesskit.

I'm not really familiar with quadlet and custom network configuration though, @Luap99 any chance you could interpret @pwige's configuration into specific commands I'd need to run to create something similar?

dgibson avatar Aug 29 '24 01:08 dgibson

@sbrivio-rh and I discussed this and attempted a number of things. Alas, we still don't have any clear reproducer.

I've been focusing on the wireguard container case. Using a podman custom network I now see rootlessport processes, but the bandwidth and cpu load didn't change notably. I do see ~40% load which is high-ish, but it didn't change appreciably between the 20240624 and current versions of pasta. I have a pretty fast CPU, but a pretty modest uplink, so it's possible I'm just not able to generate enough throughput to trigger the problem using an external server like speedtest.

One thing we realised is that we were thinking the wireguard case would be exercising pasta's UDP path, but that's not the case. With the custom network it's rootlesskit forwarding the UDP packets for the tunnel to the wireguard server; pasta is just handling the onward traffic from there to the speedtest server. I'm guessing speedtest uses TCP primarily, so all the instances here might be purely TCP related.

@kalelkenobi, I realise your case is similar to @pwige's, but I think it would help if you can describe your reproducer as precisely as you can. What are all the steps to set up your configuration, and what exactly is the difference you're seeing between the old and new pasta versions? That might help us to focus in on what's triggering the poor behaviour.

dgibson avatar Aug 29 '24 06:08 dgibson

I'm not really familiar with quadlet and custom network configuration though, @Luap99 any chance you could interpret @pwige's configuration into specific commands I'd need to run to create something similar?

Running with the custom network code path is as simple as podman run --network bridge ... as rootless

One thing we realised is that we were thinking the wireguard case would be exercising pasta's UDP path, but that's not the case. With the custom network it's rootlesskit forwarding the UDP packets for the tunnel to the wireguard server; pasta is just handling the onward traffic from there to the speedtest server. I'm guessing speedtest uses TCP primarily, so all the instances here might be purely TCP related.

Yes, exactly, this is what I meant above: all the port forwarding traffic goes through rootlessport, and that didn't change, I think.

The one thing that is missing here is how wireguard is configured. There really isn't a strict server-client model, so if the server config has a Peer with an Endpoint, it will use that to connect rather than the incoming udp connection from rootlessport. So I could see a case where we have some form of asymmetric routing: incoming via rootlessport, outgoing via pasta. Given udp is stateless that really shouldn't matter, but it may play a role here if that is the case. I think running tcpdump in the container may help to see what traffic is being generated, i.e.:

podman unshare nsenter -n$(podman inspect --format '{{.NetworkSettings.SandboxKey}}' <container name>) tcpdump -nn -i any

But yes, for the simple download from within the container it should only be the tcp path in pasta, though in the case of custom networks there is the extra NAT, bridge + veth setup as shown here: https://github.com/containers/podman/discussions/22943#discussioncomment-9795883. But I don't think it would cause issues.

Luap99 avatar Aug 29 '24 08:08 Luap99

By the way:

I'm running qbittorrentofficial/qbittorrent-nox rootless and I'm observing 100% CPU load for the passt process despite setting the network concurrency settings of the program in question to lower values than default. passt floats at about 35% CPU load for some time and then after a couple of hours hits the 100% mark. Pausing the seeding torrents does not have an effect on the load of the process, so despite receiving no substantial traffic, the process continues to saturate its thread.

I tried to reproduce this case, but I couldn't, at least not yet. Perhaps it's because of the torrent content I chose (Linux distribution images): I download very fast (50-100 MB/s, and pasta reaches 40-50% load on a CPU thread) but seeding is not constantly using so much bandwidth. I see spikes in upload rates from time to time, but otherwise I'm constantly seeding at some dozen KB/s to a handful of peers.

It would be interesting if you could share the output of one second or less of strace -f -p <pasta's PID> when the CPU load is unexpectedly high (that is, when you don't expect any substantial traffic). You'll need to run strace as root, because pasta makes itself un-ptrace()able by regular users.

I'm having the same-ish issue with linuxserver/qbittorrent running rootless. Here is the output of strace -f -p (for about 5 seconds). log.txt

Rivers47 avatar Aug 29 '24 08:08 Rivers47