Podman fails to autostart containers through Quadlet/systemd (works when launched manually); error with pasta

Issue Description

Hi, since the upgrade to Fedora Silverblue 40 / Podman 5, systemd fails to launch my containers at boot. If I start them manually with systemctl --user start container.service, they work as expected. Thank you!

Steps to reproduce the issue

Manage the containers with Quadlet, i.e. through unit files in ~/.config/containers/systemd
Reboot the server and observe that the containers fail to start

Describe the results you received

The containers do not start at boot and have to be started manually

Describe the results you expected

Containers should start at boot.

podman info output

host:
  arch: amd64
  buildahVersion: 1.35.1
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-4.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: '
  cpuUtilization:
    idlePercent: 99.37
    systemPercent: 0.21
    userPercent: 0.42
  cpus: 32
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: silverblue
    version: "40"
  eventLogger: journald
  freeLocks: 2047
  hostname: homeserver
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1020
      size: 1
    - container_id: 1
      host_id: 1703936
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1020
      size: 1
    - container_id: 1
      host_id: 1703936
      size: 65536
  kernel: 6.8.1-300.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 64334761984
  memTotal: 67334115328
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-3.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.4-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1020/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240320.g71dd405-1.fc40.x86_64
    version: |
      pasta 0^20240320.g71dd405-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1020/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 146028879872
  swapTotal: 146028879872
  uptime: 0h 14m 2.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/srv/media-server/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /srv/media-server/.local/share/containers/storage
  graphRootAllocated: 3999065440256
  graphRootUsed: 1034920087552
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 14
  runRoot: /run/user/1020/containers
  transientStore: false
  volumePath: /var/srv/media-server/.local/share/containers/storage/volumes
version:
  APIVersion: 5.0.0
  Built: 1710806400
  BuiltTime: Tue Mar 19 01:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.0
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

Fedora Silverblue 40 up-to-date

Additional information

Logs of a container:

mars 28 12:15:09 homeserver jellyfin[7039]: Error: pasta failed with exit code 1:
mars 28 12:15:09 homeserver jellyfin[7039]: External interface not usable

Mar 28 '24 11:03 Froggy232

You have to make sure your network is fully set up before the unit is started.

Mar 28 '24 12:03 Luap99

This feels like it could be related to the same question in https://github.com/containers/podman/pull/22057

Mar 29 '24 11:03 rhatdan

I have not been able to get a rootless user quadlet to wait for my network to be ready, even after adding:

[Unit]
wants=nss-online.target
after=nss-online.target

No issues on 4.9.3

Mar 29 '24 20:03 flyingfishflash

@flyingfishflash You cannot wait for system units from user units, see https://github.com/systemd/systemd/issues/3312

I wasn't aware that user units start before the network is fully set up, or that this causes such big trouble with pasta. Note that you do not need to downgrade: you can just change the default back to slirp4netns in containers.conf; see the last part of the pasta section in https://blog.podman.io/2024/03/podman-5-0-breaking-changes-in-detail/
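A minimal sketch of that containers.conf change as a per-user drop-in, assuming your Podman version reads drop-ins from ~/.config/containers/containers.conf.d (if it does not, the same two lines can go into ~/.config/containers/containers.conf). The podman info output above reports an empty slirp4netns executable, so the slirp4netns package likely has to be installed first. The function name and the file name 50-slirp4netns.conf are hypothetical.

```shell
#!/bin/sh
# Sketch: switch the rootless network default from pasta back to slirp4netns
# for the current user through a containers.conf drop-in file.
# Assumptions: the per-user containers.conf.d directory is honoured by your
# Podman version and slirp4netns is installed. Names are hypothetical.

use_slirp4netns_default() {
    conf_dir="${XDG_CONFIG_HOME:-$HOME/.config}/containers/containers.conf.d"
    mkdir -p "$conf_dir" || return 1
    # default_rootless_network_cmd belongs to the [network] table.
    cat > "$conf_dir/50-slirp4netns.conf" <<'EOF'
[network]
default_rootless_network_cmd = "slirp4netns"
EOF
    # Report where the drop-in was written.
    echo "$conf_dir/50-slirp4netns.conf"
}
```

The function prints the path it wrote; the next podman invocation (including a Quadlet unit start) should pick up the new default.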

You could also do something like this https://github.com/containers/podman/issues/22190#issuecomment-2027257771

Of course none of this is a proper solution but I am sure we will find something to address this in a better way soon.

Mar 29 '24 20:03 Luap99

@Luap99 - thank you for this tip re containers.conf!

Mar 29 '24 22:03 flyingfishflash

You could also do something like this #22190 (comment)

No. It's as much of a bad practice today as it was 50 years ago.

Apr 12 '24 14:04 gdonval

I ran into this issue today and finally learned that systemd user-level units apparently can't depend on system-level units (such as network-online.target).

I've managed a workaround that satisfies my desire to avoid arbitrary timeouts: a user-level network-online.service and network-online.target.

# ~/.config/systemd/user/network-online.service
[Unit]
Description=User-level proxy to system-level network-online.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'until systemctl --machine=%[email protected] is-active network-online.target; do sleep 1; done'

[Install]
WantedBy=default.target

# ~/.config/systemd/user/network-online.target
[Unit]
Description=User-level network-online.target
Requires=network-online.service
Wants=network-online.service
After=network-online.service

Then in your quadlet units:

[Unit]
After=network-online.target
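The loop in network-online.service never gives up if the system target is never reached. A bounded variant, as a sketch: wait_for_system_target is a hypothetical helper name and the timeout is an assumption to tune. Plain systemctl without --user queries the system manager, which is what lets a user unit observe the system-level target.

```shell
#!/bin/sh
# Sketch: poll a system-level unit from a user service, but give up after
# TIMEOUT_SECONDS instead of looping forever. Hypothetical helper name.

# Usage: wait_for_system_target TARGET TIMEOUT_SECONDS
wait_for_system_target() {
    target=$1
    remaining=$2
    # No --user flag: this asks the system manager, even from a user unit.
    until systemctl is-active --quiet "$target"; do
        if [ "$remaining" -le 0 ]; then
            echo "timed out waiting for $target" >&2
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

Saved as a script and called from ExecStart= of a Type=oneshot user service, it lets dependent units use Requires= plus After= on that service, so a timeout fails them instead of starting them without a network.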

Apr 25 '24 20:04 Klowner

It seems to just work once you can ping an external IP (including the gateway IP).
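A sketch of that check that avoids hard-coding an address: read the IPv4 default gateway from the routing table and ping it once. It assumes iproute2's ip and a ping usable by your user, and that the gateway answers ICMP echo; default_gateway and gateway_reachable are hypothetical names.

```shell
#!/bin/sh
# Sketch: find the IPv4 default gateway and check that it answers one ping.
# Assumes iproute2 "ip" and a usable "ping"; function names are hypothetical.

# Print the first IPv4 default gateway, e.g. from
# "default via 192.168.1.1 dev eth0 proto dhcp metric 100".
default_gateway() {
    ip -4 route show default |
        awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Succeed only if a default gateway exists and replies within 1 second.
gateway_reachable() {
    gw=$(default_gateway)
    [ -n "$gw" ] || return 1
    ping -c 1 -W 1 "$gw" >/dev/null 2>&1
}
```

Wrapped in an until loop, gateway_reachable could replace the hard-coded ping targets used in some ExecStartPre= workarounds.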

Apr 27 '24 03:04 soiamsoNG

I'll share my workaround, but it might be a good idea to have a podman network --health command that verifies connectivity per driver, network, and so on.

[Unit]
Description=Wait for network to be online via NetworkManager or Systemd-Networkd

[Service]
# `nm-online -s` waits until the point when NetworkManager logs
# "startup complete". That is when startup actions are settled and
# devices and profiles reached a conclusive activated or deactivated
# state. It depends on which profiles are configured to autoconnect and
# also depends on profile settings like ipv4.may-fail/ipv6.may-fail,
# which affect when a profile is considered fully activated.
# Check NetworkManager logs to find out why wait-online takes a certain
# time.

Type=oneshot
# At least one of these should work, depending on whether NetworkManager or systemd-networkd is in use
ExecStart=/bin/bash -c ' \
    if command -v nm-online &>/dev/null; then \
        nm-online -s -q; \
    elif command -v /usr/lib/systemd/systemd-networkd-wait-online &>/dev/null; then \
        /usr/lib/systemd/systemd-networkd-wait-online; \
    else \
        echo "Error: Neither nm-online nor systemd-networkd-wait-online found."; \
        exit 1; \
    fi'
ExecStartPost=ip -br addr
RemainAfterExit=yes

# Set $NM_ONLINE_TIMEOUT variable for timeout in seconds.
# Edit with `systemctl --user edit <THIS SERVICE NAME>`.
#
# Note, this timeout should commonly not be reached. If your boot
# gets delayed too long, then the solution is usually not to decrease
# the timeout, but to fix your setup so that the connected state
# gets reached earlier.
Environment=NM_ONLINE_TIMEOUT=60

[Install]
WantedBy=default.target

May 06 '24 21:05 djarbz

Another workaround:

We can copy network-online.target from the system instance to the user instance, with a small modification, like this:

$ cat /etc/systemd/user/network-online.target
[Unit]
Description=Network online for systemd --user
Documentation=man:systemd.special(7)
Documentation=https://systemd.io/NETWORK_ONLINE
#After=network.target

$ cat /etc/systemd/user/systemd-networkd-wait-online.service
[Unit]
Description=Wait network online for systemd --user
Documentation=man:systemd-networkd-wait-online.service(8)
Before=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online
RemainAfterExit=yes

[Install]
WantedBy=network-online.target

Alternatively, you can put these files in ~/.config/systemd/user to apply them to a single user.

Then enable the service as a user:

$ systemctl --user enable systemd-networkd-wait-online.service

Finally, we can make the Podman container wait for the network to be online, like this:

$ cat ~/.config/containers/systemd/my-app.container
[Unit]
Wants=network-online.target
After=network-online.target

Reference: https://unix.stackexchange.com/questions/216919/how-can-i-make-my-user-services-wait-till-the-network-is-online

Jun 23 '24 23:06 secext2022

Hi,

Any idea for a workaround when using NetworkManager?

I tried to adapt @secext2022's workaround, but the user service still "thinks" the network is online about 7 seconds too early. I also tried removing the -s parameter from nm-online, but the behavior is the same.

dog /etc/systemd/user/network-online.target:

#  SPDX-License-Identifier: LGPL-2.1-or-later
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Network is Online
Documentation=man:systemd.special(7)
Documentation=https://systemd.io/NETWORK_ONLINE
# After=network.target

/etc/systemd/user/NetworkManager-wait-online.service:

[Unit]
Description=Network Manager Wait Online for Users
Documentation=man:NetworkManager-wait-online.service(8)
Requires=NetworkManager.service
After=NetworkManager.service
Before=network-online.target

[Service]
# `nm-online -s` waits until the point when NetworkManager logs
# "startup complete". That is when startup actions are settled and
# devices and profiles reached a conclusive activated or deactivated
# state. It depends on which profiles are configured to autoconnect and
# also depends on profile settings like ipv4.may-fail/ipv6.may-fail,
# which affect when a profile is considered fully activated.
# Check NetworkManager logs to find out why wait-online takes a certain
# time.

Type=oneshot
ExecStart=/usr/bin/nm-online -q
RemainAfterExit=yes

# Set $NM_ONLINE_TIMEOUT variable for timeout in seconds.
# Edit with `systemctl edit NetworkManager-wait-online`.
#
# Note, this timeout should commonly not be reached. If your boot
# gets delayed too long, then the solution is usually not to decrease
# the timeout, but to fix your setup so that the connected state
# gets reached earlier.
Environment=NM_ONLINE_TIMEOUT=60

[Install]
WantedBy=network-online.target

journalctl -b0 | grep Online:

Jul 17 12:43:09 archnuke systemd[1]: Starting Network Manager Wait Online...
Jul 17 12:43:09 archnuke systemd[706]: Reached target Network is Online.
Jul 17 12:43:16 archnuke systemd[1]: Finished Network Manager Wait Online.
Jul 17 12:43:16 archnuke systemd[1]: Reached target Network is Online.

The above is the system journal; the 12:43:09 "Reached target" line from systemd[706] is the user instance. As the user running the Podman container, LANG=C journalctl --user -b0 | grep Online shows:

Jul 17 12:43:09 archnuke systemd[706]: Reached target Network is Online.

I'm not sure why NetworkManager-wait-online does not show up in the user log; it is enabled for the user:

systemctl --user status NetworkManager-wait-online.service 
○ NetworkManager-wait-online.service - Network Manager Wait Online for Users
     Loaded: loaded (/etc/xdg/systemd/user/NetworkManager-wait-online.service; enabled; preset: enabled)
     Active: inactive (dead)
       Docs: man:NetworkManager-wait-online.service(8)

For now, I'm thinking of adding another dirty workaround to the Quadlet: ExecStartPre=/bin/sh -c 'until ping -c1 google.com; do sleep 1; done;'

Jul 17 '24 10:07 WildPenquin

I haven't used /etc/systemd/user, but my unit works (at least I haven't noticed an issue) when placed in ~/.config/systemd/user.

Jul 17 '24 10:07 djarbz

@WildPenquin

Please check this in the container service:

[Unit]
Wants=network-online.target
After=network-online.target

Jul 18 '24 00:07 secext2022

$ systemctl --user status my-app.service
● my-app.service - example deno/fresh app
     Loaded: loaded (/var/home/fc-test/.config/containers/systemd/my-app.container; generated)
    Drop-In: /usr/lib/systemd/user/service.d
             └─10-timeout-abort.conf
     Active: active (running) since Wed 2024-07-17 04:21:49 UTC; 20h ago
   Main PID: 2026 (conmon)

$ systemctl --user list-dependencies my-app
my-app.service
● ├─app.slice
● ├─basic.target
● │ ├─systemd-tmpfiles-setup.service
● │ ├─paths.target
● │ ├─sockets.target
● │ │ └─dbus.socket
● │ └─timers.target
● │   └─systemd-tmpfiles-clean.timer
● └─network-online.target
●   └─systemd-networkd-wait-online.service

Jul 18 '24 00:07 secext2022

Hi @secext2022 ,

The Unit section is defined correctly.

As per my log, the problem is that the NetworkManager-wait-online user service finishes much too soon, much sooner than the system-level one. I believe (meaning I'm not sure) that nm-online does not work correctly when run as a user (perhaps it is not designed to be run as a user?).

As yet another workaround, I've added ExecStartPre=/bin/sh -c 'until ping -c1 192.168.66.6; do sleep 1; done;' under [Service]. On my TODO list: test whether this works correctly if I switch my interface to be managed by systemd-networkd and use the systemd-networkd-wait-online service instead.

$ systemctl --user status pande-pmc.service

● pande-pmc.service - PandESportS MC-serveri
     Loaded: loaded (/home/minecraft/.config/containers/systemd/pande-pmc.container; generated)
     Active: active (running) since Fri 2024-07-19 16:04:56 EEST; 4min 55s ago
 Invocation: 9858022ff77a4dd38327d8c513324e7d
    Process: 829 ExecStartPre=/bin/sh -c until ping -c1 192.168.66.6; do sleep 1; done; (code=exited, status=0/SUCCESS)
   Main PID: 906 (conmon)
      Tasks: 82 (limit: 28525)
     Memory: 6.2G (peak: 6.2G)
        CPU: 1min 6.141s

$ systemctl --user list-dependencies pande-pmc.service

pande-pmc.service
● ├─app.slice
● ├─basic.target
● │ ├─paths.target
● │ ├─sockets.target
● │ │ ├─dbus.socket
● │ │ ├─dirmngr.socket
● │ │ ├─drkonqi-coredump-launcher.socket
● │ │ ├─gpg-agent-browser.socket
● │ │ ├─gpg-agent-extra.socket
● │ │ ├─gpg-agent-ssh.socket
● │ │ ├─gpg-agent.socket
● │ │ ├─keyboxd.socket
● │ │ ├─p11-kit-server.socket
● │ │ ├─pipewire-pulse.socket
● │ │ └─pipewire.socket
● │ └─timers.target
○ │   ├─drkonqi-coredump-cleanup.timer
○ │   └─drkonqi-sentry-postman.timer
● └─network-online.target
○   └─NetworkManager-wait-online.service

config/containers/systemd/pande-pmc.container:

[Unit]
Description=PandESportS MC-serveri

After=network-online.target
Wants=network-online.target


[Container]
AutoUpdate=registry
ContainerName=PandEPMC
Image=docker.io/gameservermanagers/gameserver:pmc
Volume=pandepmc:/data
LogDriver=k8s-file
PublishPort=25560:25560/tcp
PublishPort=25560:25560/udp
PodmanArgs=--log-opt=path=/home/minecraft/PandEPMClog.k8s
Timezone=local

[Service]
ExecStartPre=/bin/sh -c 'until ping -c1 192.168.66.6; do sleep 1; done;'
# Restart=always
Restart=no

[Install]
WantedBy=multi-user.target default.target

Jul 19 '24 13:07 WildPenquin

After reading this thread and the comments in https://github.com/systemd/systemd/issues/3312, I think that issue has much cleaner workarounds than many of the ones here. The problem with the workarounds here is that they are often long and convoluted for a relatively simple issue, and they may (or will) break if the system configuration changes, because they are not agnostic to the configuration. The systemd issue has much cleaner and simpler workarounds:

Make the whole user@UID service depend on network-online (https://github.com/systemd/systemd/issues/3312#issuecomment-2039973852) - but read the whole comment for caveats! This will work if you have a dedicated user for running containers which are useless without a network (so the caveats don't matter). Rename the [email protected] to include the UID to not enable this for all users. 3 lines, changing user@ service.
Make one user service which checks the system-level network-online.target (https://github.com/systemd/systemd/issues/3312#issuecomment-2185399471), then make the quadlets depend on this service. This is one simple service file which should work as long as the system network-online.target is configured properly. You could replace the systemctl is-active with a ping to your GW or, say, Google, depending on what your services actually need to work around badly written software ("online" does not necessarily mean a connection to the Internet, nor, I presume, even to your default GW). But there's no need to "copy" *-wait-online to the user services, which is prone to break (and does not seem to work for NM at all).

I haven't tested those, but they should work judging from the thumbs =).

I'm also starting to think maybe we should not be discussing workarounds here that much, since it adds noise to actually solving the issue (which is: podman user containers should not fail at boot if networking is up). (As a general remark, no service should fail on whatever network error; it should handle the situation instead, as network connections are unreliable. All these workarounds should be unnecessary!)

I'm sorry for adding noise here myself, too =).

EDIT: My chosen workaround for the issue (cleanest in my opinion, less prone to break; I chose to name it check-network-online.service but it could be whatever you want it to be):

/etc/systemd/user/check-network-online.service:

[Unit]
Description=Check for system level network-online.target (for users)

[Service]
Type=oneshot
ExecStart=bash -c 'until systemctl is-active network-online.target; do sleep 1; done'
RemainAfterExit=yes

[Install]
WantedBy=default.target

Enable this service for the user. In badly behaving user services (such as podman quadlets), add:

After=check-network-online.service

Of course, YMMV!
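The same unit can be installed for a single user instead of system-wide, as a sketch; install_check_unit is a hypothetical helper, and the unit body is the one above with /bin/sh instead of bash and --quiet added to systemctl is-active.

```shell
#!/bin/sh
# Sketch: install check-network-online.service for the current user only
# (~/.config/systemd/user) and enable it. Hypothetical helper name.

install_check_unit() {
    unit_dir="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user"
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/check-network-online.service" <<'EOF'
[Unit]
Description=Check for system level network-online.target (for users)

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'until systemctl is-active --quiet network-online.target; do sleep 1; done'
RemainAfterExit=yes

[Install]
WantedBy=default.target
EOF
    # Let the user manager see the new file, then enable it.
    systemctl --user daemon-reload &&
        systemctl --user enable check-network-online.service
}
```

Listing the service in Wants= as well as After= in a quadlet's [Unit] section pulls it in even when it is not enabled.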

Jul 19 '24 14:07 WildPenquin

I'm also starting to think maybe we should not be discussing workarounds here that much since it adds noise to actually solving the issue

I personally don't find it distracting.

(which is: podman user containers should not fail at boot if networking is up). (As a general remark, no services should fail for whatever network error, but instead handle the situation, as network connections are unreliable. All these workaround should be unnecessary!).

The thing is, pasta(1) picks host addresses and routes by default. This is by design as it allows you to avoid (implicit) NAT altogether. If there's nothing there, it doesn't know what to pick, so it exits.
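A rough shell check for the condition described here: does the host have a default route yet? It is an approximation assumed for illustration; it does not reproduce pasta's own interface selection, and has_default_route is a hypothetical name.

```shell
#!/bin/sh
# Sketch: succeed if the host has any IPv4 or IPv6 default route.
# pasta copies host addresses and routes by default; with nothing there it
# exits ("External interface not usable"). This only approximates that.

has_default_route() {
    [ -n "$(ip -4 route show default 2>/dev/null)" ] ||
        [ -n "$(ip -6 route show default 2>/dev/null)" ]
}
```

Unlike the ping-based loops, this needs no reachable host outside the machine.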

We're now considering implementing an optional netlink monitoring function that would dynamically create and delete routes and addresses as they come and go on the host; see also https://github.com/containers/podman/issues/22959#issuecomment-2228900989. That should be robust enough.

Jul 25 '24 21:07 sbrivio-rh

@Luap99 @rhatdan @ygalblum shall we update the quadlet docs to point that out?

Sitting in a meeting where this issue was brought up.

Sep 20 '24 13:09 vrothberg

If the doc said "Quadlets are currently broken. Please see that bug report XXX we have with systemd.", at the top in red and bold, I guess the situation would be improved tremendously. Acknowledging current limits and bugs is a big part of establishing trust with users.

As it is, users stumble across this again and again. I can't speak for the industry in general, but here, for instance, no one wants to hear about podman anymore.

Sep 20 '24 14:09 gdonval

Creating a unit for the user that runs until systemctl is-active network-online.target; do sleep 1; done does seem like a fairly simple and robust option that will be easy to see using normal systemctl commands and won't surprise anyone.

Could quadlet create such a unit automatically? That would be a big improvement to usability. It would also enable quadlet to adjust the implementation over time if systemd makes things easier. For example, maybe systemd will implement systemctl is-active --wait, which would be nice. Or maybe systemd will add a more direct way to solve this problem.

Options:

Automatic and Opt-Out: most containers that run as a service need a network, right? If quadlet generated a unit like this by default for every user container unit and added the After= line to the generated container unit, that would solve this problem for everyone. Perhaps some edge cases would need a way to opt-out?

Opt-In: If the UX was better as opt-in for some reason, I'd suggest a new setting in the [Unit] section such as AfterNetwork=true or similar.

I think it would not be a good idea to automatically look for the user to write After=network-online.target and translate that to something compatible with user units. It's a bad UX that systemd essentially ignores that rather than complain loudly to the user that they've declared something invalid or are depending on a unit that doesn't exist. Quadlet should not "magically" fix an incorrect unit.

BUT I think it is worth considering that quadlet could help the user by noticing that they wrote After=network-online.target in a user unit and failing with a helpful error message that shows them how to do it correctly. That would be incredibly helpful compared to the error message people see from pasta today.

Sep 20 '24 14:09 mhrivnak

Automatic and Opt-Out: most containers that run as a service need a network, right? If quadlet generated a unit like this by default for every user container unit and added the After= line to the generated container unit, that would solve this problem for everyone. Perhaps some edge cases would need a way to opt-out?

I like that proposal. Thanks for sharing, @mhrivnak !

Sep 20 '24 14:09 vrothberg

Automatic and Opt-Out: most containers that run as a service need a network, right? If quadlet generated a unit like this by default for every user container unit and added the After= line to the generated container unit, that would solve this problem for everyone. Perhaps some edge cases would need a way to opt-out?

Quadlet already automatically adds After=network-online.target today.

The opt out is solved by using After= in the file which causes systemd to ignore all prior After= lines. You find that syntax described in the systemd docs.

Now, of course, network-online.target doesn't work rootless with current systemd versions, so this is a no-op; thus I see no problem changing it to our own functional network-online unit by default when running as a user. Letting quadlet generate one doesn't seem too useful to me. We can ship the static unit file in the rpm, as I don't think anything dynamic is needed for that. All we need to do in quadlet is change the name in the user case.

BUT I think it is worth considering that quadlet could help the user by noticing that they wrote After=network-online.target in a user unit and failing with a helpful error message that shows them how to do it correctly. That would be incredibly helpful compared to the error message people see from pasta today.

This is not really true or helpful either. I have never experienced this on my systems, I guess because my network setup is much faster. So if we now decide to error out, we just break users who do not hit this race because their network config was fast enough.

But yes in general this problem should be documented.

Sep 20 '24 15:09 Luap99

Letting quadlet generate one doesn't seem to useful to me. We can ship the static unit file in the rpm as I don't think there is anything dynamic needed for that. All we need to to in quadlet is change the name in the user case

I'm not sure about that. Maybe I'm wrong here, but don't you need a separate unit per user? Or is there a place to put units that all user units can point to?

Sep 20 '24 15:09 ygalblum

Automatic and Opt-Out: most containers that run as a service need a network, right? If quadlet generated a unit like this by default for every user container unit and added the After= line to the generated container unit, that would solve this problem for everyone. Perhaps some edge cases would need a way to opt-out?

Quadlet already automatically adds After=network-online.target today.

The opt out is solved by using After= in the file which causes systemd to ignore all prior After= lines. You find that syntax described in the systemd docs.

Now of course network-online.target doesn't work rootless with current systemd versions so this is a NOP thus I see no problem changing that to our own functional network-online unit by default when running as user. Letting quadlet generate one doesn't seem to useful to me. We can ship the static unit file in the rpm as I don't think there is anything dynamic needed for that. All we need to to in quadlet is change the name in the user case.

It needs to be installed for the specific user account, right? When would that happen if not at the time units are being generated and placed into the correct location for each user?

Maybe "generate" is the wrong word here, and it would just be a copy operation or even a symlink to some known location.

BUT I think it is worth considering that quadlet could help the user by noticing that they wrote After=network-online.target in a user unit and failing with a helpful error message that shows them how to do it correctly. That would be incredibly helpful compared to the error message people see from pasta today.

This is not really true and helpful either. I have never experimented this is on my systems because network setup is much faster I guess. So if we now decide to error out we just break users that do not hit this race because their network config was fast enough.

I think that most system admins, not to mention software engineers, would prefer to remove a race condition rather than depend on the presumption that they are likely to win the race most of the time.

That said, I see your point that if someone is currently winning the race every time, blissfully unaware that they're even competing in a race, it would not be a good experience to make their setup start failing. How about a loud log message at least so that if they ever do lose the race, or happen to look at the logs, they'll have an easier time understanding what happened, rather than have to google a weird error message from pasta?

But yes in general this problem should be documented.

Sep 20 '24 15:09 mhrivnak

How about a loud log message at least so that if they ever do lose the race, or happen to look at the logs, they'll have an easier time understanding what happened, rather than have to google a weird error message from pasta?

Or maybe quadlet could add a comment in the generated unit.

[Unit]
Description=some network service
# The below statement has no effect since this is a user unit. But quadlet has preserved it for reference.
# Please see https://github.com/containers/podman/issues/22197 for details and solutions
# for how to properly depend on network startup
After=network-online.target

Sep 20 '24 15:09 mhrivnak

Letting quadlet generate one doesn't seem to useful to me. We can ship the static unit file in the rpm as I don't think there is anything dynamic needed for that. All we need to to in quadlet is change the name in the user case

I'm not sure about that. Maybe I'm wrong here, but, don't you need a separate unit per user? Or is there a place to put units that all user units can point to?

/usr/lib/systemd/user/ just like we already ship podman.service podman-auto-update.{timer,service},etc... This is a solved problem.

It needs to be installed for the specific user account, right? When would that happen if not at the time units are being generated and placed into the correct location for each user?

No, see above. The unit doesn't have to be enabled: as long as quadlet adds Wants=our-new-unit, it will be triggered when your main unit is started, and it does not need to run when there are no quadlets at all.
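To make the proposal concrete, here is an illustration of the rewrite expressed as a text filter over a generated unit: for rootless units, ordering on network-online.target becomes a Wants=/After= pair on a unit shipped in /usr/lib/systemd/user. This is not Quadlet's implementation, and the unit name user-wait-network-online.service is hypothetical.

```shell
#!/bin/sh
# Illustration only, not Quadlet's code: swap the system target for a
# hypothetical user-level unit when generating rootless units.

USER_NET_UNIT=user-wait-network-online.service   # hypothetical name

# Usage: rewrite_network_dep yes|no < generated.service > rewritten.service
rewrite_network_dep() {
    if [ "$1" != yes ]; then
        # Rootful units keep the system-level target unchanged.
        cat
        return
    fi
    # Replace the ordering line with Wants= + After= on the user unit.
    awk -v u="$USER_NET_UNIT" '
        $0 == "After=network-online.target" { print "Wants=" u; print "After=" u; next }
        { print }'
}
```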

I think that most system admins, not to mention software engineers, would prefer to remove a race condition rather than depend on the presumption that they are likely to win the race most of the time.

yes

That said, I see your point that if someone is currently winning the race every time, blissfully unaware that they're even competing in a race, it would not be a good experience to make their setup start failing. How about a loud log message at least so that if they ever do lose the race, or happen to look at the logs, they'll have an easier time understanding what happened, rather than have to google a weird error message from pasta?

Yes, this gets tricky. We could certainly have a log message, but it is also kind of log spam: quadlet, as a generator, runs on every daemon reload, so even once a user has acknowledged the issue and worked around it, we would still warn all the time, which gets annoying quickly.

If the pasta error message doesn't make sense, we should aim to fix that message. Either pasta itself should print something better, or podman can catch it and print something better instead... Throwing warnings when we do not know whether it will even fail is just not nice. But once we know pasta failed, we can print whatever you think is reasonable.

Sep 20 '24 15:09 Luap99

@Luap99 great, thanks for the clarification.

So, it seems that in terms of functionality the way to go is to add this service to the installation and add a dependency on it when generating rootless units. The dependency can be removed using a certain key. This will be added for all Quadlet types.

I think the only question left open is regarding the logging.

Right?

Sep 20 '24 15:09 ygalblum

Shipping a unit directly with Podman sounds good to me, too 👍

Sep 23 '24 07:09 vrothberg

As it is, users stumble across this again and again. I can't speak for the general industry but here, no one wants to hear about podman again for instance.

I hope you'll reconsider once this issue is fixed. Feel free to reach out if you want to chat.

Sep 23 '24 07:09 vrothberg

@Luap99 @ygalblum can you confirm the proposed solution of shipping a systemd unit and adding that as a dependency for rootless Quadlets? I'd like to get this fixed soon to make sure it doesn't hit image mode.

Cc: @cgwalters @mrguitar @rhatdan

Sep 26 '24 09:09 vrothberg
