podman
Podman fails to autostart containers through Quadlet/systemd; works when launched manually; error from pasta
Issue Description
Hi,
Since the upgrade to Fedora Silverblue 40 / Podman 5, systemd fails to launch containers at boot.
If I launch them manually with systemctl --user start container.service, it works as expected.
Thank you!
Steps to reproduce the issue
- Manage containers through Quadlet files in ~/.config/containers/systemd
- Restart the server and see that the containers fail to launch
Describe the results you received
Containers don't launch at boot; they need to be started manually.
Describe the results you expected
Containers should start at boot.
podman info output
host:
  arch: amd64
  buildahVersion: 1.35.1
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-4.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: '
  cpuUtilization:
    idlePercent: 99.37
    systemPercent: 0.21
    userPercent: 0.42
  cpus: 32
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: silverblue
    version: "40"
  eventLogger: journald
  freeLocks: 2047
  hostname: homeserver
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1020
      size: 1
    - container_id: 1
      host_id: 1703936
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1020
      size: 1
    - container_id: 1
      host_id: 1703936
      size: 65536
  kernel: 6.8.1-300.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 64334761984
  memTotal: 67334115328
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-3.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.4-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1020/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240320.g71dd405-1.fc40.x86_64
    version: |
      pasta 0^20240320.g71dd405-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
      <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1020/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 146028879872
  swapTotal: 146028879872
  uptime: 0h 14m 2.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/srv/media-server/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /srv/media-server/.local/share/containers/storage
  graphRootAllocated: 3999065440256
  graphRootUsed: 1034920087552
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 14
  runRoot: /run/user/1020/containers
  transientStore: false
  volumePath: /var/srv/media-server/.local/share/containers/storage/volumes
version:
  APIVersion: 5.0.0
  Built: 1710806400
  BuiltTime: Tue Mar 19 01:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.0
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.0
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
No
Additional environment details
Fedora Silverblue 40 up-to-date
Additional information
Logs of a container:
mars 28 12:15:09 homeserver jellyfin[7039]: Error: pasta failed with exit code 1:
mars 28 12:15:09 homeserver jellyfin[7039]: External interface not usable
You have to make sure your network is fully set up before the unit is started.
This feels like it could be related to the same question in https://github.com/containers/podman/pull/22057
I have not been able to get a rootless user quadlet to wait for my network to be ready, even after adding:
[Unit]
wants=nss-online.target
after=nss-online.target
No issues on 4.9.3
@flyingfishflash You cannot wait for system units from user units, see https://github.com/systemd/systemd/issues/3312
I wasn't aware that the user units start before the network is fully set up and that it causes such big trouble with pasta. Note you do not need to downgrade, you can just change the default back to slirp4netns in containers.conf, see the last part in the pasta section on https://blog.podman.io/2024/03/podman-5-0-breaking-changes-in-detail/
You could also do something like this https://github.com/containers/podman/issues/22190#issuecomment-2027257771
Of course none of this is a proper solution but I am sure we will find something to address this in a better way soon.
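For reference, the containers.conf change mentioned above can be sketched as a per-user drop-in. This is only a sketch under assumptions: it uses the default_rootless_network_cmd key under [network] described in the linked blog post, it assumes rootless Podman reads drop-ins from ~/.config/containers/containers.conf.d/, and it assumes slirp4netns is actually installed (the podman info output above shows an empty slirp4netns executable on the reporter's host). The function name and the file name are made up for illustration.

```shell
#!/bin/sh
# write_slirp_dropin DIR: write a containers.conf drop-in into DIR that switches
# the default rootless network mode back to slirp4netns.
# Hypothetical helper; "50-rootless-network.conf" is an arbitrary file name.
write_slirp_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/50-rootless-network.conf" <<'EOF'
[network]
default_rootless_network_cmd = "slirp4netns"
EOF
}
```

It could be called as write_slirp_dropin "${XDG_CONFIG_HOME:-$HOME/.config}/containers/containers.conf.d", followed by a restart of the affected container services. Using a drop-in rather than editing containers.conf directly avoids overwriting any existing settings.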
@Luap99 - thank you for this tip re containers.conf!
You could also do something like this #22190 (comment)
No. It's as much of a bad practice today as it was 50 years ago.
I ran into this issue today and finally learned that systemd user-level units apparently can't depend on system-level units (such as network-online.target).
I've managed a workaround that satisfies my desire to avoid arbitrary timeouts by creating a user-level network-online.service and network-online.target:
# ~/.config/systemd/user/network-online.service
[Unit]
Description=User-level proxy to system-level network-online.target
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'until systemctl --machine=%[email protected] is-active network-online.target; do sleep 1; done'
[Install]
WantedBy=default.target
# ~/.config/systemd/user/network-online.target
[Unit]
Description=User-level network-online.target
Requires=network-online.service
Wants=network-online.service
After=network-online.service
Then in your quadlet units:
[Unit]
After=network-online.target
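The ExecStart loop above polls forever if the system-level network-online.target never becomes active. A minimal sketch of the same polling idea with a deadline follows; wait_for is a hypothetical helper name, not something systemd or Podman provides, and the one-second interval simply mirrors the loop in the unit.

```shell
#!/bin/sh
# wait_for TIMEOUT CMD [ARGS...]: run CMD once per second until it succeeds,
# giving up (exit status 1) once TIMEOUT seconds of retries have elapsed.
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        # Stop retrying when the deadline has been reached.
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

For example, wait_for 60 systemctl is-active --quiet network-online.target would give up after roughly a minute, so the oneshot unit fails visibly instead of hanging.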
It seems to just work once you can ping an external IP (including the gateway IP).
I'll share my workaround, but it might be a good idea to have a podman network --health command to verify this by driver, network, and such.
[Unit]
Description=Wait for network to be online via NetworkManager or Systemd-Networkd
[Service]
# `nm-online -s` waits until the point when NetworkManager logs
# "startup complete". That is when startup actions are settled and
# devices and profiles reached a conclusive activated or deactivated
# state. It depends on which profiles are configured to autoconnect and
# also depends on profile settings like ipv4.may-fail/ipv6.may-fail,
# which affect when a profile is considered fully activated.
# Check NetworkManager logs to find out why wait-online takes a certain
# time.
Type=oneshot
# At least one of these should work depending if using NetworkManager or Systemd-Networkd
ExecStart=/bin/bash -c ' \
if command -v nm-online &>/dev/null; then \
nm-online -s -q; \
elif command -v /usr/lib/systemd/systemd-networkd-wait-online &>/dev/null; then \
/usr/lib/systemd/systemd-networkd-wait-online; \
else \
echo "Error: Neither nm-online nor systemd-networkd-wait-online found."; \
exit 1; \
fi'
ExecStartPost=ip -br addr
RemainAfterExit=yes
# Set $NM_ONLINE_TIMEOUT variable for timeout in seconds.
# Edit with `systemctl edit <THIS SERVICE NAME>`.
#
# Note, this timeout should commonly not be reached. If your boot
# gets delayed too long, then the solution is usually not to decrease
# the timeout, but to fix your setup so that the connected state
# gets reached earlier.
Environment=NM_ONLINE_TIMEOUT=60
[Install]
WantedBy=default.target
Another workaround:
We can copy network-online.target from the system units to the user units, with a small modification, like this:
$ cat /etc/systemd/user/network-online.target
[Unit]
Description=Network online for systemd --user
Documentation=man:systemd.special(7)
Documentation=https://systemd.io/NETWORK_ONLINE
#After=network.target
$ cat /etc/systemd/user/systemd-networkd-wait-online.service
[Unit]
Description=Wait network online for systemd --user
Documentation=man:systemd-networkd-wait-online.service(8)
Before=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online
RemainAfterExit=yes
[Install]
WantedBy=network-online.target
Alternatively, you can put these files in ~/.config/systemd/user to apply them to a single user only.
Then enable the service as a user:
$ systemctl --user enable systemd-networkd-wait-online.service
Finally, we can make Podman wait for the network to be online, like this:
$ cat ~/.config/containers/systemd/my-app.container
[Unit]
Wants=network-online.target
After=network-online.target
reference link: https://unix.stackexchange.com/questions/216919/how-can-i-make-my-user-services-wait-till-the-network-is-online
Hi,
Any idea for a workaround when using NetworkManager?
I tried to adapt @secext2022's workaround, but the user service still "thinks" the network is online approx. 7 seconds too early. I tried changing the parameters for nm-online by removing the -s, but the behavior is still the same.
dog /etc/systemd/user/network-online.target:
# SPDX-License-Identifier: LGPL-2.1-or-later
#
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
[Unit]
Description=Network is Online
Documentation=man:systemd.special(7)
Documentation=https://systemd.io/NETWORK_ONLINE
# After=network.target
/etc/systemd/user/NetworkManager-wait-online.service:
[Unit]
Description=Network Manager Wait Online for Users
Documentation=man:NetworkManager-wait-online.service(8)
Requires=NetworkManager.service
After=NetworkManager.service
Before=network-online.target
[Service]
# `nm-online -s` waits until the point when NetworkManager logs
# "startup complete". That is when startup actions are settled and
# devices and profiles reached a conclusive activated or deactivated
# state. It depends on which profiles are configured to autoconnect and
# also depends on profile settings like ipv4.may-fail/ipv6.may-fail,
# which affect when a profile is considered fully activated.
# Check NetworkManager logs to find out why wait-online takes a certain
# time.
Type=oneshot
ExecStart=/usr/bin/nm-online -q
RemainAfterExit=yes
# Set $NM_ONLINE_TIMEOUT variable for timeout in seconds.
# Edit with `systemctl edit NetworkManager-wait-online`.
#
# Note, this timeout should commonly not be reached. If your boot
# gets delayed too long, then the solution is usually not to decrease
# the timeout, but to fix your setup so that the connected state
# gets reached earlier.
Environment=NM_ONLINE_TIMEOUT=60
[Install]
WantedBy=network-online.target
journalctl -b0 | grep Online:
Jul 17 12:43:09 archnuke systemd[1]: Starting Network Manager Wait Online...
Jul 17 12:43:09 archnuke systemd[706]: Reached target Network is Online.
Jul 17 12:43:16 archnuke systemd[1]: Finished Network Manager Wait Online.
Jul 17 12:43:16 archnuke systemd[1]: Reached target Network is Online.
The above is the system log; the 12:43:09 "Reached target" entry from systemd[706] is the user service. As the user running the podman container, LANG=C journalctl --user -b0 | grep Online:
Jul 17 12:43:09 archnuke systemd[706]: Reached target Network is Online.
Not sure why the NetworkManager-wait-online is not in the user log, it is enabled for the user:
systemctl --user status NetworkManager-wait-online.service
○ NetworkManager-wait-online.service - Network Manager Wait Online for Users
Loaded: loaded (/etc/xdg/systemd/user/NetworkManager-wait-online.service; enabled; preset: enabled)
Active: inactive (dead)
Docs: man:NetworkManager-wait-online.service(8)
As another workaround, I'm thinking of adding another dirty workaround to the Quadlet for now:
ExecStartPre=/bin/sh -c 'until ping -c1 google.com; do sleep 1; done;'
I haven't used /etc/systemd/user, but my unit works (at least I haven't noticed an issue) when placed in ~/.config/systemd/user.
@WildPenquin
Please check this in the container service:
[Unit]
Wants=network-online.target
After=network-online.target
$ systemctl --user status my-app.service
● my-app.service - example deno/fresh app
Loaded: loaded (/var/home/fc-test/.config/containers/systemd/my-app.container; generated)
Drop-In: /usr/lib/systemd/user/service.d
└─10-timeout-abort.conf
Active: active (running) since Wed 2024-07-17 04:21:49 UTC; 20h ago
Main PID: 2026 (conmon)
$ systemctl --user list-dependencies my-app
my-app.service
● ├─app.slice
● ├─basic.target
● │ ├─systemd-tmpfiles-setup.service
● │ ├─paths.target
● │ ├─sockets.target
● │ │ └─dbus.socket
● │ └─timers.target
● │ └─systemd-tmpfiles-clean.timer
● └─network-online.target
● └─systemd-networkd-wait-online.service
Hi @secext2022 ,
The Unit section is defined correctly.
As per my log, the problem is that the NetworkManager-wait-online user service finishes much too soon, much sooner than the system-level one. I believe (meaning I'm not sure) that nm-online does not work correctly when run as a user (not designed to be run as a user?).
As yet another workaround, I've added ExecStartPre=/bin/sh -c 'until ping -c1 192.168.66.6; do sleep 1; done;' under [Service]. On the TODO list: I'm going to test whether this works correctly if I change my interface to be managed by systemd-networkd and use the systemd-networkd-wait-online service instead.
$ systemctl --user status pande-pmc.service
● pande-pmc.service - PandESportS MC-serveri
Loaded: loaded (/home/minecraft/.config/containers/systemd/pande-pmc.container; generated)
Active: active (running) since Fri 2024-07-19 16:04:56 EEST; 4min 55s ago
Invocation: 9858022ff77a4dd38327d8c513324e7d
Process: 829 ExecStartPre=/bin/sh -c until ping -c1 192.168.66.6; do sleep 1; done; (code=exited, status=0/SUCCESS)
Main PID: 906 (conmon)
Tasks: 82 (limit: 28525)
Memory: 6.2G (peak: 6.2G)
CPU: 1min 6.141s
$ systemctl --user list-dependencies pande-pmc.service
pande-pmc.service
● ├─app.slice
● ├─basic.target
● │ ├─paths.target
● │ ├─sockets.target
● │ │ ├─dbus.socket
● │ │ ├─dirmngr.socket
● │ │ ├─drkonqi-coredump-launcher.socket
● │ │ ├─gpg-agent-browser.socket
● │ │ ├─gpg-agent-extra.socket
● │ │ ├─gpg-agent-ssh.socket
● │ │ ├─gpg-agent.socket
● │ │ ├─keyboxd.socket
● │ │ ├─p11-kit-server.socket
● │ │ ├─pipewire-pulse.socket
● │ │ └─pipewire.socket
● │ └─timers.target
○ │ ├─drkonqi-coredump-cleanup.timer
○ │ └─drkonqi-sentry-postman.timer
● └─network-online.target
○ └─NetworkManager-wait-online.service
.config/containers/systemd/pande-pmc.container:
[Unit]
Description=PandESportS MC-serveri
After=network-online.target
Wants=network-online.target
[Container]
AutoUpdate=registry
ContainerName=PandEPMC
Image=docker.io/gameservermanagers/gameserver:pmc
Volume=pandepmc:/data
LogDriver=k8s-file
PublishPort=25560:25560/tcp
PublishPort=25560:25560/udp
PodmanArgs=--log-opt=path=/home/minecraft/PandEPMClog.k8s
Timezone=local
[Service]
ExecStartPre=/bin/sh -c 'until ping -c1 192.168.66.6; do sleep 1; done;'
# Restart=always
Restart=no
[Install]
WantedBy=multi-user.target default.target
After reading this thread and also the comments in https://github.com/systemd/systemd/issues/3312, I think that thread has much cleaner workarounds than many of the ones here. The problem with the workarounds here is that they are often quite long and convoluted for this relatively simple issue, and may or will break if the system configuration changes, as they are not configuration-agnostic. The systemd issue has much cleaner and simpler workarounds:
- Make the whole user@UID service depend on network-online (https://github.com/systemd/systemd/issues/3312#issuecomment-2039973852) - but read the whole comment for caveats! This will work if you have a dedicated user for running containers which are useless without a network (so the caveats don't matter). Rename the [email protected] to include the UID to not enable this for all users. 3 lines, changing user@ service.
- Make one user service which checks the system-level network-online.target (https://github.com/systemd/systemd/issues/3312#issuecomment-2185399471). Then make the quadlets depend on this service. This is ~~4 lines of code~~ one simple service file which should work as long as the system network-online.target is configured properly. You could replace the systemctl is-active with a ping to your GW or, say, Google, depending on what your services actually need, to work around badly written software ("online" does not necessarily mean a connection to the Internet, nor, I presume, even to your default GW). But there's no need to "copy" *-wait-online to the user services, which is prone to break (and does not work for NM at all, it seems).
I haven't tested those, but they should work judging from the thumbs =).
I'm also starting to think maybe we should not be discussing workarounds here that much, since it adds noise to actually solving the issue (which is: podman user containers should not fail at boot if networking is up). (As a general remark, no service should fail on a network error; it should handle the situation instead, as network connections are unreliable. All these workarounds should be unnecessary!)
I'm sorry for adding noise here myself, too =).
EDIT: My chosen workaround for the issue (cleanest in my opinion, less prone to break; I chose to name it check-network-online.service but it could be whatever you want it to be):
/etc/systemd/user/check-network-online.service:
[Unit]
Description=Check for system level network-online.target (for users)
[Service]
Type=oneshot
ExecStart=bash -c 'until systemctl is-active network-online.target; do sleep 1; done'
RemainAfterExit=yes
[Install]
WantedBy=default.target
Enable this service for the user. In badly behaving user services (such as podman quadlets), add:
After=check-network-online.service
Of course, YMMV!
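To confirm that the After=check-network-online.service line actually ended up in the unit Quadlet generated, the ordering dependencies can be inspected. A small sketch, assuming the unit names used above; has_dep is a hypothetical helper that only parses a space-separated unit list, such as the one printed by systemctl show with --property=After and --value.

```shell
#!/bin/sh
# has_dep DEP: read a space-separated list of unit names on stdin and
# succeed only if DEP appears in it as an exact, whole name.
has_dep() {
    dep=$1
    tr ' ' '\n' | grep -Fxq -- "$dep"
}
```

Usage would look like systemctl --user show -p After --value my-app.service | has_dep check-network-online.service, where my-app.service stands in for your own generated service. A non-zero exit status suggests the ordering was not picked up, for example because the daemon was not reloaded after editing the .container file.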
I'm also starting to think maybe we should not be discussing workarounds here that much since it adds noise to actually solving the issue
I personally don't find it distracting.
(which is: podman user containers should not fail at boot if networking is up). (As a general remark, no services should fail for whatever network error, but instead handle the situation, as network connections are unreliable. All these workaround should be unnecessary!).
The thing is, pasta(1) picks host addresses and routes by default. This is by design as it allows you to avoid (implicit) NAT altogether. If there's nothing there, it doesn't know what to pick, so it exits.
We're now considering implementing an optional netlink monitoring function that would dynamically create and delete routes and addresses as they come and go on the host, see also https://github.com/containers/podman/issues/22959#issuecomment-2228900989. That should be robust enough.
@Luap99 @rhatdan @ygalblum shall we update the quadlet docs to point that out?
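Since pasta exits when there are no host addresses and routes to pick from, one way to approximate "the network is usable for pasta" in a pre-start check is to look for a default route. This is a rough sketch that assumes a default route is a good enough proxy for what pasta needs, which may not hold on every setup; has_default_route is a hypothetical helper that only inspects text in the format printed by ip route show default.

```shell
#!/bin/sh
# has_default_route: read `ip route show default` style output on stdin and
# succeed if at least one line describes a default route.
has_default_route() {
    grep -q '^default '
}
```

It could be combined with the polling approach from earlier in the thread, for example: until { ip -4 route show default; ip -6 route show default; } | has_default_route; do sleep 1; done. Unlike a ping to a fixed address, this does not depend on any particular host being reachable.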
Sitting in a meeting where this issue was brought up.
If the doc said "Quadlets are currently broken. Please see that bug report XXX we have with systemd.", at the top in red and bold, I guess the situation would be improved tremendously. Acknowledging current limits and bugs is a big part of establishing trust with users.
As it is, users stumble across this again and again. I can't speak for the general industry but here, no one wants to hear about podman again for instance.
Creating a unit for the user that runs until systemctl is-active network-online.target; do sleep 1; done does seem like a fairly simple and robust option that will be easy to see using normal systemctl commands and won't surprise anyone.
Could quadlet create such a unit automatically? That would be a big improvement to usability. It would also enable quadlet to adjust the implementation over time if systemd makes things easier. For example, maybe systemd will implement systemctl is-active --wait, which would be nice. Or maybe systemd will add a more direct way to solve this problem.
Options:
Automatic and Opt-Out: most containers that run as a service need a network, right? If quadlet generated a unit like this by default for every user container unit and added the After= line to the generated container unit, that would solve this problem for everyone. Perhaps some edge cases would need a way to opt-out?
Opt-In: If the UX was better as opt-in for some reason, I'd suggest a new setting in the [Unit] section such as AfterNetwork=true or similar.
I think it would not be a good idea to automatically look for the user to write After=network-online.target and translate that to something compatible with user units. It's a bad UX that systemd essentially ignores that rather than complain loudly to the user that they've declared something invalid or are depending on a unit that doesn't exist. Quadlet should not "magically" fix an incorrect unit.
BUT I think it is worth considering that quadlet could help the user by noticing that they wrote After=network-online.target in a user unit and failing with a helpful error message that shows them how to do it correctly. That would be incredibly helpful compared to the error message people see from pasta today.
Automatic and Opt-Out: most containers that run as a service need a network, right? If quadlet generated a unit like this by default for every user container unit and added the After= line to the generated container unit, that would solve this problem for everyone. Perhaps some edge cases would need a way to opt-out?
I like that proposal. Thanks for sharing, @mhrivnak !
Automatic and Opt-Out: most containers that run as a service need a network, right? If quadlet generated a unit like this by default for every user container unit and added the After= line to the generated container unit, that would solve this problem for everyone. Perhaps some edge cases would need a way to opt-out?
Quadlet already automatically adds After=network-online.target today.
The opt-out is solved by using After= in the file, which causes systemd to ignore all prior After= lines. You can find that syntax described in the systemd docs.
Now of course network-online.target doesn't work rootless with current systemd versions, so this is a NOP; thus I see no problem changing that to our own functional network-online unit by default when running as user. Letting quadlet generate one doesn't seem too useful to me. We can ship the static unit file in the rpm, as I don't think there is anything dynamic needed for that. All we need to do in quadlet is change the name in the user case.
BUT I think it is worth considering that quadlet could help the user by noticing that they wrote After=network-online.target in a user unit and failing with a helpful error message that shows them how to do it correctly. That would be incredibly helpful compared to the error message people see from pasta today.
This is not really true or helpful either. I have never experienced this on my systems, because network setup is much faster, I guess. So if we now decide to error out, we just break users that do not hit this race because their network config was fast enough.
But yes in general this problem should be documented.
Letting quadlet generate one doesn't seem too useful to me. We can ship the static unit file in the rpm, as I don't think there is anything dynamic needed for that. All we need to do in quadlet is change the name in the user case.
I'm not sure about that. Maybe I'm wrong here, but, don't you need a separate unit per user? Or is there a place to put units that all user units can point to?
Automatic and Opt-Out: most containers that run as a service need a network, right? If quadlet generated a unit like this by default for every user container unit and added the After= line to the generated container unit, that would solve this problem for everyone. Perhaps some edge cases would need a way to opt-out?
Quadlet already automatically adds After=network-online.target today.
The opt-out is solved by using After= in the file, which causes systemd to ignore all prior After= lines. You can find that syntax described in the systemd docs.
Now of course network-online.target doesn't work rootless with current systemd versions, so this is a NOP; thus I see no problem changing that to our own functional network-online unit by default when running as user. Letting quadlet generate one doesn't seem too useful to me. We can ship the static unit file in the rpm, as I don't think there is anything dynamic needed for that. All we need to do in quadlet is change the name in the user case.
It needs to be installed for the specific user account, right? When would that happen if not at the time units are being generated and placed into the correct location for each user?
Maybe "generate" is the wrong word here, and it would just be a copy operation or even a symlink to some known location.
BUT I think it is worth considering that quadlet could help the user by noticing that they wrote After=network-online.target in a user unit and failing with a helpful error message that shows them how to do it correctly. That would be incredibly helpful compared to the error message people see from pasta today.
This is not really true or helpful either. I have never experienced this on my systems, because network setup is much faster, I guess. So if we now decide to error out, we just break users that do not hit this race because their network config was fast enough.
I think that most system admins, not to mention software engineers, would prefer to remove a race condition rather than depend on the presumption that they are likely to win the race most of the time.
That said, I see your point that if someone is currently winning the race every time, blissfully unaware that they're even competing in a race, it would not be a good experience to make their setup start failing. How about a loud log message at least so that if they ever do lose the race, or happen to look at the logs, they'll have an easier time understanding what happened, rather than have to google a weird error message from pasta?
But yes in general this problem should be documented.
How about a loud log message at least so that if they ever do lose the race, or happen to look at the logs, they'll have an easier time understanding what happened, rather than have to google a weird error message from pasta?
Or maybe quadlet could add a comment in the generated unit.
[Unit]
Description=some network service
# The below statement has no effect since this is a user unit. But quadlet has preserved it for reference.
# Please see https://github.com/containers/podman/issues/22197 for details and solutions
# for how to properly depend on network startup
After=network-online.target
Letting quadlet generate one doesn't seem too useful to me. We can ship the static unit file in the rpm, as I don't think there is anything dynamic needed for that. All we need to do in quadlet is change the name in the user case.
I'm not sure about that. Maybe I'm wrong here, but, don't you need a separate unit per user? Or is there a place to put units that all user units can point to?
/usr/lib/systemd/user/ just like we already ship podman.service podman-auto-update.{timer,service},etc... This is a solved problem.
It needs to be installed for the specific user account, right? When would that happen if not at the time units are being generated and placed into the correct location for each user?
No, see above. The unit doesn't have to be enabled: as long as quadlet adds Wants=our-new-unit, it will be triggered when your main unit is started, and it does not need to run when there are no quadlets at all.
I think that most system admins, not to mention software engineers, would prefer to remove a race condition rather than depend on the presumption that they are likely to win the race most of the time.
yes
That said, I see your point that if someone is currently winning the race every time, blissfully unaware that they're even competing in a race, it would not be a good experience to make their setup start failing. How about a loud log message at least so that if they ever do lose the race, or happen to look at the logs, they'll have an easier time understanding what happened, rather than have to google a weird error message from pasta?
Yes, this gets tricky. We could have a log message for sure, but this is also kind of log spam: quadlet, as a generator, is run on every daemon reload, and once a user has acknowledged the problem and worked around it, we would still warn all the time, which gets annoying quickly.
If the pasta error message doesn't make sense, we should aim to fix that message. Either pasta itself should print something better, or podman can catch it and print something better instead... Throwing warnings when we do not know whether it will even fail is just not nice. But once we know pasta failed, we can print whatever you think is reasonable.
@Luap99 great, thanks for the clarification.
So, it seems that in terms of functionality the way to go is to add this service to the installation and add a dependency on it when generating rootless units. The dependency can be removed using a certain key. This will be added for all Quadlet types.
I think the only question left open is regarding the logging.
Right?
Shipping a unit directly with Podman sounds good to me, too 👍
As it is, users stumble across this again and again. I can't speak for the general industry but here, no one wants to hear about podman again for instance.
I hope you'll reconsider once this issue is fixed. Feel free to reach out if you want to chat.
@Luap99 @ygalblum can you confirm the proposed solution of shipping a systemd unit and adding that as a dependency for rootless Quadlets? I'd like to get this fixed soon to make sure it doesn't hit image mode.
Cc: @cgwalters @mrguitar @rhatdan