Switch systemd unit to TasksMax
I'm hitting the same problem as outlined in https://github.com/caddyserver/caddy/issues/1802. The culprit seems to be how systemd handles the LimitNPROC option:
https://github.com/caddyserver/dist/blob/49a805b0196e8c9e394cfe3546f2cd568d6e37d1/init/caddy.service#L30
While caddy doesn't occupy that many processes, some other docker containers seem to use the same UID for their processes:
sudo ps -U caddy
PID TTY TIME CMD
4491 ? 00:00:01 mailrise
36706 ? 00:00:28 postgres
36760 ? 00:00:01 postgres
36761 ? 00:00:06 postgres
36762 ? 00:00:10 postgres
36763 ? 00:03:55 postgres
36764 ? 00:00:14 postgres
36765 ? 00:01:17 postgres
36766 ? 00:00:00 postgres
1597030 ? 00:00:03 postgres
1599669 ? 00:00:03 postgres
2081581 ? 00:25:43 redis-server
2082548 ? 00:00:36 postgres
2082623 ? 00:00:34 postgres
2654461 ? 00:00:00 start.sh
2654495 ? 00:00:00 Xvfb
2654496 ? 00:00:00 dumb-init
2654497 ? 00:48:58 node
2654671 ? 00:01:16 chrome
2654672 ? 00:01:16 chrome
2654673 ? 00:01:14 chrome
2654674 ? 00:01:14 chrome
2654675 ? 00:01:16 chrome
2654676 ? 00:01:13 chrome
2654677 ? 00:01:15 chrome
2654678 ? 00:01:14 chrome
2654683 ? 00:00:00 chrome_crashpad
2654684 ? 00:00:00 chrome_crashpad
2654685 ? 00:00:00 chrome_crashpad
2654686 ? 00:00:00 chrome_crashpad
2654691 ? 00:00:00 chrome_crashpad
2654692 ? 00:00:00 chrome_crashpad
2654693 ? 00:00:00 chrome_crashpad
2654694 ? 00:00:00 chrome_crashpad
2654703 ? 00:00:00 chrome
2654704 ? 00:00:00 chrome
2654705 ? 00:00:00 chrome
2654706 ? 00:00:00 chrome
2654707 ? 00:00:00 chrome
2654708 ? 00:00:00 chrome
2654709 ? 00:00:00 chrome
2654710 ? 00:00:00 chrome
2654711 ? 00:01:14 chrome
2654712 ? 00:01:13 chrome
2654715 ? 00:00:00 chrome_crashpad
2654717 ? 00:00:00 chrome_crashpad
2654718 ? 00:00:00 chrome_crashpad
2654722 ? 00:00:00 chrome_crashpad
2654723 ? 00:00:00 chrome
2654724 ? 00:00:00 chrome
2654727 ? 00:00:00 chrome
2654728 ? 00:00:00 chrome
2654729 ? 00:00:00 nacl_helper
2654730 ? 00:00:00 nacl_helper
2654732 ? 00:00:00 chrome_crashpad
2654750 ? 00:00:00 chrome_crashpad
2654752 ? 00:00:00 chrome_crashpad
2654753 ? 00:00:00 nacl_helper
2654757 ? 00:00:00 nacl_helper
2654759 ? 00:00:00 chrome_crashpad
2654761 ? 00:00:00 chrome
2654762 ? 00:00:00 chrome
2654767 ? 00:00:00 chrome_crashpad
2654768 ? 00:00:00 chrome_crashpad
2654770 ? 00:00:00 nacl_helper
2654781 ? 00:00:00 chrome
2654786 ? 00:00:00 chrome
2654796 ? 00:00:00 chrome_crashpad
2654800 ? 00:00:00 chrome
2654802 ? 00:00:00 chrome
2654816 ? 00:00:00 chrome_crashpad
2654817 ? 00:00:16 chrome
2654818 ? 00:00:17 chrome
2654821 ? 00:00:00 chrome
2654822 ? 00:00:00 chrome
2654823 ? 00:00:17 chrome
2654824 ? 00:00:17 chrome
2654828 ? 00:00:00 nacl_helper
2654881 ? 00:00:17 chrome
2654884 ? 00:00:00 nacl_helper
2654885 ? 00:00:16 chrome
2654886 ? 00:00:17 chrome
2654901 ? 00:00:00 nacl_helper
2654907 ? 00:00:17 chrome
2654910 ? 00:00:17 chrome
2654916 ? 00:00:00 nacl_helper
2654922 ? 00:00:17 chrome
2654985 ? 00:00:19 chrome
2654999 ? 00:00:00 nacl_helper
2655029 ? 00:00:05 chrome
2655048 ? 00:00:17 chrome
2655053 ? 00:00:05 chrome
2655063 ? 00:00:16 chrome
2655065 ? 00:00:17 chrome
2655066 ? 00:00:17 chrome
2655079 ? 00:00:17 chrome
2655080 ? 00:00:16 chrome
2655085 ? 00:00:05 chrome
2655089 ? 00:00:05 chrome
2655092 ? 00:00:17 chrome
2655096 ? 00:00:05 chrome
2655097 ? 00:00:05 chrome
2655105 ? 00:00:05 chrome
2655129 ? 00:00:05 chrome
2655136 ? 00:00:05 chrome
2655179 ? 00:00:05 chrome
2655180 ? 00:00:20 chrome
2655186 ? 00:00:17 chrome
2655199 ? 00:00:05 chrome
2655223 ? 00:00:05 chrome
2655315 ? 00:00:05 chrome
2655323 ? 00:00:05 chrome
2655330 ? 00:00:05 chrome
2655337 ? 00:00:05 chrome
2655341 ? 00:00:05 chrome
2655346 ? 00:00:05 chrome
2655385 ? 00:00:05 chrome
2655391 ? 00:00:05 chrome
The systemd documentation notes that TasksMax should be preferred over LimitNPROC:
Note that LimitNPROC= will limit the number of processes from one (real) UID and not the number of processes started (forked) by the service. Therefore the limit is cumulative for all processes running under the same UID. Please also note that the LimitNPROC= will not be enforced if the service is running as root (and not dropping privileges). Due to these limitations, TasksMax= (see systemd.resource-control(5)) is typically a better choice than LimitNPROC=.
https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Process%20Properties
The limit got raised to a generous value of 512 in https://github.com/caddyserver/caddy/pull/1825 in order to solve https://github.com/caddyserver/caddy/issues/1723. But it's still possible to hit the limit due to misattribution of other container processes with the same UID.
The TasksMax option solves this by only limiting the number of processes started as part of the service, which is what we actually want to achieve.
I'm confused. Why would you be running Docker using the caddy user?
Anyway this seems to make sense on paper, but I'd like @carlwgeorge to review as well.
The problem is that the numeric UID of the caddy user happens to overlap with one or more users inside Docker containers.
On my host, the caddy user has the UID 999. If the user in a container happens to have the same UID, systemd would attribute those processes to the caddy user and conclude that the limit has been reached.
The limit seems to include threads as explained in setrlimit(2):
The maximum number of processes (or, more precisely on Linux, threads) that can be created for the real user ID of the calling process. Upon encountering this limit, fork(2) fails with the error EAGAIN.
I used the following command to determine the current value:
sudo ps -U caddy -h -o nlwp | awk '{total += $1} END {print total}'
This yields 749 on my system with all services and containers running. LimitNPROC=800 works, but the unit fails to start once I switch to LimitNPROC=700.
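To see which commands contribute most to that total, the same ps output can be grouped by command name. This is only a rough sketch: threads_by_command is a hypothetical helper name, and it assumes command names contain no spaces (only the first word is used as the key).

```shell
#!/bin/sh
# Sketch: sum thread counts (nlwp) per command name.
# Reads "NLWP COMMAND" lines on stdin and prints "TOTAL COMMAND",
# largest total first. threads_by_command is a hypothetical helper.
threads_by_command() {
    awk '{ total[$2] += $1 } END { for (c in total) print total[c], c }' |
        sort -k1,1nr -k2,2
}

# Real use against the caddy UID (the empty "=" headers suppress the
# header row, so no extra filtering is needed):
#   ps -U caddy -o nlwp= -o comm= | threads_by_command
```

Summing the first column of the result should give the same number as the one-liner above.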
The task limit is much more reliable and better reflects reality:
● caddy.service - Caddy
Loaded: loaded (/etc/systemd/system/caddy.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-12-28 20:45:05 CET; 8min ago
Docs: https://caddyserver.com/docs/
Main PID: 1885185 (caddy)
Tasks: 8 (limit: 512)
Memory: 12.7M
CPU: 413ms
CGroup: /system.slice/caddy.service
└─1885185 /usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
1 parent process + 7 children -> 8 tasks
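The same numbers can be read without parsing the status output. A minimal sketch, assuming the unit exposes the TasksCurrent and TasksMax properties through systemctl show (tasks_usage is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: turn "Key=Value" lines from `systemctl show` into "current/max".
# Properties are matched by name, so their order on stdin does not matter.
# tasks_usage is a hypothetical helper, not a systemd command.
tasks_usage() {
    awk -F= '
        $1 == "TasksCurrent" { cur = $2 }
        $1 == "TasksMax"     { max = $2 }
        END { print cur "/" max }
    '
}

# Real use:
#   systemctl show caddy.service -p TasksCurrent -p TasksMax | tasks_usage
```

Unlike the per-UID count, this figure only covers tasks in the unit's own cgroup, so container processes that share the UID do not show up in it.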
The main offender in my case is the browserless/chrome container.
docker top playwright-chrome o user,uid,pid
USER UID PID
caddy 999 2654461
caddy 999 2654495
caddy 999 2654496
caddy 999 2654497
caddy 999 2654671
caddy 999 2654672
caddy 999 2654673
caddy 999 2654674
caddy 999 2654675
caddy 999 2654676
caddy 999 2654677
caddy 999 2654678
caddy 999 2654683
caddy 999 2654684
caddy 999 2654685
caddy 999 2654686
caddy 999 2654691
caddy 999 2654692
caddy 999 2654693
caddy 999 2654694
caddy 999 2654703
caddy 999 2654704
caddy 999 2654705
caddy 999 2654706
caddy 999 2654707
caddy 999 2654708
caddy 999 2654709
caddy 999 2654710
caddy 999 2654711
caddy 999 2654712
caddy 999 2654715
caddy 999 2654717
caddy 999 2654718
The container alone is enough to trip the limitation:
docker top playwright-chrome -o nlwp,pid | tail -n +2 | awk '{total += $1} END {print total}'
yields 717.
@francislavoie the following systemd documentation PR describes the situation very well: https://github.com/systemd/systemd/pull/23242
@francislavoie Sorry for my delay in getting back to you on this. Adoption of this option should wait until the project is ready to abandon building for RHEL 7. TasksMax was added in systemd 227, but RHEL 7 only has systemd 219. RHEL 8 bumps up to systemd 239. RHEL 7 goes EOL on 2024-06-30, so that may be the ideal time to switch to TasksMax.
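For anyone who wants to check a given host, the version threshold above can be tested with a few lines of shell. This is a sketch only: supports_tasksmax is a hypothetical helper, and the usage line assumes the first line of `systemctl --version` has the form "systemd NNN (...)".

```shell
#!/bin/sh
# Sketch: succeed if the given systemd version is new enough for TasksMax=
# (227 or later, per the comment above). supports_tasksmax is a
# hypothetical helper name.
supports_tasksmax() {
    version="$1"
    case "$version" in
        ''|*[!0-9]*) return 2 ;;  # empty or not a plain integer
    esac
    [ "$version" -ge 227 ]
}

# Real use:
#   supports_tasksmax "$(systemctl --version | awk 'NR==1 { print $2 }')" \
#       && echo "TasksMax= is available"
```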
Thanks @carlwgeorge, glad I waited.
Would it be okay if we dropped RHEL 7 support early then? It just would mean it wouldn't receive this one last release before official EOL I guess.
Does COPR have recent download stats that would give us an idea how much it's still being used?
Note that users would still be able to work around it using systemctl edit caddy.
Yeah, understood. I'd just rather not block merging this for everyone else who would benefit, while waiting for one old distribution to cycle out.
Would it be okay if we dropped RHEL 7 support early then? It just would mean it wouldn't receive this one last release before official EOL I guess.
I don't have a problem with the project dropping support for RHEL 7 early, I would just like it to be an explicit decision, not a "whoops". Doing it intentionally would also be less disruptive for RHEL 7 caddy users, as we wouldn't ship an update to them with an incompatible option. Ideally we would inform them with some kind of announcement that there will be no more RHEL 7 caddy updates.
Does COPR have recent download stats that would give us an idea how much it's still being used?
RHEL 7 and its derivatives are still pretty popular, more so than they should be this late into their lifecycles. Here are the download stats from COPR.
- EPEL 7: 16,491
- EPEL 8: 11,145
- EPEL 9: 26,813
Note that users would still be able to work around it using systemctl edit caddy.
Just like people that want to start using TasksMax now can, before the project makes it the default. With the broad range of systemd versions in the wild, it makes more sense to keep the default unit using directives that are available on all distributions that the project targets with the apt and rpm repos.
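For reference, a local override along those lines could look roughly like the sketch below. The helper name write_tasksmax_dropin, the override.conf file name, and the choice of 512 are assumptions for illustration; `systemctl edit caddy` creates an equivalent drop-in interactively.

```shell
#!/bin/sh
# Sketch: write a drop-in that replaces the per-UID limit with a per-unit
# task limit. write_tasksmax_dropin is a hypothetical helper, not part of
# Caddy or systemd.
write_tasksmax_dropin() {
    dir="$1"           # e.g. /etc/systemd/system/caddy.service.d
    limit="${2:-512}"  # 512 mirrors the current LimitNPROC value
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" <<EOF
[Service]
# Lift the per-UID process limit set by the packaged unit.
LimitNPROC=infinity
# Cap only the tasks (processes and threads) in this unit's cgroup.
TasksMax=$limit
EOF
}

# Real use (needs root and a systemd version that supports TasksMax=):
#   write_tasksmax_dropin /etc/systemd/system/caddy.service.d 512
#   systemctl daemon-reload && systemctl restart caddy
```

On hosts whose systemd predates TasksMax= this override would not have the intended effect, which is the reason for keeping it out of the default unit for now.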
We could also simply drop LimitNPROC to be honest.
EPEL 7: 16,491
Oof, yeah that's not as low as I'd hoped.
We could also simply drop LimitNPROC to be honest.
Yeah, I'd be okay with that too in the short term.
I don't really think we need to worry about Go running wild, it's a well-behaved runtime.
Small update, RHEL 7 is now EOL, and we already skipped building caddy 2.8 for it a month before the EOL date. We could add TasksMax to the unit file now if people still think it's needed. I don't have a strong opinion on whether we need it or not, I just wanted to mention that it's a viable option now as it's available on all the distro versions we're targeting in copr.