authentik root: fix 100% CPU for worker container (#7025)

Some Linux users (on Arch Linux, for example) run Docker with the default service file, which sets NOFILE to infinity. This causes Celery to hang for hours or even days at 100% CPU while it closes every file descriptor, enumerating from the NOFILE limit down to 3.
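
To show why the limit matters, here is a rough cost model. It assumes, as described above, that the worker closes every descriptor number from the NOFILE limit down to 3 with one close() call each. The function name `brute_force_close_calls` is hypothetical and this is not Celery's code. It only counts calls and does not close anything.

```python
def brute_force_close_calls(nofile_limit: int, lowest_kept: int = 3) -> int:
    """Count the close() calls made by a loop that walks every fd number
    from nofile_limit - 1 down to lowest_kept.

    fds 0-2 (stdin, stdout, stderr) stay open, so the loop stops at 3.
    Each visited number costs one syscall whether or not it is open, so
    the count depends only on the limit, not on how many fds are in use.
    """
    # range(lowest_kept, nofile_limit) holds exactly the numbers visited;
    # len() of a range is O(1), so nothing is actually iterated here.
    return len(range(lowest_kept, nofile_limit))


if __name__ == "__main__":
    # 1024: a common default soft limit; 1048576: the limit reported in
    # this thread; 0x3FFFFFF8: the value suggested for reproducing the hang.
    for limit in (1024, 1048576, 0x3FFFFFF8):
        print(f"NOFILE={limit:>10}: {brute_force_close_calls(limit):>10} close() calls")
```

At 0x3FFFFFF8 (1073741816) this gives 1073741813 calls, roughly 1024 times as many as at 1048576. Under the assumption above, that would help explain hangs lasting hours rather than seconds.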

This commit overrides the ulimit for the container without touching the user's Docker service configuration.
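
This commit works at the container level. As a rough in-process analogue, a Python process can lower its own soft RLIMIT_NOFILE with the standard `resource` module, and lowering the soft limit needs no privileges. This assumes the code runs at worker start-up, before anything forks. It also assumes the close loop's upper bound follows the soft limit. `cap_nofile_soft_limit` and the 1048576 cap are illustrative choices, not authentik's code.

```python
import resource


def cap_nofile_soft_limit(cap: int) -> int:
    """Lower this process's soft RLIMIT_NOFILE to at most `cap`.

    The hard limit is left untouched and the soft limit is never raised,
    so no extra privileges are needed. Returns the soft limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit"; treat it as larger than any cap.
    if soft == resource.RLIM_INFINITY or soft > cap:
        # soft <= hard always holds, so cap < soft also keeps cap <= hard.
        resource.setrlimit(resource.RLIMIT_NOFILE, (cap, hard))
        soft = cap
    return soft


if __name__ == "__main__":
    # Hypothetical cap matching the 1048576 limit reported in this thread.
    print("soft NOFILE limit now:", cap_nofile_soft_limit(1048576))
```

Unlike a compose `ulimits` override, a call like this would not depend on the container runtime. It only affects the calling process and the children it starts afterwards.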

For details see #7025

Dec 03 '23 17:12 DKingAlpha

Deploy Preview for authentik-storybook failed.

Name	Link
Latest commit	3b0a1ac931d51dffd3b864bd8697a45cde2f6fcd
Latest deploy log	https://app.netlify.com/sites/authentik-storybook/deploys/656cc0d92b027400084688ae

Dec 03 '23 17:12 netlify[bot]

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 92.64%. Comparing base (2bc4506) to head (3b0a1ac). :warning: Report is 6102 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7762      +/-   ##
==========================================
+ Coverage   92.62%   92.64%   +0.02%     
==========================================
  Files         588      588              
  Lines       29141    29141              
==========================================
+ Hits        26991    26997       +6     
+ Misses       2150     2144       -6

Flag	Coverage Δ
e2e	`50.72% <ø> (+0.02%)`	:arrow_up:
integration	`25.94% <ø> (ø)`
unit	`89.71% <ø> (ø)`

Flags with carried forward coverage won't be shown.

Dec 03 '23 18:12 codecov[bot]

Tried to ping someone, waiting for feedback.

Meanwhile, you can easily reproduce the issue by setting the ulimit to 0x3ffffff8 (1073741816 in decimal). That should more or less prove it.

Dec 03 '23 18:12 DKingAlpha

For reference:

Here's what I have at home:

$ ulimit -Sn
1048576
# ulimit -Hn
1048576

We have the same values in production at authentik.

Dec 04 '23 04:12 rissson

Fixes the issue for me.

OS: clear-linux
# ulimit -Sn
1024
# ulimit -Hn
524288

Does anyone know how I can change this for my authentik worker on Unraid? It has been running at high CPU usage for days now. Help is appreciated.

Reporting back (Unraid, solved): in hindsight I did three things, and I'm not sure which one solved it:
1) In the Unraid template (advanced view), I added "--ulimit nofile=10240:10240" as a flag in the Extra Parameters field.
2) I redeployed both the worker and authentik (removing the containers and images).
3) I added AUTHENTIK_REDIS__DB:1 as a variable to the Unraid template for both the worker and authentik.
Now everything seems normal.

Jan 03 '24 07:01 mobiledude

Do you know why setting ulimit to a larger number fixes the issue? Is it an issue in Celery or Authentik?

Jan 03 '24 22:01 cenkalti

With #7810, #8440 and #7813 this shouldn't be an issue anymore. Could you check this again with 2024.2.2, @cenkalti @mobiledude @Leptopoda @DKingAlpha?

Mar 15 '24 16:03 BeryJu

Thanks for following up. I can confirm that I no longer need the workaround on 2024.2.2.

Mar 15 '24 23:03 Leptopoda

I still get high CPU usage with the latest 2024.2.2. Adding the ulimit back to compose.yml fixed the issue for me.

The live Python profiler py-spy is incompatible with recent Python 3.12; I will find another way to identify the issue when I have time.

Mar 18 '24 17:03 DKingAlpha

I still experience high CPU usage with the latest 2024.2.2 version. However, I was able to resolve the issue by adding ulimit back to the 'compose.yml' file.

Mar 18 '24 18:03 cenkalti

I can confirm that it fixed my setup too, so IMHO it's worth merging :tada:

Mar 20 '24 10:03 MyIgel

New authentik user here. I tried resetting Redis and setting ulimits in docker-compose, but unfortunately the CPU still spiked to 100%. After some more troubleshooting I increased the RAM allocation to the VM (while leaving the custom ulimits in place), and suddenly it all started to work, with no more CPU spikes.

Mar 26 '24 12:03 SpiderD555

Same here with 2024.4.2; ulimits in the compose file fixed the issue.

May 13 '24 00:05 Janhouse

Adding ulimits back to compose fixed my issue on 2024.4.2.

Jun 01 '24 06:06 arthurlockman

For context, here is why we haven't merged this PR:

- Configuring ulimit values for containers is possible with compose but not with Kubernetes, so it would only solve half the problem.
- It's also more of a band-aid than a full solution. This ulimit adjustment shouldn't be required; changing the values seems to just prevent a bug (in our code, in our usage of Celery, or in Celery itself) from happening instead of fixing the root cause.

Jun 01 '24 08:06 BeryJu

I can confirm this fixed my issue for docker running on Oracle Linux 9.4

Jun 05 '24 13:06 ForsakenRei

I just deployed a fresh compose install on current Arch Linux using 2024.6.3. Celery was pinning one core at 100%. Setting ulimits for nofile resolved the issue.

I'm not experiencing this in k8s running on Talos.

Aug 08 '24 12:08 cubic3d

Thanks! This fixed the issue for me as well.

Oct 09 '24 10:10 justin8

Fix worked for me as well

Jan 25 '25 14:01 maxnoe

With 2025.8 we no longer use Celery, so this shouldn't be required anymore.

Oct 10 '25 11:10 BeryJu
