Segmentation fault (core dumped) in arbiter.py
I run gunicorn + Flask in Kubernetes. The start command is "gunicorn fishing_main:app -b 0.0.0.0:50007 -w 31". It works normally most of the time, but occasionally a "core dumped" error occurs in the Flask pod in k8s. It is much more likely to happen when a large number of requests are sent to Flask. Python 3.8 is used.
The error log in the pod is: [WARNING] Worker with pid 1540 was terminated due to signal 11
Signal 11 sometimes causes a core dump; at other times it just makes the worker exit.
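For reference, signal 11 is SIGSEGV on Linux and macOS. The sketch below is only an illustration, not gunicorn code: it starts a child Python process that sends SIGSEGV to itself and decodes the return code the parent sees. The names CHILD_CODE, run_crashing_child and describe_returncode are made up for this example, and the child sets its core size limit to 0 so that no core file is written.

```python
import signal
import subprocess
import sys

# Child program: disable core files, then deliver SIGSEGV to itself.
CHILD_CODE = (
    "import os, resource, signal\n"
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0))\n"
    "os.kill(os.getpid(), signal.SIGSEGV)\n"
)


def run_crashing_child() -> int:
    """Run a child that segfaults on purpose and return its return code."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # subprocess reports death by signal as the negated signal number.
        return "terminated by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    print(describe_returncode(run_crashing_child()))
```

This only shows how the parent learns which signal killed a child. It says nothing about why the signal was raised in the first place.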
I added faulthandler.enable() to the code and got a more detailed traceback.
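A minimal sketch of how faulthandler can be switched on, assuming it runs at import time of the application module so that each process importing the app installs the handler. The helper name enable_crash_tracebacks and the log path are hypothetical; faulthandler.enable() with its file and all_threads parameters is the standard library API.

```python
import faulthandler
import os

# faulthandler keeps writing to this file, so it must stay open for the
# lifetime of the process; a module-level reference keeps it from closing.
_crash_log = None


def enable_crash_tracebacks(path):
    """Write Python tracebacks of all threads to `path` on a fatal signal."""
    global _crash_log
    _crash_log = open(path, "a")
    faulthandler.enable(file=_crash_log, all_threads=True)
    return _crash_log


if __name__ == "__main__":
    # Hypothetical location; one file per process ID.
    enable_crash_tracebacks("/tmp/crash-%d.log" % os.getpid())
```

Setting the environment variable PYTHONFAULTHANDLER=1 or starting Python with -X faulthandler enables the same handler on stderr without code changes.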
The error that produces a core dump is:
2022-08-17 17:37:30 | Segmentation fault (core dumped)
2022-08-17 17:37:30 | File "/opt/conda/bin/gunicorn", line 8 in
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 67 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 231 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 72 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 209 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 357 in sleep
2022-08-17 17:37:30 | Thread 0x00007f42030f1740 (most recent call first):
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 890 in _bootstrap
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 932 in _bootstrap_inner
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 870 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/KcangNacos/nacos.py", line 152 in __registerBeatThreadRun
2022-08-17 17:37:30 | Thread 0x00007f3fdc8a4700 (most recent call first):
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 890 in _bootstrap
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 932 in _bootstrap_inner
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 870 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/KcangNacos/nacos.py", line 26 in __healthyCheckThreadRun
2022-08-17 17:37:30 | Thread 0x00007f3fd7fff700 (most recent call first):
2022-08-17 17:37:30 | Fatal Python error: Segmentation fault
2022-08-17 17:37:30 | 2022-08-17 17:37:30 - view.code_graph - INFO - 10.244.9.7 - 1567 - get vertexes value, len is 2
2022-08-17 17:37:30 | 2022-08-17 17:37:30 - view.code_graph - INFO - 10.244.9.7 - 1567 - get edge value, len is 1
2022-08-17 17:37:20 | [2022-08-17 17:37:20 +0800] [1586] [INFO] Booting worker with pid: 1586
2022-08-17 17:37:20 | [2022-08-17 17:37:20 +0800] [7] [WARNING] Worker with pid 1540 was terminated due to signal 11
2022-08-17 17:37:20 | File "/opt/conda/bin/gunicorn", line 8 in
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 67 in run
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 231 in run
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 72 in run
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 211 in run
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 551 in manage_workers
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 622 in spawn_workers
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589 in spawn_worker
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/base.py", line 142 in init_process
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 125 in run
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 84 in run_for_one
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 36 in wait
2022-08-17 17:37:20 | Thread 0x00007f42030f1740 (most recent call first):
2022-08-17 17:37:20 | Fatal Python error: Segmentation fault
The error that does not produce a core dump is:
2022-08-17 17:35:10 | [2022-08-17 17:35:10 +0800] [1567] [INFO] Booting worker with pid: 1567
2022-08-17 17:35:10 | [2022-08-17 17:35:10 +0800] [7] [WARNING] Worker with pid 1431 was terminated due to signal 11
2022-08-17 17:35:10 | File "/opt/conda/bin/gunicorn", line 8 in
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 67 in run
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 231 in run
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 72 in run
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 211 in run
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 551 in manage_workers
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 622 in spawn_workers
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589 in spawn_worker
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/base.py", line 142 in init_process
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 125 in run
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 84 in run_for_one
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 36 in wait
2022-08-17 17:35:10 | Thread 0x00007f42030f1740 (most recent call first):
2022-08-17 17:35:10 | Fatal Python error: Segmentation fault
It seems that gunicorn is waiting for a signal in arbiter.py by using select.select(). My guess is that the file descriptors of the sockets or pipes in gunicorn cause the segmentation fault. I could find little information about this online.
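To make the wait described above concrete, here is a simplified illustration of blocking in select.select() on the read end of a pipe until another piece of code writes a byte to wake the waiter up. This is a sketch of the general pattern, not gunicorn's actual code; the function name wait_for_wakeup and the timeout values are assumptions chosen for the example.

```python
import os
import select


def wait_for_wakeup(read_fd, timeout=1.0):
    """Block until read_fd is readable or the timeout expires.

    Returns the byte drained from the pipe, or b"" on timeout.
    """
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return b""
    return os.read(read_fd, 1)


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    # Nothing has been written yet, so this call times out.
    print(wait_for_wakeup(read_fd, timeout=0.1))
    # A signal handler would typically do this write to wake the loop.
    os.write(write_fd, b".")
    print(wait_for_wakeup(read_fd, timeout=0.1))
```

The first call returns after the timeout with an empty result, the second returns as soon as the byte is available.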
Just to +1 this: last month I tried to use various applications (e.g. Airflow 2.2.5) that use Gunicorn on an Apple M2 laptop with Python 3.8 and macOS Monterey. The gunicorn workers could not even spawn and were killed immediately by the system due to segmentation faults. I did not look into it in more detail at the time, but it might be related to this issue as well.
How much RAM is used by your app? How much RAM is provided to the container?
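One possible way to collect these two numbers from inside the container is sketched below. It assumes a Linux container whose memory limit is exposed through the usual cgroup v2 or v1 files; on other systems the limit lookup simply returns None. The function names are hypothetical.

```python
import resource
import sys
from pathlib import Path

# cgroup v2 and v1 locations of the container memory limit (Linux only).
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def container_memory_limit():
    """Return the cgroup memory limit as a string, or None if not found.

    Under cgroup v2 the value can be the literal string "max" (no limit).
    """
    for path in CGROUP_LIMIT_FILES:
        try:
            return path.read_text().strip()
        except OSError:
            continue
    return None


def peak_rss_mib():
    """Peak resident set size of the current process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print("memory limit:", container_memory_limit())
    print("peak RSS (MiB):", round(peak_rss_mib(), 1))
```

Note that peak_rss_mib() covers only the process that calls it, so with several workers each one would have to report its own value.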
I am also seeing this error locally when running on an Apple M1 laptop:
[2022-11-02 22:42:03 -0700] [4819] [INFO] Starting gunicorn 20.1.0
[2022-11-02 22:42:03 -0700] [4819] [INFO] Listening at: http://127.0.0.1:8000 (4819)
[2022-11-02 22:42:03 -0700] [4819] [INFO] Using worker: sync
[2022-11-02 22:42:03 -0700] [4832] [INFO] Booting worker with pid: 4832
[2022-11-02 22:42:03 -0700] [4833] [INFO] Booting worker with pid: 4833
[2022-11-02 22:42:03 -0700] [4834] [INFO] Booting worker with pid: 4834
[2022-11-02 22:42:03 -0700] [4835] [INFO] Booting worker with pid: 4835
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4832 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4836] [INFO] Booting worker with pid: 4836
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4833 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4837] [INFO] Booting worker with pid: 4837
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4834 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4838] [INFO] Booting worker with pid: 4838
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4835 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4839] [INFO] Booting worker with pid: 4839
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4836 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4840] [INFO] Booting worker with pid: 4840
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4837 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4841] [INFO] Booting worker with pid: 4841
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4838 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4842] [INFO] Booting worker with pid: 4842
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4839 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4843] [INFO] Booting worker with pid: 4843
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4840 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4844] [INFO] Booting worker with pid: 4844
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4841 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4845] [INFO] Booting worker with pid: 4845
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4842 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4848] [INFO] Booting worker with pid: 4848
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4843 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4849] [INFO] Booting worker with pid: 4849
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4844 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4850] [INFO] Booting worker with pid: 4850
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4845 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4851] [INFO] Booting worker with pid: 4851
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4848 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4852] [INFO] Booting worker with pid: 4852
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4849 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4853] [INFO] Booting worker with pid: 4853
[2022-11-02 22:42:05 -0700] [4819] [WARNING] Worker with pid 4850 was terminated due to signal 11
[2022-11-02 22:42:05 -0700] [4854] [INFO] Booting worker with pid: 4854
[2022-11-02 22:42:05 -0700] [4819] [WARNING] Worker with pid 4851 was terminated due to signal 11
[2022-11-02 22:42:05 -0700] [4855] [INFO] Booting worker with pid: 4855
[2022-11-02 22:42:05 -0700] [4819] [WARNING] Worker with pid 4852 was terminated due to signal 11
[2022-11-02 22:42:05 -0700] [4856] [INFO] Booting worker with pid: 4856
[2022-11-02 22:42:05 -0700] [4819] [WARNING] Worker with pid 4853 was terminated due to signal 11
[2022-11-02 22:42:05 -0700] [4857] [INFO] Booting worker with pid: 4857
[2022-11-02 22:42:05 -0700] [4819] [WARNING] Worker with pid 4854 was terminated due to signal 11
This starts when the application is launched and continues forever. The app's RAM usage is negligible, probably a few dozen MB on a machine with 64 GB of RAM. This happens with any number of workers, even just a single worker.
Any update here? I have the same error
Same problem here. Running Airflow 2.4.3 with the postgres module. I tried it with Python 3.9 and 3.10, and also with postgres versions 13, 14 and 15. Using Airflow with sqlite instead of postgres works fine. As soon as I use the postgres module with Airflow, the gunicorn workers are killed immediately with signal 11. I couldn't find a workaround yet.
Seeing the same signal 11 on M1 Pro running Python 3.10. Any updates?
UPDATE: I fixed the issue by deleting all site-packages and reinstalling Python.
any update? Seeing the same error as well.
What do you mean by reproducing on the M1? I can't reproduce it myself. Can you share a minimal environment?
stalled issue.
Adding --preload fixed this for me
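For anyone who wants to try the same thing from a config file, here is a sketch of a gunicorn.conf.py that should be equivalent to passing --preload on the command line. The bind address and worker count are taken from the original post; whether preloading helps in a given setup is not confirmed by anything in this thread beyond the report above.

```python
# gunicorn.conf.py
# Values for bind and workers copied from the start command in the first post.
bind = "0.0.0.0:50007"
workers = 31

# Load the application code before the worker processes are forked,
# the config-file counterpart of the --preload command-line flag.
preload_app = True
```

It would be used with something like: gunicorn -c gunicorn.conf.py fishing_main:app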
Same error over here!