Segmentation fault (core dumped) in arbiter.py

Open dingyuqi opened this issue 3 years ago • 1 comments

I use gunicorn + Flask in Kubernetes. The start command is "gunicorn fishing_main:app -b 0.0.0.0:50007 -w 31". It works normally most of the time, but occasionally a "core dumped" error occurs in the Flask pod in k8s. It is much more likely to happen when a large number of requests are sent to Flask. Python 3.8 is used.

The error log in the pod is: [WARNING] Worker with pid 1540 was terminated due to signal 11

Signal 11 sometimes causes a core dump; other times it just makes the worker exit.
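For readers unfamiliar with the log message: signal 11 is SIGSEGV on Linux. A minimal sketch of how a parent process can see that a child died from a signal, using a throwaway child that kills itself. The `describe_exit` helper and `CHILD_CODE` are illustrative names, not part of gunicorn, and the sketch is POSIX-only.

```python
import signal
import subprocess
import sys

# Throwaway child that sends SIGSEGV to itself, standing in for a crashing worker.
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description (POSIX only)."""
    if returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        sig = signal.Signals(-returncode)
        return f"terminated by signal {sig.value} ({sig.name})"
    return f"exited with status {returncode}"


result = subprocess.run([sys.executable, "-c", CHILD_CODE])
message = describe_exit(result.returncode)
print(message)
```

Whether the kernel also writes a core file usually depends on the process's core-size limit (`ulimit -c`) and the system's core pattern, so the same signal can show up with or without "core dumped".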

I added faulthandler.enable() to the code and got a more detailed traceback.
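A minimal sketch of that step, assuming the call is placed at the top of the application module (here presumably `fishing_main.py`) so that it runs before the Flask app is created:

```python
import faulthandler
import sys

# Dump the Python traceback of every thread to stderr when the process
# receives a fatal signal such as SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable(file=sys.stderr, all_threads=True)

print(faulthandler.is_enabled())
```

The same handler can be switched on without touching the code by setting the environment variable `PYTHONFAULTHANDLER=1` or by starting the interpreter with `python -X faulthandler`.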

The error that causes a core dump is:

2022-08-17 17:37:30 | Segmentation fault (core dumped)
2022-08-17 17:37:30 | File "/opt/conda/bin/gunicorn", line 8 in
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 67 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 231 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 72 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 209 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 357 in sleep
2022-08-17 17:37:30 | Thread 0x00007f42030f1740 (most recent call first):
2022-08-17 17:37:30 |
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 890 in _bootstrap
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 932 in _bootstrap_inner
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 870 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/KcangNacos/nacos.py", line 152 in __registerBeatThreadRun
2022-08-17 17:37:30 | Thread 0x00007f3fdc8a4700 (most recent call first):
2022-08-17 17:37:30 |
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 890 in _bootstrap
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 932 in _bootstrap_inner
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/threading.py", line 870 in run
2022-08-17 17:37:30 | File "/opt/conda/lib/python3.8/site-packages/KcangNacos/nacos.py", line 26 in __healthyCheckThreadRun
2022-08-17 17:37:30 | Thread 0x00007f3fd7fff700 (most recent call first):
2022-08-17 17:37:30 |
2022-08-17 17:37:30 | Fatal Python error: Segmentation fault
2022-08-17 17:37:30 | 2022-08-17 17:37:30 - view.code_graph - INFO - 10.244.9.7 - 1567 - get vertexes value, len is 2
2022-08-17 17:37:30 | 2022-08-17 17:37:30 - view.code_graph - INFO - 10.244.9.7 - 1567 - get edge value, len is 1
2022-08-17 17:37:20 | [2022-08-17 17:37:20 +0800] [1586] [INFO] Booting worker with pid: 1586
2022-08-17 17:37:20 | [2022-08-17 17:37:20 +0800] [7] [WARNING] Worker with pid 1540 was terminated due to signal 11
2022-08-17 17:37:20 | File "/opt/conda/bin/gunicorn", line 8 in
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 67 in run
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 231 in run
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 72 in run
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 211 in run
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 551 in manage_workers
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 622 in spawn_workers
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589 in spawn_worker
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/base.py", line 142 in init_process
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 125 in run
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 84 in run_for_one
2022-08-17 17:37:20 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 36 in wait
2022-08-17 17:37:20 | Thread 0x00007f42030f1740 (most recent call first):
2022-08-17 17:37:20 |
2022-08-17 17:37:20 | Fatal Python error: Segmentation fault

The error that does not cause a core dump is:

2022-08-17 17:35:10 | [2022-08-17 17:35:10 +0800] [1567] [INFO] Booting worker with pid: 1567
2022-08-17 17:35:10 | [2022-08-17 17:35:10 +0800] [7] [WARNING] Worker with pid 1431 was terminated due to signal 11
2022-08-17 17:35:10 | File "/opt/conda/bin/gunicorn", line 8 in
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 67 in run
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 231 in run
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/app/base.py", line 72 in run
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 211 in run
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 551 in manage_workers
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 622 in spawn_workers
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589 in spawn_worker
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/base.py", line 142 in init_process
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 125 in run
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 84 in run_for_one
2022-08-17 17:35:10 | File "/opt/conda/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 36 in wait
2022-08-17 17:35:10 | Thread 0x00007f42030f1740 (most recent call first):
2022-08-17 17:35:10 |
2022-08-17 17:35:10 | Fatal Python error: Segmentation fault

It seems that gunicorn is blocked in select.select() in arbiter.py, waiting for a signal to arrive. My guess is that the file descriptors of the sockets or pipes in gunicorn cause the segmentation fault. Little information can be found online.
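For context, the `sleep` and `wait` frames in the tracebacks are idle waits on file descriptors. A simplified sketch of that kind of wait loop, a pipe watched with `select.select()` and a one-byte write to wake it up, might look like this. It is not gunicorn's actual code; the function names `wakeup` and `sleep` are illustrative.

```python
import os
import select

# A non-blocking pipe: select() watches the read end, the write end wakes it up.
read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)
os.set_blocking(write_fd, False)


def wakeup() -> None:
    """Write one byte so that a pending select() returns immediately."""
    try:
        os.write(write_fd, b".")
    except BlockingIOError:
        pass  # pipe is full, so the sleeper will wake up anyway


def sleep(timeout: float = 1.0) -> bool:
    """Wait until woken or until the timeout expires; return True if woken."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return False
    try:
        while os.read(read_fd, 1):
            pass  # drain every pending wake-up byte
    except BlockingIOError:
        pass  # pipe is empty again
    return True


print(sleep(0.1))  # False: nothing was written, so the wait times out
wakeup()
print(sleep(0.1))  # True: the byte written by wakeup() is pending
```

Since faulthandler reports where each Python thread happened to be when the signal arrived, a frame like this in the dump does not necessarily mean that `select()` itself is at fault.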

dingyuqi, Aug 19 '22 08:08

Just to +1 this: last month I tried various applications that use Gunicorn (e.g. Airflow 2.2.5) on an Apple M2 laptop with Python 3.8 and macOS Monterey. The gunicorn workers could not even spawn and were killed immediately by the system due to segmentation faults. I did not look into it in more detail at the time, but it might be related to this issue as well.

bfaludi, Sep 07 '22 09:09

How much RAM is used by your app? How much RAM is provided to the container?

benoitc, Oct 18 '22 13:10

I am also seeing this error locally when running on an Apple M1 laptop:

[2022-11-02 22:42:03 -0700] [4819] [INFO] Starting gunicorn 20.1.0
[2022-11-02 22:42:03 -0700] [4819] [INFO] Listening at: http://127.0.0.1:8000 (4819)
[2022-11-02 22:42:03 -0700] [4819] [INFO] Using worker: sync
[2022-11-02 22:42:03 -0700] [4832] [INFO] Booting worker with pid: 4832
[2022-11-02 22:42:03 -0700] [4833] [INFO] Booting worker with pid: 4833
[2022-11-02 22:42:03 -0700] [4834] [INFO] Booting worker with pid: 4834
[2022-11-02 22:42:03 -0700] [4835] [INFO] Booting worker with pid: 4835
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4832 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4836] [INFO] Booting worker with pid: 4836
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4833 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4837] [INFO] Booting worker with pid: 4837
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4834 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4838] [INFO] Booting worker with pid: 4838
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4835 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4839] [INFO] Booting worker with pid: 4839
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4836 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4840] [INFO] Booting worker with pid: 4840
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4837 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4841] [INFO] Booting worker with pid: 4841
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4838 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4842] [INFO] Booting worker with pid: 4842
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4839 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4843] [INFO] Booting worker with pid: 4843
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4840 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4844] [INFO] Booting worker with pid: 4844
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4841 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4845] [INFO] Booting worker with pid: 4845
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4842 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4848] [INFO] Booting worker with pid: 4848
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4843 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4849] [INFO] Booting worker with pid: 4849
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4844 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4850] [INFO] Booting worker with pid: 4850
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4845 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4851] [INFO] Booting worker with pid: 4851
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4848 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4852] [INFO] Booting worker with pid: 4852
[2022-11-02 22:42:04 -0700] [4819] [WARNING] Worker with pid 4849 was terminated due to signal 11
[2022-11-02 22:42:04 -0700] [4853] [INFO] Booting worker with pid: 4853
[2022-11-02 22:42:05 -0700] [4819] [WARNING] Worker with pid 4850 was terminated due to signal 11
[2022-11-02 22:42:05 -0700] [4854] [INFO] Booting worker with pid: 4854
[2022-11-02 22:42:05 -0700] [4819] [WARNING] Worker with pid 4851 was terminated due to signal 11
[2022-11-02 22:42:05 -0700] [4855] [INFO] Booting worker with pid: 4855
[2022-11-02 22:42:05 -0700] [4819] [WARNING] Worker with pid 4852 was terminated due to signal 11
[2022-11-02 22:42:05 -0700] [4856] [INFO] Booting worker with pid: 4856
[2022-11-02 22:42:05 -0700] [4819] [WARNING] Worker with pid 4853 was terminated due to signal 11
[2022-11-02 22:42:05 -0700] [4857] [INFO] Booting worker with pid: 4857
[2022-11-02 22:42:05 -0700] [4819] [WARNING] Worker with pid 4854 was terminated due to signal 11

This starts when the application is launched and continues forever. The app's RAM usage is negligible, probably a few dozen MB on a machine with 64 GB of RAM. This happens with any number of workers, even just a single worker.

dannygoldstein, Nov 03 '22 05:11

Any update here? I have the same error.

littlerookie, Nov 25 '22 06:11

Same problem here, running Airflow 2.4.3 with the postgres module. I tried it with Python 3.9 and 3.10, and also with different postgres versions (13, 14 and 15). Using Airflow with sqlite instead of postgres works fine. As soon as I use the postgres module with Airflow, the gunicorn workers are killed immediately with signal 11. I couldn't find a workaround yet.

didatus, Nov 29 '22 23:11

Seeing the same signal 11 on an M1 Pro running Python 3.10. Any updates?

UPDATE: I fixed the issue by deleting all site-packages and reinstalling Python.

michaelroyzen, Feb 03 '23 22:02

Any update? Seeing the same error as well.

jonjacobs2, Feb 06 '23 15:02

What do you mean by reproducing on the M1? I can't reproduce it myself. Can you share a minimal environment?

benoitc, Feb 06 '23 18:02

Stalled issue.

benoitc, May 07 '23 19:05

Adding --preload fixed this for me
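For anyone who prefers a config file over the command-line flag, a sketch of the equivalent setting in a gunicorn config file (passed with `-c gunicorn.conf.py`). The `bind` and `workers` values are illustrative, taken loosely from the log above, and are not a recommendation.

```python
# gunicorn.conf.py (illustrative values, adjust to your deployment)
bind = "127.0.0.1:8000"
workers = 4

# Same effect as passing --preload on the command line: the application is
# imported once in the master process before the workers are forked.
preload_app = True
```

With preloading, the application code runs in the master before forking rather than separately in each worker, which changes where and when native libraries are initialized. The thread does not establish why this avoids the crash.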

dannygoldstein, Sep 08 '23 21:09

Same error over here!

ginwakeup, Nov 19 '23 15:11