
DAMN ! worker 7 (pid: 1343) died, killed by signal 11 :( trying respawn ...

jedie opened this issue 6 years ago · 33 comments

I'm using Docker. I switched from https://github.com/phusion/baseimage-docker (phusion/baseimage:0.10.1 with Python v3.5) to https://hub.docker.com/_/python/ (python:3.6-alpine with Python v3.6).

Since the switch, I very often get this error:

DAMN ! worker X (pid: Y) died, killed by signal 11 :( trying respawn ...

The rest of the setup is unchanged and uses uWSGI==2.0.17.

Any ideas?

jedie avatar May 16 '18 08:05 jedie

Hey @jedie - I get the same error. I am building from python:3.6-alpine as well. My ENV and CMD in the Dockerfile look like this:

ENV UWSGI_WSGI_FILE=base/wsgi.py UWSGI_HTTP=:8000 UWSGI_MASTER=1 UWSGI_WORKERS=2 UWSGI_THREADS=8 UWSGI_UID=1000 UWSGI_GID=2000

CMD ["uwsgi", "--http-auto-chunked", "--http-keepalive", "--static-map", "/media/=/code/media/", "--static-map", "/static/=/code/static/"]

I am a bit worried about using this in a PRODUCTION environment.

Rob

robnardo avatar Jul 11 '18 15:07 robnardo

I switched to 3.6-slim-stretch as a workaround...

jedie avatar Jul 12 '18 07:07 jedie

Getting this all of a sudden too.

18/07/2018 12:46:08 DAMN ! worker 1 (pid: 75) died, killed by signal 11 :( trying respawn ...
18/07/2018 12:46:08 Respawned uWSGI worker 1 (new pid: 80)

v9Chris avatar Jul 18 '18 11:07 v9Chris

I found that switching the uwsgi config to only one thread makes this go away. Here is my uwsgi config (from my Dockerfile):

ENV UWSGI_WSGI_FILE=base/wsgi.py UWSGI_HTTP=:8000 UWSGI_MASTER=1 UWSGI_WORKERS=8 UWSGI_UID=1000 UWSGI_GID=2000 UWSGI_TOUCH_RELOAD=touch-reload.txt UWSGI_LAZY_APPS=1 UWSGI_WSGI_ENV_BEHAVIOR=holy
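
For reference, the same workaround in ini form (a sketch assembled from the env vars above, not the actual file):

[uwsgi]
wsgi-file = base/wsgi.py
http = :8000
master = true
workers = 8
; note: no "threads" option -- each worker runs a single thread, which is what makes the crashes stop
lazy-apps = true
wsgi-env-behavior = holy
touch-reload = touch-reload.txt
uid = 1000
gid = 2000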

robnardo avatar Jul 18 '18 14:07 robnardo

I can confirm that configuring it to use one thread makes this go away.

deathemperor avatar Sep 19 '18 10:09 deathemperor

I'm also seeing this on python:3.6-alpine3.7. Works with threads = 1, random 502s from signal 11s with threads = 2.

beaugunderson avatar Oct 10 '18 04:10 beaugunderson

python:3.7-alpine3.8 did not help but switching to python:3.7-slim-stretch did. Would prefer to use alpine but this will be our workaround for now.

beaugunderson avatar Oct 10 '18 16:10 beaugunderson

Hi, I also encountered the same problem. When I run a Flask app under uWSGI to call the Keras (TensorFlow backend) object detection API, I get the error "DAMN ! worker 1 (pid: 5240) died, killed by signal 11 :( trying respawn ...". I then tried using only one thread, but that doesn't help; instead another error occurs: "!!! uWSGI process 347 got Segmentation Fault !!!". My configuration file is attached ("config"). Can anyone help? Thanks!

zhongdixiu avatar Nov 05 '18 02:11 zhongdixiu

I ran into a similar issue, though for me segfaults were traced down to anything that tried to use ssl (e.g. to talk to a remote API). Changing to stretch-slim seemed to resolve the issue.

kball avatar Nov 20 '18 19:11 kball

Just wanted to note I ran across this issue with python:3.6-alpine3.8, but it was solved with python:3.6-alpine3.9, using uwsgi==2.0.17.1.

cridenour avatar Feb 06 '19 19:02 cridenour

I'm still getting this using uwsgi 2.0.18 on alpine 3.7. Is anyone else still having the same problem?

xeor avatar Mar 04 '19 10:03 xeor

Still having this problem. Is there a way to make uwsgi exit entirely when this happens? I have my service configured to restart on failure.

That would be better than being in an inconsistent state: running but not alive.

I'm using

:» uwsgi --version
2.0.18

:» lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.1 LTS
Release:        18.04
Codename:       bionic

asyncmind0 avatar Mar 05 '19 23:03 asyncmind0

Can confirm that switching to Alpine 3.9 fixed that problem for me. I had the same symptoms, completely out of the blue.

One of the most significant changes in 3.9 is the return to OpenSSL (from LibreSSL), and I can imagine how changing such a foundational library could make a difference. It's also entirely possible that there is a latent bug somewhere in my software that is simply no longer triggered by the different underlying libraries.

tamentis avatar Mar 08 '19 19:03 tamentis

I'm also hitting this problem.

Python 3.7.2
uwsgi --version
2.0.18

Mon-ius avatar Mar 19 '19 07:03 Mon-ius

I'm also hitting this problem (strangely, on alpine 3.9). Base image: python:3.6.8-alpine3.9, uwsgi --version: 2.0.18. Switching to threads=1 solves the issue.

mightydeveloper avatar Mar 29 '19 03:03 mightydeveloper

Getting the same with python:3.6.8-alpine3.9 and uwsgi==2.0.15

Seems to get fixed by increasing uwsgi's thread-stacksize to 512. Now rolling with 2 or more threads without workers dying.
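
A minimal sketch of that fix, assuming the crashes really are threads overflowing musl's comparatively small default thread stack (the usual explanation for alpine-only segfaults; values are illustrative):

[uwsgi]
master = true
processes = 4
threads = 2
thread-stacksize = 512     ; per-thread stack in KB; musl's default is far smaller than glibc's 8 MB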

lekksi avatar Apr 10 '19 16:04 lekksi

In my case, I turned off the "enable-threads" option. I'm not sure if this will help you.

Python version: 3.6.7, uWSGI 2.0.18 (64bit)
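
If the app never spawns its own Python threads, that change in ini form is just leaving the option out (a sketch; enable-threads is off by default):

[uwsgi]
master = true
processes = 4
; enable-threads = true    ; left disabled -- Python thread support stays off,
                           ; so this is only safe if the app creates no threads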

koorukuroo avatar Apr 22 '19 10:04 koorukuroo

Any update on this issue? I have also run into it with uWSGI 2.0.18 and the python:3.6 image.

aliashkar avatar Nov 12 '19 18:11 aliashkar

Same issue with python:3.7-alpine3.9. I had to switch to a different distro: Debian.

adimux avatar Nov 15 '19 19:11 adimux

I think this error is due to the uwsgi config. For my Django projects, I have been using Docker (based on python:3.7-alpine) in production with no issues. Below are my Dockerfile, docker-entrypoint.sh and uwsgi.ini files, which were borrowed from and inspired by other online articles and research. Hope this helps other folks.

Dockerfile:

FROM python:3.7-alpine
COPY ./src/requirements.txt /requirements.txt
RUN set -ex \
	&& apk add --no-cache --virtual .build-deps \
		gcc g++ make libc-dev musl-dev linux-headers pcre-dev \
		mariadb-dev \
		openssl-dev \
		uwsgi-python3 \
	&& pip3 install --upgrade pip \
	&& pip3 install --upgrade wheel \
	&& if [ ! -e /usr/bin/pip ]; then ln -s pip3 /usr/bin/pip ; fi \
	&& if [[ ! -e /usr/bin/python ]]; then ln -sf /usr/bin/python3 /usr/bin/python; fi \
	&& LIBRARY_PATH=/lib:/usr/lib /bin/sh -c "pip install --no-cache-dir -r /requirements.txt" \
	&& runDeps="$( \
		scanelf --needed --nobanner --recursive /usr/local \
			| awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' \
			| sort -u \
			| xargs -r apk info --installed \
			| sort -u \
	)" \
	# add dependencies to the '.python-rundeps' virtual package (we will keep these)
	&& apk add --virtual .python-rundeps $runDeps \
	&& apk del .build-deps \
	# add non-build packages..
	&& apk add mariadb-client

RUN mkdir /code
WORKDIR /code/
ADD ./src /code/

EXPOSE 8000
ENV DJANGO_SETTINGS_MODULE=_base.settings
RUN DATABASE_URL='' python manage.py collectstatic --noinput && chmod a+x /code/docker-entrypoint.sh

ENTRYPOINT ["/code/docker-entrypoint.sh"]

docker-entrypoint.sh

#!/bin/sh

while ! mysqladmin ping -h"$MYSQL_HOST" --silent; do
    echo "database is unavailable - sleeping for 2 secs"
    sleep 2
done

if [ "x$DJANGO_MANAGEPY_MIGRATE" = 'xon' ]; then
    echo 'attempting to run "migrate" ..'
    python manage.py migrate --noinput
else
    echo 'DJANGO_MANAGEPY_MIGRATE is not "on", skipping'        
fi

echo "copying mime.types to /etc dir .."
cp mime.types /etc/mime.types

echo "starting uwsgi.."
uwsgi uwsgi.ini

uwsgi.ini

[uwsgi]
strict = true
master = true
enable-threads = true
vacuum = true                        ; Delete sockets during shutdown
single-interpreter = true
die-on-term = true                   ; Shutdown when receiving SIGTERM (default is respawn)
need-app = true

disable-logging = true               ; Disable built-in logging 
log-4xx = true                       ; but log 4xx's anyway
log-5xx = true                       ; and 5xx's

harakiri = 120                       ; Forcefully kill a worker if a single request runs longer than this many seconds
; py-call-osafterfork = true         ; Allow workers to trap signals

max-requests = 1000                  ; Restart workers after this many requests
max-worker-lifetime = 3600           ; Restart workers after this many seconds
reload-on-rss = 2048                 ; Restart workers after this much resident memory (MB)
worker-reload-mercy = 60             ; How long (seconds) to wait before forcefully killing workers

cheaper-algo = busyness
processes = 64                       ; Maximum number of workers allowed
cheaper = 8                          ; Minimum number of workers allowed
cheaper-initial = 16                 ; Workers created at startup
cheaper-overload = 1                 ; Length of a cycle in seconds
cheaper-step = 8                     ; How many workers to spawn at a time

cheaper-busyness-multiplier = 30     ; How many cycles to wait before killing workers
cheaper-busyness-min = 20            ; Below this threshold, kill workers (if stable for multiplier cycles)
cheaper-busyness-max = 70            ; Above this threshold, spawn new workers
cheaper-busyness-backlog-alert = 16  ; Spawn emergency workers if more than this many requests are waiting in the queue
cheaper-busyness-backlog-step = 2    ; How many emergency workers to create if there are too many requests in the queue

wsgi-file = /code/_base/wsgi.py
http = :8000
static-map = /static/=/code/static/
uid = 1000
gid = 2000
touch-reload = /code/reload-uwsgi

robnardo avatar Nov 15 '19 20:11 robnardo

Same problem here, using the debian:buster image as a base and Python 3.7. I tried both values of enable-threads and a few other settings, but it still breaks. Weirdly enough, the very same Docker image runs normally on my computer but gives this obscure error on our Kubernetes cluster, so I suspect it has something to do with the kernel or the network.

I noticed that Python 3.7 is not among the officially supported versions, so I downgraded to Python 3.5, but the error manifests nonetheless.

jacopofar avatar Dec 02 '19 14:12 jacopofar

@jacopofar I too am getting the same error on kubernetes but not when I run locally. My image is based on https://github.com/dockerfiles/django-uwsgi-nginx

asherp avatar Dec 09 '19 23:12 asherp

@jacopofar, @asherp, @aliashkar - any chance there is a stacktrace in the logs before the "killed by signal 11" line, and could you paste it here?

It would also be very helpful if you could reveal some information about your apps: Are you by any chance using psycopg2 2.7.x wheels and/or other Python wheels that ship their own libssl?

It appears there's a known issue with wheels that include their own libssl (or other libs) - see #1569 and #1590 (also this: http://initd.org/psycopg/articles/2018/02/08/psycopg-274-released/)

awelzel avatar Dec 14 '19 17:12 awelzel

@awelzel I tried to reproduce it, but I can't trigger it anymore ¯\_(ツ)_/¯

I don't remember any additional stacktrace; it only printed that message. This is my requirements.txt for that version:

uwsgi==2.0.18
boto3==1.9.67
pytest==5.2.2
pytest-cov==2.8.1
flake8==3.7.9
pandas==0.25.2
plotly==4.2.1
psycopg2-binary==2.8.3
sqlalchemy==1.2.15
dash==1.5.1
dash_auth==1.3.2
dash-bootstrap-components==0.7.2
requests==2.22.0
pyarrow==0.15.1

I'm not aware of any embedded libssl except for psycopg2, sorry for not being able to provide more details :/

jacopofar avatar Dec 18 '19 11:12 jacopofar

Getting the same with python:3.6.8-alpine3.9 and uwsgi==2.0.15

Seems to get fixed by increasing uwsgi's thread-stacksize to 512. Now rolling with 2 or more threads without workers dying.

This apparently solved my use case as well. Is there a way to track uwsgi's stack memory consumption, to be sure that the crashes really happen because the stack runs out of space?

eburghar avatar Jan 17 '20 13:01 eburghar

The same error occurred when I tried to run a job with frequent HTTP requests. I guess the error was due to long-running requests hitting a timeout. I solved it by setting a much bigger harakiri value in uwsgi.ini, and now it's working well.
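
In ini form that change is just a larger harakiri value (illustrative numbers; harakiri-verbose is optional but useful while debugging):

[uwsgi]
master = true
processes = 4
harakiri = 300             ; give slow requests up to 300 seconds before the master recycles the worker
harakiri-verbose = true    ; log extra detail whenever a harakiri fires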

wss404 avatar Mar 26 '20 12:03 wss404

I'm still getting this using uwsgi 2.0.18 on alpine 3.7. Is anyone else still having the same problem?

I'm hitting this issue in the same environment.

jasonTu avatar Jun 05 '20 06:06 jasonTu

@jacopofar I too am getting the same error on kubernetes but not when I run locally. My image is based on https://github.com/dockerfiles/django-uwsgi-nginx

I tried almost every single thing explained here and still got exactly the same error from the uwsgi server. It happened specifically on one particular Flask endpoint whenever I deployed to the k8s cluster, while everything worked perfectly on my dev machine. Surprisingly, requesting more memory fixed the issue:

resources:
  limits:
    memory: 1Gi
  requests:
    memory: 512Mi

arviCV avatar Jul 01 '21 18:07 arviCV

I have also encountered this problem. My uwsgi version is 2.0.18, with threads per worker set to 6. This is my analysis: one thread finished a request and called uwsgi_close_request; there it found that the worker's delta_requests had reached max_requests, so it called goodbye_cruel_world, which cursed the worker and then called simple_goodbye_cruel_world, which waits for the threads to end. However, another thread was processing a time-consuming request; it was slow, but not actually stuck. So after the reload-mercy time (for me, 60s), uwsgi_master_check_mercy killed the worker outright.

I wonder if there is a more graceful way to handle this. For example, in simple_goodbye_cruel_world, set manage_next_request to zero before wait_for_threads, so the worker stops accepting new requests; then have uwsgi_master_check_mercy wait for the threads to end before killing the worker with signal 9. If the worker really is stuck, it can still be killed by harakiri.
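
For readers following along, these are the options involved in the race described above (values taken from the report where given, otherwise illustrative):

[uwsgi]
master = true
processes = 4
threads = 6                  ; several request threads per worker
max-requests = 1000          ; after this many requests the worker is asked to restart
worker-reload-mercy = 60     ; seconds the master waits before killing it with SIGKILL
harakiri = 120               ; separate per-request timeout for genuinely stuck workers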

ylmuhaha avatar Sep 05 '23 09:09 ylmuhaha

So far, upping threads from 1 to 4 seems to have helped for me.

Stephane-Ag avatar Sep 06 '23 23:09 Stephane-Ag