apm-agent-python
unable to restart uwsgi after upgrading elastic-apm to 4.1.0
Describe the bug: ...
I see the error below when I start/restart uWSGI. It started after I upgraded the elastic-apm package from 2.1.1 to the latest version, 4.1.0. Can anyone help? Thanks a lot!
SIGINT/SIGQUIT received...killing workers...
worker 1 buried after 1 seconds
worker 2 buried after 1 seconds
worker 3 buried after 1 seconds
worker 5 buried after 1 seconds
worker 6 buried after 1 seconds
worker 7 buried after 1 seconds
Fri Mar 29 15:28:04 2019 - worker 4 (pid: 32580) is taking too much time to die...NO MERCY !!!
Fri Mar 29 15:28:04 2019 - worker 8 (pid: 32655) is taking too much time to die...NO MERCY !!!
worker 4 buried after 1 seconds
worker 8 buried after 1 seconds
goodbye to uWSGI.
Environment (please complete the following information)
- OS: Linux 3.10.0-327.36.3.el7.x86_64 (gcc 4.8.5 20150623, Red Hat 4.8.5-4) #1 SMP Mon Oct 24 16:09:20 UTC 2016
- Python version: 2.7.5
- Framework and version: Django 1.9
- Agent version: 4.1.0 (elastic-apm, upgraded from 2.1.1)
Additional context
Here is my uwsgi.ini
[uwsgi]
uid = cdportal
gid = cdportal
pidfile = /var/run/uwsgi/uwsgi.pid
# stats = /var/run/uwsgi/stats.sock
cap = setgid, setuid
chdir = /var/www/cdportal/current/backend
module = cdportal.wsgi:application
master = True
vacuum = True
max-requests = 1000
daemonize = /var/log/uwsgi/cdportal.log
socket = 127.0.0.1:49152
workers = 8
threads = 30
listen = 1024
enable-threads = True
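One thing worth experimenting with on a setup like this (a sketch, not something tried in this thread): if the agent's background threads are what keep workers from exiting, closing the client explicitly in uWSGI's per-process exit hook may help. The apm_client import below is hypothetical (wherever your project keeps a reference to the agent's Client instance); uwsgi.atexit and Client.close() are the only APIs assumed.

# Hedged workaround sketch: close the agent on process exit so its background
# threads do not delay worker shutdown.
import uwsgi  # only importable when running under uWSGI

from myproject.apm import apm_client  # hypothetical handle to the elasticapm Client

def _close_apm_on_exit():
    apm_client.close()  # flushes queued events and stops the agent's threads

uwsgi.atexit = _close_apm_on_exit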
Hi @mkloveyy
Can you check whether version 4.2.1 of the agent works better? We fixed some uWSGI-related issues in that version, which might help here.
@beniwohli I'm sorry, but the problem still exists after installing version 4.2.1. When I stop uWSGI, some workers still hang. I think this may be caused by the new feature introduced in 4.1.0 (collection of CPU and memory usage metrics). I am still analyzing it, and I would appreciate any advice. Thanks!
So far, I have seen two main errors in the uWSGI logs:
*** uWSGI listen queue of socket "127.0.0.1:9001" (fd: 3) full !!! (101/100) ***
Fri Mar 29 15:28:04 2019 - worker 4 (pid: 32580) is taking too much time to die...NO MERCY !!!
Fri Mar 29 15:28:04 2019 - worker 8 (pid: 32655) is taking too much time to die...NO MERCY !!!
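One way to test the metrics hypothesis above (a sketch, assuming the METRICS_INTERVAL option exists in the agent version in use; service name and server URL are placeholders): turn metrics collection off and check whether the workers then exit cleanly.

# Django settings.py -- hedged sketch: disable the CPU/memory metrics collection
# added in the 4.x agents to see whether those threads keep the workers alive.
ELASTIC_APM = {
    "SERVICE_NAME": "cdportal",              # placeholder
    "SERVER_URL": "http://localhost:8200",   # placeholder
    "METRICS_INTERVAL": "0s",                # "0s" turns metrics collection off
}

If shutdown becomes clean with this in place, that narrows the problem down to the metrics threads.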
@mkloveyy If you have some time, could you try this?
- Install version 4.2.2 of the agent (pip install elastic-apm==4.2.2)
- Install https://pypi.org/project/namedthreads/ (pip install namedthreads)
- Run uwsgi with the environment variable NAMEDTHREADS set to 1
Then, run pstree -ct $(pidof -s uwsgi) and post the output here. Thanks!
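As a complement to the pstree output (which, with namedthreads and NAMEDTHREADS=1, shows OS-level thread names), the Python-level thread names can also be dumped from inside a worker with a small helper like this sketch:

# Sketch: list the Python threads alive in the current process, e.g. called from
# a Django view or a shell running inside a uWSGI worker.
import threading

def dump_threads():
    for t in threading.enumerate():
        print("%s: %s (daemon=%s)" % (t.ident, t.name, t.daemon))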
@mkloveyy we just released a new version of the agent that should fix this issue, can you try to install 5.4.1 and see if the problem goes away?
I'll close this issue for now, please re-open it if you still encounter the bug.
@beniwohli Sorry, we have tried both 5.4.1 and 5.4.2, but the problem still exists... We upgrade the APM agent and then stop and restart our service, and we find that many uWSGI workers don't exit immediately (maybe the agent is keeping them busy with something).
@beniwohli Sorry, I cannot re-open this issue because it wasn't closed by me.
@mkloveyy would you mind trying whether setting transport_class to elasticapm.transport.http.Transport fixes the issue? This is a wild guess, so chances are it won't, but I'm running out of ideas :blush:
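For reference, in a Django project that suggestion would look roughly like the sketch below; the uppercase TRANSPORT_CLASS key is an assumption about how the transport_class option is spelled in the ELASTIC_APM settings dict, and the service name is a placeholder.

# settings.py -- hedged sketch: use the plain synchronous HTTP transport instead
# of the default threaded transport, per the suggestion above.
ELASTIC_APM = {
    "SERVICE_NAME": "cdportal",  # placeholder
    "TRANSPORT_CLASS": "elasticapm.transport.http.Transport",
}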
@beniwohli Sorry, it still doesn't work. I see there is a solution in #792; would that be better?
I do not think #792 will help -- that issue is just about adding the new pid to the metadata that is reported to the APM server, it doesn't actually change the behavior around threads.
Hello folks,
I have a similar issue with elastic-apm version 6.1.1.
The agent works fine with the Flask development server, but with uWSGI the application doesn't work at all, even though I see logs of the agent sending metrics to the APM server. Once I hit Ctrl-C, I get:
Fri Apr 9 12:08:05 2021 - worker 1 (pid: 537938) is taking too much time to die...NO MERCY !!!
Fri Apr 9 12:08:05 2021 - worker 2 (pid: 537939) is taking too much time to die...NO MERCY !!!
Fri Apr 9 12:08:05 2021 - worker 3 (pid: 537940) is taking too much time to die...NO MERCY !!!
Fri Apr 9 12:08:05 2021 - worker 4 (pid: 537941) is taking too much time to die...NO MERCY !!!
Environment:
- elastic-apm 6.1.1
- Flask 1.1.2
- uWSGI 2.0.19.1
Thanks!
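For anyone trying to reproduce this, a minimal Flask app instrumented with the agent looks roughly like the sketch below (service name and server URL are placeholders); the behaviour described above was reported when serving such an app with uWSGI rather than the built-in Flask server.

# app.py -- minimal sketch of a Flask app with elastic-apm; values are placeholders.
from flask import Flask
from elasticapm.contrib.flask import ElasticAPM

app = Flask(__name__)
app.config["ELASTIC_APM"] = {
    "SERVICE_NAME": "uwsgi-repro",           # placeholder
    "SERVER_URL": "http://localhost:8200",   # placeholder
}
apm = ElasticAPM(app)

@app.route("/")
def index():
    return "ok"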