apm-agent-python
unable to restart uwsgi after upgrading elastic-apm to 4.1.0
Describe the bug: ...
I see the error below when I start/restart uWSGI. It started after I upgraded the elastic-apm package from 2.1.1 to the latest version, 4.1.0. Can anyone help? Thanks a lot!
SIGINT/SIGQUIT received...killing workers...
worker 1 buried after 1 seconds
worker 2 buried after 1 seconds
worker 3 buried after 1 seconds
worker 5 buried after 1 seconds
worker 6 buried after 1 seconds
worker 7 buried after 1 seconds
Fri Mar 29 15:28:04 2019 - worker 4 (pid: 32580) is taking too much time to die...NO MERCY !!!
Fri Mar 29 15:28:04 2019 - worker 8 (pid: 32655) is taking too much time to die...NO MERCY !!!
worker 4 buried after 1 seconds
worker 8 buried after 1 seconds
goodbye to uWSGI.
Environment (please complete the following information)
- OS: Linux 3.10.0-327.36.3.el7.x86_64 (gcc 4.8.5 20150623, Red Hat 4.8.5-4) #1 SMP Mon Oct 24 16:09:20 UTC 2016
- Python version: 2.7.5
- Framework and version: Django 1.9
- Agent version: 4.1.0 (elastic-apm, upgraded from 2.1.1)
Additional context
Here is my uwsgi.ini
[uwsgi]
uid = cdportal
gid = cdportal
pidfile = /var/run/uwsgi/uwsgi.pid
# stats = /var/run/uwsgi/stats.sock
cap = setgid, setuid
chdir = /var/www/cdportal/current/backend
module = cdportal.wsgi:application
master = True
vacuum = True
max-requests = 1000
daemonize = /var/log/uwsgi/cdportal.log
socket = 127.0.0.1:49152
workers = 8
threads = 30
listen = 1024
enable-threads = True
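One thing worth experimenting with on a setup like this (a sketch, not something tried in this thread): if the agent's background threads are what keep workers from exiting, closing the client explicitly in uWSGI's per-process exit hook may help. The apm_client import below is hypothetical (wherever your project keeps a reference to the agent's Client instance); uwsgi.atexit and Client.close() are the only APIs assumed.

# Hedged workaround sketch: close the agent on process exit so its background
# threads do not delay worker shutdown.
import uwsgi  # only importable when running under uWSGI

from myproject.apm import apm_client  # hypothetical handle to the elasticapm Client

def _close_apm_on_exit():
    apm_client.close()  # flushes queued events and stops the agent's threads

uwsgi.atexit = _close_apm_on_exit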
Hi @mkloveyy
Can you check whether version 4.2.1 of the agent works better? We fixed some uWSGI-related issues in that version, which might help here.
@beniwohli I'm sorry, but the problem still exists after installing version 4.2.1. When I stop uWSGI, some workers still hang. I think this may be caused by the new feature introduced in 4.1.0 (collection of CPU and memory usage metrics). I am still analyzing it, and I would appreciate any advice. Thanks!
So far, I have seen two main errors in the uWSGI logs:
*** uWSGI listen queue of socket "127.0.0.1:9001" (fd: 3) full !!! (101/100) ***
Fri Mar 29 15:28:04 2019 - worker 4 (pid: 32580) is taking too much time to die...NO MERCY !!!
Fri Mar 29 15:28:04 2019 - worker 8 (pid: 32655) is taking too much time to die...NO MERCY !!!
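One way to test the metrics hypothesis above (a sketch, assuming the METRICS_INTERVAL option exists in the agent version in use; service name and server URL are placeholders): turn metrics collection off and check whether the workers then exit cleanly.

# Django settings.py -- hedged sketch: disable the CPU/memory metrics collection
# added in the 4.x agents to see whether those threads keep the workers alive.
ELASTIC_APM = {
    "SERVICE_NAME": "cdportal",              # placeholder
    "SERVER_URL": "http://localhost:8200",   # placeholder
    "METRICS_INTERVAL": "0s",                # "0s" turns metrics collection off
}

If shutdown becomes clean with this in place, that narrows the problem down to the metrics threads.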
@mkloveyy If you have some time, could you try this?
- Install version 4.2.2 of the agent (pip install elastic-apm==4.2.2)
- Install https://pypi.org/project/namedthreads/ (pip install namedthreads)
- Run uwsgi with the environment variable NAMEDTHREADS set to 1
Then, run pstree -ct $(pidof -s uwsgi) and post the output here. Thanks!
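As a complement to the pstree output (which, with namedthreads and NAMEDTHREADS=1, shows OS-level thread names), the Python-level thread names can also be dumped from inside a worker with a small helper like this sketch:

# Sketch: list the Python threads alive in the current process, e.g. called from
# a Django view or a shell running inside a uWSGI worker.
import threading

def dump_threads():
    for t in threading.enumerate():
        print("%s: %s (daemon=%s)" % (t.ident, t.name, t.daemon))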
@mkloveyy we just released a new version of the agent that should fix this issue, can you try to install 5.4.1 and see if the problem goes away?
I'll close this issue for now, please re-open it if you still encounter the bug.
@beniwohli Sorry, we have tried both 5.4.1 and 5.4.2, but the problem still exists... We upgrade the APM agent and then stop and restart our service, and we find that many uWSGI workers don't exit immediately (maybe the agent is keeping them busy with something).
@beniwohli Sorry, I cannot re-open this issue because it wasn't closed by me.
@mkloveyy would you mind trying whether setting transport_class to elasticapm.transport.http.Transport fixes the issue? This is a wild guess, so chances are it won't, but I'm running out of ideas :blush:
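For reference, in a Django project that suggestion would look roughly like the sketch below; the uppercase TRANSPORT_CLASS key is an assumption about how the transport_class option is spelled in the ELASTIC_APM settings dict, and the service name is a placeholder.

# settings.py -- hedged sketch: use the plain synchronous HTTP transport instead
# of the default threaded transport, per the suggestion above.
ELASTIC_APM = {
    "SERVICE_NAME": "cdportal",  # placeholder
    "TRANSPORT_CLASS": "elasticapm.transport.http.Transport",
}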
@beniwohli Sorry, it still doesn't work. I see there is a solution in #792; would that be better?
I do not think #792 will help -- that issue is just about adding the new pid to the metadata that is reported to the APM server, it doesn't actually change the behavior around threads.
Hello folks,
I have a similar issue with elastic-apm version 6.1.1.
The agent works fine with the Flask development server, but with uWSGI the application doesn't work at all, even though I see logs of the agent sending metrics to the APM server. Once I hit Ctrl-C, I get:
Fri Apr 9 12:08:05 2021 - worker 1 (pid: 537938) is taking too much time to die...NO MERCY !!!
Fri Apr 9 12:08:05 2021 - worker 2 (pid: 537939) is taking too much time to die...NO MERCY !!!
Fri Apr 9 12:08:05 2021 - worker 3 (pid: 537940) is taking too much time to die...NO MERCY !!!
Fri Apr 9 12:08:05 2021 - worker 4 (pid: 537941) is taking too much time to die...NO MERCY !!!
Environment:
- elastic-apm 6.1.1
- Flask 1.1.2
- uWSGI 2.0.19.1
Thanks!
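For anyone trying to reproduce this, a minimal Flask app instrumented with the agent looks roughly like the sketch below (service name and server URL are placeholders); the behaviour described above was reported when serving such an app with uWSGI rather than the built-in Flask server.

# app.py -- minimal sketch of a Flask app with elastic-apm; values are placeholders.
from flask import Flask
from elasticapm.contrib.flask import ElasticAPM

app = Flask(__name__)
app.config["ELASTIC_APM"] = {
    "SERVICE_NAME": "uwsgi-repro",           # placeholder
    "SERVER_URL": "http://localhost:8200",   # placeholder
}
apm = ElasticAPM(app)

@app.route("/")
def index():
    return "ok"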