Using `max-worker-lifetime` kills workers without waiting for pending requests to finish

Open cipriancraciun opened this issue 6 years ago • 6 comments

It seems that when max-worker-lifetime is set, the master process kills the worker process once the lifetime is reached, and the worker exits immediately without waiting for the pending request to finish.


To reproduce this, use the following configuration:

master = true
workers = 2
threads = 0
enable-threads = false

# Python related configuration

# NOTE:  See the note below about this small limit.
max-worker-lifetime = 60
reload-mercy = 30
worker-reload-mercy = 30
reload-on-exception = true
exit-on-reload = true
reaper = true
die-on-term = true

Then send requests in a loop, each taking about 10% of that max-worker-lifetime (a simple Python app that just does time.sleep(6) would do). Roughly one in 10 requests will then be reset because the master kills the worker process mid-request.
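
A minimal sketch of such a slow app, assuming a plain WSGI callable is enough to trigger the problem. The `make_app` helper and the response body are illustrative names, not part of the original report; the 6 second delay matches the 10% figure above.

```python
import time


def make_app(delay_seconds):
    """Build a WSGI app that sleeps for delay_seconds before answering."""
    def application(environ, start_response):
        # Simulate a slow request that may still be running when the
        # master decides the worker has reached its maximum lifetime.
        time.sleep(delay_seconds)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"done\n"]
    return application


# uWSGI looks up a callable named `application` by default.
application = make_app(6)
```

Keeping the delay as a parameter makes it easy to try other ratios between request time and max-worker-lifetime.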

Regarding the max-worker-lifetime = 60 limit: in production I used 3600 (which is quite enough), however in order to reliably trigger this issue I've set this limit small enough.

cipriancraciun avatar Oct 14 '18 06:10 cipriancraciun

Had a similar issue with still running requests being killed when using max_requests. It only seems to take the last spawn into account, while there can still be running requests.

timdrijvers avatar Feb 08 '19 13:02 timdrijvers

@timdrijvers Have you found a workaround?

(In my case I just ignored the issue, and I restart the whole application once every couple of weeks, as I don't have any "heavy" memory leaks.) :(

cipriancraciun avatar Feb 23 '19 13:02 cipriancraciun

I am experiencing this as well.

brandontksmith avatar Sep 02 '19 02:09 brandontksmith

Same here, see https://stackoverflow.com/questions/58731398/uwsgi-worker-respawning-although-the-request-is-not-yet-finished

lazydaemon avatar Nov 06 '19 14:11 lazydaemon

I just experienced this also. 😞

darkvertex avatar Jul 08 '21 14:07 darkvertex

I'm also hitting this issue, and I think it is the cause of the production issue seen in #2480.

For me, what happens is:

  • uWSGI starts the worker loop here: https://github.com/unbit/uwsgi/blob/066b7fdf1bfa12f95652b8bd8cbd3532c8096b91/core/uwsgi.c#L3714
  • For every additional thread (id > 0) it starts the simple_loop_run function: https://github.com/unbit/uwsgi/blob/066b7fdf1bfa12f95652b8bd8cbd3532c8096b91/core/loop.c#L62
  • All threads (the additional ones and the main thread) loop on https://github.com/unbit/uwsgi/blob/066b7fdf1bfa12f95652b8bd8cbd3532c8096b91/core/loop.c#L138
  • When the lifetime is reached, the master requests that processing stop by changing manage_next_request (https://github.com/unbit/uwsgi/blob/066b7fdf1bfa12f95652b8bd8cbd3532c8096b91/core/master_checks.c#L229)
  • So all worker threads exit the loop:
    • each additional thread just terminates without doing anything more (end of the simple_loop_run function)
    • BUT the main thread has more functions on its stack, and reaches https://github.com/unbit/uwsgi/blob/066b7fdf1bfa12f95652b8bd8cbd3532c8096b91/core/uwsgi.c#L3722
  • So the main thread, on exiting, calls exit() on the process, which does not wait for the additional threads to finish processing.

I can reproduce the HTTP error with the following:

cat > wsgi.py << EOF
import time
import uuid

def application(env, start_response):
    req_id = uuid.uuid4()
    with open("/tmp/req.log", "a") as fd:
        fd.write(f"start {req_id}\n")
    
    time.sleep(1)

    with open("/tmp/req.log", "a") as fd:
        fd.write(f"stop {req_id}\n")

    start_response('200 OK', [('Content-Type','text/html')])
    return [b"Hello World\n"]
EOF

uwsgi --module wsgi:application --http 127.0.0.1:8080 --master --workers 1 --threads 2 --max-worker-lifetime 20

Have a client send requests in a loop (I'm using sh -ec to stop on the first error):

time sh -ec 'while true; do curl http://localhost:8080/;done'

I usually get an error on the 1st or 2nd restart of the worker (i.e. within 20 to 40 seconds). In addition, if you look at req.log, you will see that on error some requests didn't finish (a start $UUID is present without a matching stop $UUID).
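
Checking req.log by eye gets tedious, so here is a small sketch that lists the request IDs with a start line but no stop line. It assumes the `start <id>` / `stop <id>` line format written by the wsgi.py above; the function name `unfinished_requests` is made up for illustration.

```python
def unfinished_requests(lines):
    """Return request IDs that have a 'start' line but no 'stop' line."""
    pending = {}  # dict keeps insertion order, so IDs come back in start order
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        event, req_id = parts
        if event == "start":
            pending[req_id] = True
        elif event == "stop":
            pending.pop(req_id, None)
    return list(pending)
```

To run it against the real log, pass an open file, for example `unfinished_requests(open("/tmp/req.log"))`; any ID it returns belongs to a request that was cut off.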

None of this happens if I add a wait_for_threads(); just before the end_me(0);
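
To illustrate the difference, here is a rough Python simulation of the behaviour described above. It is not uWSGI code: `keep_running` stands in for manage_next_request, the `join()` stands in for wait_for_threads(), and the returned snapshot stands in for what would have been written by the time the process exits. All names and timings are assumptions chosen for the sketch.

```python
import threading
import time


def simulate(wait_for_threads):
    """Return the request log as it stands when the main thread 'exits'."""
    log = []
    log_lock = threading.Lock()
    keep_running = threading.Event()   # stand-in for manage_next_request
    keep_running.set()
    request_started = threading.Event()

    def record(event):
        with log_lock:
            log.append(event)

    def request_loop():
        # Stand-in for an additional thread's request loop.
        while keep_running.is_set():
            record("start")
            request_started.set()
            time.sleep(0.3)            # the in-flight "request"
            record("stop")

    extra = threading.Thread(target=request_loop, daemon=True)
    extra.start()
    request_started.wait()             # a request is now in flight
    keep_running.clear()               # master: lifetime reached

    # The main thread leaves its own loop at this point.
    if wait_for_threads:
        extra.join()                   # wait for the in-flight request first
    with log_lock:
        return list(log)               # what survives the "process exit"
```

Under these assumptions, `simulate(False)` returns `["start"]` (the request never completes before the exit), while `simulate(True)` returns `["start", "stop"]`.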

PierreF avatar Sep 14 '22 16:09 PierreF

I believe this was fixed in #2626 which is now commit 06a22597bd419860904fae6f446d8e3b714f5afa

The fix would have shipped in v2.0.25, though the version's release notes are less explicit about the change than the commits https://github.com/unbit/uwsgi/compare/2.0.24...2.0.25

SteveByerly avatar Jun 26 '24 01:06 SteveByerly