cheroot icon indicating copy to clipboard operation
cheroot copied to clipboard

Thread join may block forever

Open scyclops opened this issue 5 years ago • 5 comments

Calling join on a thread may block the call forever when remaining_time is negative. And this can prevent cheroot from stopping properly.

https://github.com/cherrypy/cheroot/blob/cb9d3f46aa755d63e85d539a2efb7d1fa8f53935/cheroot/workers/threadpool.py#L285

One fix would be to change this line to be:

remaining_time = timeout and min(0, endtime - time.time())

scyclops avatar Sep 13 '20 16:09 scyclops

Please send a PR with tests. Also, we need some STR (can be in form of a test)

webknjaz avatar Sep 13 '20 20:09 webknjaz

After looking at the stop method, I think it's impossible for remaining_time to be negative. If I'm missing something, please send proof.

webknjaz avatar Sep 13 '20 21:09 webknjaz

Sorry there wasn't enough detail. It can happen in the following situation:

  1. The first thread joined is handling a slow request and a timeout occurs as a result of the join call.
  2. When the second thread is attempted to be joined, endtime is no longer in the future because the join timing out for the 1st thread happened at exactly endtime. And when endtime is a time in the past and you subtract the current time from it, it will always be negative. And every thread after the first thread will have it's join method called with a negative timeout.

The script below should reproduce the problem (tested with Python 2.7). It requires web.py and requests to be installed and probably also needs the changes from https://github.com/cherrypy/cheroot/pull/322 to be applied so that Ctrl+c works to stop the server. Once you run the script, wait a couple seconds and then press Ctrl+c to stop the server.


from __future__ import print_function

import web

import requests

from cheroot import wsgi

import time
from signal import signal, SIGTERM
from threading import Thread


urls = [
    r'/', 'SlowView'
]

class SlowView(object):
    def GET(self):
        time.sleep(10.0)
        return ''


def sig_term(signum, stack_frame):
    raise KeyboardInterrupt


def run(host='127.0.0.1', port=8080):

    app = web.application(urls, globals())
    
    server = wsgi.Server((host, port), app.wsgifunc())

    try:
        signal(SIGTERM, sig_term)
        server.start()

    except KeyboardInterrupt:

        print('stopping cheroot')

        start = time.time()
        server.stop()
        elapsed = time.time() - start

        print('took %.3f seconds to stop with a shutdown timeout of %s' % (elapsed, server.shutdown_timeout))


def make_request(sleep):
    time.sleep(sleep)

    while True:
        try:
            r = requests.get('http://127.0.0.1:8080/')
            if r.status_code != 200:
                break
        except:
            break


if __name__ == '__main__':

    for sleep in range(5):
        t = Thread(target=make_request, args=(sleep,))
        t.start()

    run()

    # then use Ctrl+c to stop cheroot
    # this depends on Ctrl+c working which requires the following PR: https://github.com/cherrypy/cheroot/pull/322

scyclops avatar Sep 14 '20 00:09 scyclops

Ah, I missed that it's in a loop and only checked how it's set at the beginning of the module. @scyclops do you think you could come up with a pytest-based reproducer?

webknjaz avatar Sep 17 '20 21:09 webknjaz

I don't have any more time to commit to this but I've applied the fix I recommended above in my fork and it's working well for me.

scyclops avatar Sep 18 '20 20:09 scyclops