
Reload stops working when another change is detected before the worker finishes startup

masenf opened this issue · 5 comments

Testing on Debian 12, Python 3.11, granian 2.1.2

import time

async def app(scope, receive, send):
    assert scope['type'] == 'http'

    await send({
        'type': 'http.response.start',
        'status': 200,
        'headers': [
            [b'content-type', b'text/plain'],
        ],
    })
    await send({
        'type': 'http.response.body',
        'body': b'Hello, world!',
    })


print("Initializing")
time.sleep(2)
print("Done")

Run the server with the reloader enabled (the 2 second sleep at import time simulates a worker that is slow to finish startup):

granian --interface asgi --reload main:app

From another terminal, touch the file several times in quick succession to trigger overlapping reloads:

touch main.py
sleep 0.2
touch main.py

Subsequent touches to the file do not trigger a reload, unless CTRL-C is sent to the server terminal.

touch main.py

... nothing happens.


The issue has also been replicated in the python:3.12 and python:3.13 docker containers.

The issue is NOT reproducible on macOS 15.3.

Windows has not been tested.

masenf avatar Mar 21 '25 22:03 masenf

@masenf Granian doesn't do anything fancy on top of watchfiles (the library used to watch for changes), and in particular nothing that differs between macOS and Linux, so I'm not sure how to tackle this.

Maybe an issue should be opened on the watchfiles repo instead?
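
For context, a reload loop on top of watchfiles typically looks roughly like the sketch below. This is not Granian's actual code; `spawn_workers` and `stop_workers` are hypothetical placeholders for however the server recycles its worker processes.

    # Simplified illustration of a watchfiles-based reload loop (not Granian code).
    from watchfiles import watch

    def reload_loop(path, spawn_workers, stop_workers):
        spawn_workers()
        for _changes in watch(path):
            # Each batch of detected changes triggers a full worker restart.
            # If stop_workers() blocks (e.g. the worker never registered its
            # signal handlers), the loop never reaches the next batch of changes.
            stop_workers()
            spawn_workers()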

gi0baro avatar Mar 24 '25 10:03 gi0baro

@gi0baro I think the issue is deeper than watchfiles itself, it seems to arise whenever Granian attempts to stop a worker that has started to start but not finished starting. Ideally, Granian should not enter a bad state when that happens. I'm not sure on what exactly we should do, but we can either:

  1. wait until the worker finishes starting, then we tell it to stop, i think this is a bit difficult from the aspect that python granian right now doesn't know if rust granian worker has finished starting? i could be mistaken though, this could also not be possible if the worker entered an infinite loop
  2. stop the worker while it's starting, this can have unintended consequences, but is prob more consistent in general, i think this is easier, but we do need to figure out how the rust granian worker can read a message from the parent while it's starting

if you need help implementing either, i can try to assist, but since both of these seems to involve rust, you might have a better clue on what's going on
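
A minimal sketch of the readiness-signal idea from option 1, using only the Python standard library. Granian spawns its workers from Rust, so everything here is an illustrative stand-in rather than Granian internals:

    import multiprocessing as mp
    import signal
    import sys
    import time

    def worker_main(ready):
        time.sleep(2)                                           # stands in for slow app import/startup
        signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))   # graceful-stop handler exists only from here on
        ready.set()                                             # tell the parent it is safe to recycle us
        while True:
            time.sleep(1)                                       # stands in for the serve loop

    if __name__ == "__main__":
        ready = mp.Event()
        wrk = mp.Process(target=worker_main, args=(ready,))
        wrk.start()
        # A reload request arrives right away; instead of terminating immediately,
        # the parent waits for the readiness flag (bounded by a timeout) so the
        # worker's graceful-stop path is actually in place when SIGTERM lands.
        ready.wait(timeout=5)
        wrk.terminate()
        wrk.join(timeout=5)
        if wrk.is_alive():
            wrk.kill()
            wrk.join()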

adhami3310 avatar Mar 24 '25 20:03 adhami3310

i did some investigative work, granian enters an infinite loop here:

granian/server/common.py

    def _stop_workers(self):
        for wrk in self.wrks:
            wrk.terminate()

        for wrk in self.wrks:
            wrk.join(self.workers_kill_timeout) # Is running forever here
            if self.workers_kill_timeout:
            # the worker might still be reported alive after `join`, let's context switch
                if wrk.is_alive():
                    time.sleep(0.001)
                if wrk.is_alive():
                    logger.warning(f'Killing worker-{wrk.idx} after it refused to gracefully stop')
                    wrk.kill()
                    wrk.join()

        self.wrks.clear()

It happens because the preceding terminate() has no effect on a worker that is still starting, so join() (with workers_kill_timeout unset) waits forever on a worker that will never exit on its own. Setting workers_kill_timeout to any value works around the issue, but ideally calling .terminate() on a worker that is still starting should actually terminate it.
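
For reference, the workaround is just passing a kill timeout when starting the server; I believe the corresponding Granian CLI flag is --workers-kill-timeout (treat the exact name as an assumption and check granian --help):

    granian --interface asgi --reload --workers-kill-timeout 3 main:app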

adhami3310 avatar Mar 25 '25 02:03 adhami3310

@adhami3310 yeah, I think the actual issue is that when the main process reaches the join line, the worker hasn't registered its signal listeners yet.

But it's also quite hard for me to imagine a proper way to change that: I don't really like the idea of introducing IPC between the workers and the main process just to signal "hey, I'm ready"; that would complicate the code a lot for a tiny advantage, in my opinion.

Maybe just giving workers_kill_timeout a default value when the reloader is enabled (something like 3.5 seconds) would mitigate 99% of these occurrences, wdyt?
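
The proposed default could amount to a small guard at configuration time, roughly like the hypothetical sketch below (names are illustrative, not Granian's actual code):

    # Hypothetical sketch of the proposal: only fill in a kill timeout when the
    # reloader is active and the user didn't set one explicitly.
    DEFAULT_RELOAD_KILL_TIMEOUT = 3.5

    def resolve_kill_timeout(workers_kill_timeout, reload_enabled):
        if workers_kill_timeout is None and reload_enabled:
            return DEFAULT_RELOAD_KILL_TIMEOUT
        return workers_kill_timeout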

gi0baro avatar Mar 25 '25 15:03 gi0baro

FYI, 2.2.1 now ships a default 3.5s workers kill timeout when using the reloader. Not sure I can do anything else for the time being.

gi0baro avatar Apr 02 '25 19:04 gi0baro