Reload stops working when another change is detected before the worker finishes startup
Testing on Debian 12, Python 3.11, granian 2.1.2
Minimal `main.py`; the module-level sleep simulates slow worker startup:

```python
import time


async def app(scope, receive, send):
    assert scope['type'] == 'http'
    await send({
        'type': 'http.response.start',
        'status': 200,
        'headers': [
            [b'content-type', b'text/plain'],
        ],
    })
    await send({
        'type': 'http.response.body',
        'body': b'Hello, world!',
    })


print("Initializing")
time.sleep(2)
print("Done")
```
Run the server with the reloader enabled:

```
granian --interface asgi --reload main:app
```
From another terminal, touch the file several times in quick succession to trigger overlapping reloads:
```
touch main.py
sleep 0.2
touch main.py
```
Subsequent touches to the file do not trigger a reload unless CTRL-C is sent to the server terminal:

```
touch main.py
```

... nothing happens.
The issue has also been replicated in the python:3.12 and python:3.13 docker containers.
The issue is NOT reproducible on macOS 15.3.
Windows has not been tested.
@masenf Granian doesn't do anything fancy on top of watchfiles (the library used to watch for changes), and nothing different between macOS and Linux, so I'm not sure how to tackle this.
Maybe an issue should be opened on the watchfiles repo instead?
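For context, the change-detection side is basically watchfiles' blocking iterator, roughly like this (a simplified sketch, not Granian's actual reload loop; `restart_workers` is a made-up placeholder):

```python
from watchfiles import watch

# Each iteration yields a set of (Change, path) tuples; on every batch of
# changes the server restarts its workers.
for changes in watch('.'):
    print('detected', changes)
    # restart_workers()  # hypothetical hook; the hang discussed below happens in this step
```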
@gi0baro I think the issue is deeper than watchfiles itself: it seems to arise whenever Granian attempts to stop a worker that has begun starting but hasn't finished. Ideally, Granian should not enter a bad state when that happens. I'm not sure exactly what we should do, but we could either:
- wait until the worker finishes starting, then tell it to stop (see the sketch below). I think this is a bit difficult because, as far as I can tell, the Python side of Granian currently doesn't know whether the Rust worker has finished starting (I could be mistaken), and it wouldn't work at all if the worker entered an infinite loop during startup;
- stop the worker while it's still starting. This can have unintended consequences, but is probably more consistent in general and, I think, easier; we would just need to figure out how the Rust worker can read a message from the parent while it's starting.

If you need help implementing either, I can try to assist, but since both of these seem to involve Rust, you might have a better clue about what's going on.
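Here's a rough sketch of what the first option could look like (not Granian's actual code; `worker_main`, `restart_worker`, and the `ready` event are made up for illustration). The main process only terminates a worker once it has reported readiness via a shared `multiprocessing.Event`, with a cap on how long it will wait:

```python
import multiprocessing
import time


def worker_main(ready):
    # Simulate a slow startup (e.g. importing the app, binding sockets).
    time.sleep(2)
    # Signal the parent that the worker is fully up and can handle termination.
    ready.set()
    while True:
        time.sleep(1)  # stand-in for serving requests


def restart_worker(proc, ready, timeout=5.0):
    # Only terminate once the worker reports it is ready, or the wait times out
    # (covering the case where startup never finishes).
    ready.wait(timeout)
    proc.terminate()
    proc.join(timeout)
    if proc.is_alive():
        proc.kill()
        proc.join()


if __name__ == '__main__':
    ready = multiprocessing.Event()
    proc = multiprocessing.Process(target=worker_main, args=(ready,))
    proc.start()
    restart_worker(proc, ready)
```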
I did some investigative work: Granian enters an infinite loop here, in `granian/server/common.py`:
```python
def _stop_workers(self):
    for wrk in self.wrks:
        wrk.terminate()
    for wrk in self.wrks:
        wrk.join(self.workers_kill_timeout)  # Is running forever here
        if self.workers_kill_timeout:
            # the worker might still be reported after `join`, let's context switch
            if wrk.is_alive():
                time.sleep(0.001)
            if wrk.is_alive():
                logger.warning(f'Killing worker-{wrk.idx} after it refused to gracefully stop')
                wrk.kill()
                wrk.join()
    self.wrks.clear()
```
The join never returns simply because the preceding `terminate()` had no effect on the worker, so it's joining on no action. Setting `workers_kill_timeout` to any value works around the issue, but ideally calling `.terminate()` on a worker that is still starting should actually terminate it.
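For illustration, here's a minimal standalone repro of the same shape of hang (not Granian code; it just assumes the worker effectively ignores SIGTERM until startup completes, which may or may not be exactly what the Rust worker does):

```python
import multiprocessing
import signal
import time


def slow_worker():
    # Assume the worker ignores SIGTERM until startup completes
    # (a stand-in for "signal listeners not registered yet").
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    while True:
        time.sleep(1)  # startup never finishes


if __name__ == '__main__':
    proc = multiprocessing.Process(target=slow_worker)
    proc.start()
    time.sleep(0.5)

    proc.terminate()   # SIGTERM is ignored by the child
    # proc.join()      # <- without a timeout this blocks forever
    proc.join(3.5)     # with a kill timeout, join returns...
    if proc.is_alive():
        proc.kill()    # ...and SIGKILL actually stops the worker
        proc.join()
    print('worker stopped')
```

For the workaround above, I believe the CLI exposes this setting as `--workers-kill-timeout` (please double-check the option name); it's what makes the `join()` return so that `kill()` can do the rest.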
@adhami3310 yeah, I think the actual issue is that by the time the main process reaches the `join` line, the worker hasn't registered its signal listeners yet.
But it's also quite hard for me to imagine a proper way to change that: I don't really like the idea of introducing IPC between the workers and the main process just to signal "hey, I'm ready"; that would complicate the code by a lot for such a tiny advantage, in my opinion.
Maybe just adding a default value for `workers_kill_timeout` when the reloader is enabled (something like 3.5 seconds) will mitigate 99% of the occurrences, wdyt?
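Something like this (a hypothetical helper, not the actual code path; names are made up):

```python
# Give workers_kill_timeout a default when the reloader is active, so a join()
# on an unresponsive worker falls through to kill() instead of blocking forever.
DEFAULT_RELOAD_KILL_TIMEOUT = 3.5


def resolve_kill_timeout(reload_enabled, workers_kill_timeout):
    if reload_enabled and workers_kill_timeout is None:
        return DEFAULT_RELOAD_KILL_TIMEOUT
    return workers_kill_timeout
```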
FYI, 2.2.1 now has a default 3.5s workers kill timeout when using the reloader. Not sure I can do anything else for the time being.