fastify-cli icon indicating copy to clipboard operation
fastify-cli copied to clipboard

Improvements to graceful termination strategy suggested by templates

Open pclalv opened this issue 3 years ago • 3 comments

Prerequisites

  • [X] I have written a descriptive issue title
  • [X] I have searched existing issues to ensure the issue has not already been raised

Issue

Templates using close-with-grace should use a delay should be greater 500ms

as of the time of writing this, the eject template uses close-with-grace with a delay of 500ms: https://github.com/fastify/fastify-cli/blob/f4453bfff028844b2e0ce9930d4055d08fe018d5/templates/eject/server.js#L22

cross-referencing this with the close-with-grace source code, this means that close-with-grace will forcibly exit its current NodeJS process after waiting just 500ms for its callback to complete.

I discovered that a fastify app using this delay is effectively given only 500ms to complete any requests that it'd already started responding to at the time that SIGTERM or SIGINT was received. so, a request that takes one second to complete will run for half a second and then be cut off, and the requester will see a 50x response. this can be validated this with a simple route that sleeps for a few seconds before returning a response.

that seems a little unreasonably short to me, given that platforms like Kubernetes and Heroku first send SIGTERM, and then, after a grace period of 30 seconds (by default), SIGKILL. why should this default to terminating itself after a mere 500ms when Heroku and Kubernetes both afford orders of magnitude more time for a process to gracefully terminate? does it make sense to use this default, given that many users of this template are liable to use the value of 500ms blindly? actually, that's not a fact, but I bet you it's true :) and users who know better will know better - it's the users who don't know better that we might want to be more considerate of.

Avoid forcibly exiting the parent NodeJS process

close-with-grace ultimately calls exit on its NodeJS process, either when its delay has run out, or when its callback has succeeded or failed, whichever comes first. I find this a little presumptuous - if by an innocent mistake something needing graceful termination isn't being handled in the close-with-grace callback, it'll just be killed by the exit. I might want to know if such a condition were affecting my app.

I think that close-with-grace calling exit is a solution a bit like cutting the Gordian knot - sure, the NodeJS process is gonna exit, and so the problem of NodeJS processes not exiting is sovled, but at the cost of obscuring issues with code that would otherwise be preventing NodeJS from exiting of its own accord by running out of things to do. I would prefer to let these issues surface and be dealt with by, for example by the well-named package why-is-node-running, the very existence and usage of which suggests that NodeJS processes hanging around for obscure reasons is a somewhat common problem.

I would instead ask, what's the problem if the NodeJS process doesn't exit on its own? in my case, I'm either gonna be running the fastify server locally, and I'm either gonna get impatient and kill it, or bother to figure out what's wrong; or the fastify server is gonna be running on Kubernetes, and Kubernetes will get tired of waiting and kill it. consider this just one data point among many, but I still find it valid to question the utility of this behavior in these templates.

I realize that the precise behavior of the close-with-grace library is out of the scope of the fastify-cli, but the fastify-cli's choice to use close-with-grace, given its behavior, is the relevant thing that I'm questioning.

pclalv avatar Aug 04 '22 22:08 pclalv

Every single team I worked implemented some form of graceful shutdown to let the application complete ongoing requests. The majority of them implemented it wrong, causing production bugs. Therefore I wrote close-with-grace and this this ships with it enabled.

The way this works combined with Fastify will ensure no request is truncated nor lost in a healthy scenario (< the timeout). No one wants to lose requests.


Every application you start with fastify-cli is a fastify application: if there is something else running you should not use this tool. This module is opinionated by choice to provide a simple system: use Fastify directly and control your own process for anything more complex.


Now, I'm happy to get a PR to raise the shutdown timeout to 30s or any other sensible value and to make it configurable if it's not.

mcollina avatar Aug 05 '22 21:08 mcollina

Now, I'm happy to get a PR to raise the shutdown timeout to 30s or any other sensible value and to make it configurable if it's not.

@mcollina since require('dotenv').config() is already in place, could we use process.env.CLOSE_GRACE_DELAY || 500 ? I could open a PR with this.

avanelli avatar Aug 13 '22 07:08 avanelli

Sure thing, go ahead!

mcollina avatar Aug 13 '22 14:08 mcollina