add graceful shutdown for node18 functions
Description
This PR adds graceful shutdown to functions built from the node18 template.
Motivation and Context
- [x] I have raised an issue to propose this change (required)
Which issue(s) this PR fixes
Fixes #305
How Has This Been Tested?
- First, create a function from the updated node18 template.
- After the function is created, send SIGTERM to the running function pod, for example: kubectl -n openfaas-fn exec testdrainfunc-b44d49bb6-9kzzh -- kill -s SIGTERM 1
- The logs show that the added shutdown logic runs:
2023-05-26T09:18:38Z 2023/05/26 09:18:38 SIGTERM: no new connections in 15s
2023-05-26T09:18:38Z 2023/05/26 09:18:38 Removing lock-file : /tmp/.lock
2023-05-26T09:18:38Z Function got SIGTERM event, draining up to: 15s
2023-05-26T09:18:38Z Server gracefully shut down
2023-05-26T09:18:53Z 2023/05/26 09:18:53 No new connections allowed, draining: 0 requests
2023-05-26T09:18:53Z 2023/05/26 09:18:53 Exiting. Active connections: 0
Types of changes
- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Version change (see: Impact to existing users)
Impact to existing users
There will be no significant impact on how users use openfaas functions.
Checklist:
- [x] My code follows the code style of this project.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [x] I've read the CONTRIBUTION guide
- [x] I have signed-off my commits with git commit -s
- [ ] I have added tests to cover my changes.
- [x] All new and existing tests passed.
This looks very similar to what I was expecting. I think you'll need to test it, and you'll also need to wait for the health check duration before calling close on the server. See how we do that in the Go template.
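To make the ordering concrete, here is a minimal sketch of what "wait for the health check duration before calling close" could look like in Node.js. It is not the code from this PR or from the Go template. The helper names parseDuration and installGracefulShutdown are hypothetical, and the sketch assumes the durations arrive as Go-style strings (such as 5s or 10m2s) in the healthcheck_interval and write_timeout environment variables.

```javascript
"use strict";

// Parse a Go-style duration such as "5s", "10m2s" or "500ms" into milliseconds.
// Only the h, m, s and ms units are handled in this sketch.
function parseDuration(value, fallbackMs) {
  if (typeof value !== "string" || value.length === 0) return fallbackMs;
  const unitMs = { h: 3600000, m: 60000, s: 1000, ms: 1 };
  const pattern = /(\d+(?:\.\d+)?)(ms|h|m|s)/g;
  let total = 0;
  let matched = false;
  let match;
  while ((match = pattern.exec(value)) !== null) {
    total += parseFloat(match[1]) * unitMs[match[2]];
    matched = true;
  }
  return matched ? total : fallbackMs;
}

// Hypothetical helper: on SIGTERM, wait for the health check interval so the
// pod can be taken out of rotation, then close the server and let in-flight
// requests drain for at most drainMs.
function installGracefulShutdown(server, { healthcheckMs, drainMs, exit = process.exit }) {
  let shuttingDown = false;
  const onSigterm = () => {
    if (shuttingDown) return;
    shuttingDown = true;
    console.log(`SIGTERM received, closing server in ${healthcheckMs}ms`);
    setTimeout(() => {
      // Force an exit if requests are still running after the drain window.
      const force = setTimeout(() => exit(1), drainMs);
      // close() stops accepting new connections; the callback fires once
      // all existing connections have ended.
      server.close(() => {
        clearTimeout(force);
        exit(0);
      });
      // Node 18.2+ only: drop idle keep-alive sockets so close() can finish.
      if (typeof server.closeIdleConnections === "function") {
        server.closeIdleConnections();
      }
    }, healthcheckMs);
  };
  process.on("SIGTERM", onSigterm);
  return onSigterm;
}

module.exports = { parseDuration, installGracefulShutdown };
```

In a template entry point this might be wired up as installGracefulShutdown(app.listen(port), { healthcheckMs: parseDuration(process.env.healthcheck_interval, 5000), drainMs: parseDuration(process.env.write_timeout, 15000) }), where the fallback values are assumptions for illustration only.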
I've sent you a trial license for OpenFaaS Pro/Standard.
Would you like to try testing it with a 10min shutdown time?
Installation -> https://docs.openfaas.com/deployment/pro/
Testing with long timeouts -> https://www.openfaas.com/blog/long-running-jobs/
I simulate it like this:
Deploy the function, invoke it (with a 10m sleep) and watch the logs.
Then I update the code and image tag, and redeploy it.
If it's all working right, the Pod should go to Terminating but stay around until the invocation has completed successfully.
Here's my test function for Go/Python/Node for a longer timeout if you need it - https://github.com/alexellis/go-long
@alexellis
I tested the updated node18 template with the following settings:
environment:
  write_timeout: 10m2s
  healthcheck_interval: 5s
After running kubectl scale -n openfaas-fn deploy/node18 --replicas=0, the logs for the node18 function are:
2023-05-27T19:03:50Z 2023/05/27 19:03:50 SIGTERM: no new connections in 5s
2023-05-27T19:03:50Z 2023/05/27 19:03:50 Removing lock-file : /tmp/.lock
2023-05-27T19:03:50Z Function got SIGTERM event, draining up to: 10m2s
2023-05-27T19:03:50Z Server gracefully shut down
2023-05-27T19:03:55Z 2023/05/27 19:03:55 No new connections allowed, draining: 0 requests
2023-05-27T19:03:55Z 2023/05/27 19:03:55 Exiting. Active connections: 0
Thanks for working on this and for testing the change. What I'd like to see is a curl statement - followed by you scaling to zero replicas. Show that "time curl ..." completes despite you scaling down. You'll need the Pro/Standard license that I sent you separately.