Queue Proxy does not exit after draining and server shutdown
What version of Knative?
1.3.2
Expected Behavior
When a pod is deleted, queue-proxy should drain the connections and then exit, and after that the user container should exit.
Actual Behavior
When a pod is deleted, queue-proxy drains the connections, shuts down the server, and prints "Shutdown complete, exiting..." in the logs. However, it does not exit and is killed by the kubelet after deletionGracePeriodSeconds (300s).
Steps to Reproduce the Problem
Setup
- Set up Knative with Istio
- Create the following service:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: foo
  annotations:
    networking.knative.dev/disableAutoTLS: "true"
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "10"
        autoscaling.knative.dev/targetUtilizationPercentage: "70"
    spec:
      containerConcurrency: 1
      containers:
        - name: api
          image: some-image
          resources:
            requests:
              memory: 400Mi
              cpu: 50m
            limits:
              memory: 1024Mi
          ports:
            - containerPort: 8080
              name: http1
- Put a relatively constant load on the service: 0.1 requests per second with a 200 ms response time and spikes to a 2 second response time
- The autoscaler should scale the number of replicas up and down.
- This produces many pods in the deleting state, where the queue-proxy container blocks the deletion even after it has drained the connections
same
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.
This issue or pull request is stale because it has been open for 90 days with no activity.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
/lifecycle stale
Following up here: after the queue proxy finishes draining, Kubernetes will send a TERM signal to the user-container (your application).
The shutdown will be blocked if your application doesn't perform a graceful shutdown.
As an example, this Python server doesn't perform a graceful shutdown:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-test
spec:
  template:
    spec:
      timeoutSeconds: 150
      containers:
        - image: python:3.9-slim
          command: ["python"]
          args: ["-m", "http.server", "8080"]
In contrast, nginx does perform a graceful shutdown:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-test
spec:
  template:
    spec:
      timeoutSeconds: 150
      containers:
        - image: nginx
          ports:
            - containerPort: 80
Also, the queue proxy has a 30s drain time. If a request arrives during that window, the drain time is reset.
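To make the reset behavior concrete, here is a rough model of a drain window that restarts whenever a request arrives. This is only a sketch of the idea, not the queue proxy's actual code: the `DrainTimer` class, its method names, and the injectable `clock` are all hypothetical and exist only for illustration.

```python
import time


class DrainTimer:
    """Hypothetical model: draining is done only after `quiet_period`
    seconds have passed with no new request."""

    def __init__(self, quiet_period=30.0, clock=time.monotonic):
        self.quiet_period = quiet_period
        self.clock = clock
        self.deadline = None  # stays None until draining starts

    def start(self):
        # Draining begins: the window closes `quiet_period` seconds from now.
        self.deadline = self.clock() + self.quiet_period

    def on_request(self):
        # A request that arrives while draining pushes the deadline out again.
        if self.deadline is not None:
            self.deadline = self.clock() + self.quiet_period

    def done(self):
        # True once draining has started and the quiet period has elapsed.
        return self.deadline is not None and self.clock() >= self.deadline
```

In this model, a request that arrives 29 seconds into the window moves the earliest possible completion from 30 seconds to 59 seconds after draining started.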
We're experiencing the same problem with version 1.7.1 of Knative Serving. When deploying the services shown by @dprotaso, the nginx one shuts down gracefully, but the Python one does not.
Is there a "canonical" way to address this issue? Thank you
Is there a "canonical" way to address this issue? Thank you
I'm not familiar enough with Python, but generally user applications will want to listen for SIGTERM and perform a graceful exit.
I wouldn't feel comfortable having Knative force the user container to exit, since there could be important things that occur on shutdown.
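For the Python case, a minimal sketch of listening for SIGTERM with only the standard library could look like the following. It is an illustration rather than anything from the Knative docs: `Handler`, `make_server`, `request_shutdown`, and `main` are names made up for this example, and port 8080 simply mirrors the service above. A process running as PID 1 in a container gets no default SIGTERM action from the kernel, so the handler has to be installed explicitly.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    server = ThreadingHTTPServer(("", port), Handler)
    # Non-daemon handler threads let server_close() wait for in-flight requests.
    server.daemon_threads = False
    return server


def request_shutdown(server):
    # shutdown() blocks until serve_forever() returns, so it must not be
    # called from the thread that is running serve_forever().
    threading.Thread(target=server.shutdown, daemon=True).start()


def main():
    server = make_server()
    # Installed from the main thread, as the signal module requires.
    signal.signal(signal.SIGTERM, lambda signum, frame: request_shutdown(server))
    server.serve_forever()   # returns once shutdown has been requested
    server.server_close()    # joins handler threads, then closes the socket
```

Calling `main()` from the container entrypoint starts the server. On SIGTERM, `serve_forever()` returns and `server_close()` waits for requests that are still being handled before the process exits.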
For anyone dealing with this issue in Python, try using dumb-init.
@danielrubin1989: You can't reopen an issue/PR unless you authored it or you are a collaborator.
In response to this:
/reopen