faas-containerd
Removing a function with active replicas (task running) fails
Trying to remove a function that is currently running (has a RUNNING task) fails.
Retrying the removal then succeeds.
Expected Behaviour
Function gets removed with first command.
Current Behaviour
Running function:
❯ faas store deploy figlet --name=figlet2
WARNING! Communication is not secure, please consider using HTTPS. Letsencrypt.org offers free SSL/TLS certificates.
Deployed. 200 OK.
URL: http://localhost:8081/function/figlet2
❯ sudo ctr -n openfaas-fn container ls
CONTAINER IMAGE RUNTIME
figlet2 docker.io/functions/figlet:0.13.0 io.containerd.runc.v2
❯ sudo ctr -n openfaas-fn task ls
TASK PID STATUS
figlet2 7252 RUNNING
Deploy logs:
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 [Update] request: {"service":"figlet2","image":"functions/figlet:0.13.0","network":"","envProcess":"figlet","envVars":{},"constraints":[],"secrets":[],"labels":{},"annotations":{},"limits":null,"requests":null,"readOnlyRootFilesystem":false}
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 [Update] service figlet2 not found
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 [Deploy] request: {"service":"figlet2","image":"functions/figlet:0.13.0","network":"","envProcess":"figlet","envVars":{},"constraints":[],"secrets":[],"labels":{},"annotations":{},"limits":null,"requests":null,"readOnlyRootFilesystem":false}
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 Deploy docker.io/functions/figlet:0.13.0 size: 5658006
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 Container ID: figlet2 Task ID figlet2: Task PID: 7252
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 16:42:43 figlet2 has IP: 10.62.0.163.
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 21:42:43 Version: 0.13.0 SHA: fa93655d90d1518b04e7cfca7d7548d7d133a34e
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 21:42:43 Read/write timeout: 5s, 5s. Port: 8080
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 21:42:43 Writing lock-file to: /tmp/.lock
Jan 17 16:42:43 debian10 faas-containerd[6669]: 2020/01/17 21:42:43 Metrics server. Port: 8081
Trying to remove:
❯ faas-cli remove figlet2
Deleting: figlet2.
Server returned unexpected status code 500 error deleting container figlet2, figlet2, cannot delete running task figlet2: failed precondition
Logs:
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 16:44:16 [Delete] request: {"functionName":"figlet2"}
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 16:44:16 [Delete] removing CNI network for figlet2
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 16:44:16 [Delete] removed figlet2 with namespace /proc/7252/ns/net and ID figlet2-7252
Jan 17 16:44:16 debian10 faas-containerd[6669]: Status of figlet2 is: running
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 16:44:16 Need to kill figlet2
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 21:44:16 SIGTERM received.. shutting down server in 5s
Jan 17 16:44:16 debian10 faas-containerd[6669]: 2020/01/17 21:44:16 Removing lock-file : /tmp/.lock
Jan 17 16:44:21 debian10 faas-containerd[6669]: 2020/01/17 21:44:21 No new connections allowed. Exiting in: 5s
Jan 17 16:44:21 debian10 faas-containerd[6669]: 2020/01/17 16:44:21 [Delete] error removing figlet2, error deleting container figlet2, figlet2, cannot delete running task figlet2: failed precondition
The task is stopped, but the container is not removed:
❯ sudo ctr -n openfaas-fn container ls
CONTAINER IMAGE RUNTIME
figlet2 docker.io/functions/figlet:0.13.0 io.containerd.runc.v2
❯ sudo ctr -n openfaas-fn task ls
TASK PID STATUS
figlet2 7252 STOPPED
Running the remove command again removes the function:
❯ faas-cli remove figlet2
Deleting: figlet2.
Removing old function.
Logs:
Jan 17 16:45:29 debian10 faas-containerd[6669]: 2020/01/17 16:45:29 [Delete] request: {"functionName":"figlet2"}
Jan 17 16:45:29 debian10 faas-containerd[6669]: Status of figlet2 is: stopped
Jan 17 16:45:29 debian10 faas-containerd[6669]: 2020/01/17 16:45:29 Need to kill figlet2
Jan 17 16:45:29 debian10 faas-containerd[6669]: 2020/01/17 16:45:29 [Delete] deleted figlet2
Possible Solution
Steps to Reproduce (for bugs)
Context
Your Environment
-
OS and architecture:
-
Versions:
go version
containerd -version
uname -a
cat /etc/os-release
Hi, did you try what I explained on Slack yet? The timeout for deletions is around 3s, but after SIGTERM the watchdog keeps running for "write_timeout" seconds before it exits.
You need to deploy with a lower value, so try 1s.
Yes, when deploying with --env write_timeout=1s, the function is removed correctly, but with the default (no write_timeout parameter) the removal fails.
Great, so it's a timing problem. We can't wait indefinitely to delete a container, so there has to be a limit, though perhaps a larger one than the current one.