
The spec.template.spec.terminationGracePeriodSeconds: 3600 setting has no effect

Open eugen-nw opened this issue 5 years ago • 17 comments

My container runs a Windows Console application in an Azure Kubernetes instance. I subscribe via SetConsoleCtrlHandler, catch the CTRL_SHUTDOWN_EVENT (6) and call Thread.Sleep(TimeSpan.FromSeconds(3600)); so that the SIGKILL won't get sent to the container. The container does indeed receive the CTRL_SHUTDOWN_EVENT and, on a separate thread, logs one message per second to show for how long it kept waiting (a sketch of the handler is below).
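
For reference, a minimal sketch of that subscription. The class and member names are mine; only SetConsoleCtrlHandler, CTRL_SHUTDOWN_EVENT (6) and the hour-long Thread.Sleep come from the report above.

using System;
using System.Runtime.InteropServices;
using System.Threading;

class ShutdownAwareProgram
{
    const int CTRL_SHUTDOWN_EVENT = 6;                        // Win32 control event sent on container shutdown
    delegate bool ConsoleCtrlDelegate(int ctrlType);
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetConsoleCtrlHandler(ConsoleCtrlDelegate handler, bool add);
    static readonly ConsoleCtrlDelegate Handler = OnCtrlEvent; // keep a reference so the delegate isn't GC'd
    static bool OnCtrlEvent(int ctrlType)
    {
        if (ctrlType == CTRL_SHUTDOWN_EVENT)
            Thread.Sleep(TimeSpan.FromSeconds(3600));         // block so the process isn't torn down early
        return true;                                          // report the event as handled
    }
    static void Main()
    {
        SetConsoleCtrlHandler(Handler, true);
        Thread.Sleep(Timeout.Infinite);                       // stand-in for the real work loop
    }
}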

I'm adding the required registry settings in the Dockerfile:

USER ContainerAdministrator
# Give processes up to 3600s to exit on container shutdown, and raise the
# machine-wide service kill timeout to a matching 3600000 ms.
RUN reg add hklm\system\currentcontrolset\services\cexecsvc /v ProcessShutdownTimeoutSeconds /t REG_DWORD /d 3600 /f && \
    reg add hklm\system\currentcontrolset\control /v WaitToKillServiceTimeout /t REG_SZ /d 3600000 /f
ADD publish/ /

I verified this by running the container on my computer; 'docker stop -t <seconds>' achieves the delayed shutdown (see the commands below).
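
The local verification looks roughly like this (container name is mine, the image reference is elided):

docker run -d --name solver-runner <image>
docker stop -t 3600 solver-runner    # delivers the shutdown event, then waits up to 3600s before force-killing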

Here is the relevant fragment of the .yaml deployment file:

spec:
  replicas: 1
  selector:
    matchLabels:
      app: aks-aci-boldiq-external-solver-runner
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: aks-aci-boldiq-external-solver-runner
    spec:
      terminationGracePeriodSeconds: 3600
      containers:
      - image: ...
        imagePullPolicy: Always
        name: boldiq-external-solver-runner
        resources:
          requests:
            memory: 8G
            cpu: 1
      imagePullSecrets:
        - name: docker-registry-secret-official
      nodeName: virtual-kubelet-aci-connector-windows-windows-westus

After deployment I ran the 'kubectl get pod aks-aci-boldiq-external-solver-runner-69bf9cd949-njzz2 -o yaml' command and verified that the setting below is present in the output:

  terminationGracePeriodSeconds: 3600
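
A quicker spot-check, querying the same pod with a jsonpath expression, returns the value directly:

kubectl get pod aks-aci-boldiq-external-solver-runner-69bf9cd949-njzz2 \
  -o jsonpath='{.spec.terminationGracePeriodSeconds}'
# 3600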

If I do 'kubectl delete pod', the container stays alive only for the default 30 seconds instead of the 1 hour that I want (the exact repro is below). Could the problem be in the VK, or could this behavior be caused by AKS?
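
kubectl delete pod aks-aci-boldiq-external-solver-runner-69bf9cd949-njzz2
# observed: the container is gone after ~30s instead of surviving up to 3600s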

eugen-nw avatar Dec 11 '19 21:12 eugen-nw

@eugen-nw, so this is actually not supported at all today, on two levels:

1. The VK package itself, up till v1.1 (the version currently used by the Azure virtual kubelet), didn't honor that. It simply calls a delete pod on the provider and always sets 30 seconds. This got updated in v1.2, so we should be able to utilize it in the future. The other thing is that our provider doesn't send any updates about the pod after the delete call, so the pod actually gets deleted immediately even though K8s shows the 30 secs on its side. This latter point is going to be fixed shortly; I'm currently working on an update for that.

2. This is the main problem: ACI doesn't support a way to configure how termination should be handled, nor what grace period to use if one is specified. The delete operation is synchronous too, so the actual resource is removed regardless of the pod cleanup that gets triggered on ACI's backend.

We're aware of the limitations on ACI, but until these are supported, the fixes mentioned in (1) won't make a difference. @macolso the async deletion is coming with the new API, but I remember you/Deep mentioning termination handling. Can you please elaborate on whether it is planned for next semester?

ibabou avatar Dec 20 '19 07:12 ibabou

Thanks very much for having looked into this! When will this issue be fixed, please? Our major customer is not pleased that some of their long-running computations get killed midway through and need to be restarted on a different container.

eugen-nw avatar Dec 20 '19 18:12 eugen-nw

@ibabou your answer 2. above implies that even if we used Linux containers running on virtual-node-aci-linux, we'd run into the exact same problem. I assume that virtual-node-aci-linux is the equivalent Linux ACI connector. Are both of these statements correct?

eugen-nw avatar Jan 11 '20 00:01 eugen-nw

@eugen-nw if you mean the grace period and the wait on container termination on ACI's side, yeah, that's not currently supported for either Linux or Windows.

ibabou avatar Jan 11 '20 00:01 ibabou

Thanks very much, that's what I was asking about. That's very bad behavior on ACI's side. Do they plan to fix it?

eugen-nw avatar Jan 11 '20 00:01 eugen-nw

So our team owns both the ACI service and the AKS-VK integration, but I don't have an ETA for that feature. I'll let @dkkapur @macolso elaborate more.

ibabou avatar Jan 11 '20 00:01 ibabou

@eugen-nw indeed :( we're looking into fixing this in the coming months on ACI's side. Hope to have an update for you in terms of a concrete timeline shortly.

dkkapur avatar Jan 13 '20 22:01 dkkapur

@dkkapur: THANKS VERY MUCH for planning to address this problem soon! This is a major issue for our largest customer.

We scale our processing on demand, based on workload sent to the containers through a Service Bus Queue. There are two distinct types of processing: 1) under 2 minutes (the majority); 2) over 40 minutes (occurs now and then). Whenever the AKS HPA scales down, it kills the containers it spun up during scale-up. If one of the long processing operations happens to land on one of those scale-up containers, it gets aborted, and currently we have no way of avoiding that. We've designed the solution so that the processing restarts on another container, but our customer is definitely not happy that the 40-minute processing may occasionally run for much longer.

eugen-nw avatar Jan 13 '20 23:01 eugen-nw

Ya - I've been working on enabling graceful termination / lifecycle hooks for ACI. If you want to talk more about your use case, I'd love to set up some time - shoot me an email at [email protected]

macolso avatar Jan 13 '20 23:01 macolso

Bumping into the same issue with the autoscaler.

4 months have passed - are there any known workarounds, or an ETA for the fix?

AlexeyRaga avatar May 13 '20 10:05 AlexeyRaga

@dkkapur @macolso @ibabou Sorry for bumping this again - it hurts us quite a lot here. Any news on this front?

AlexeyRaga avatar Jul 09 '20 00:07 AlexeyRaga

Probably customer focus is no longer trendy these days? I’ll check out the AWS offerings and will report back.

eugen-nw avatar Jul 09 '20 05:07 eugen-nw

Hi @AlexeyRaga, unfortunately there is no concrete ETA we can share at this point. We're happy to hop on a call and talk a bit about the product roadmap though - email shared above ^^

macolso avatar Jul 09 '20 23:07 macolso

This is a big drawback: pods scheduled on a virtual node support neither Pod Lifecycle Hooks nor terminationGracePeriodSeconds. This functionality is needed to stop pods from being terminated during scale-in; the fragment below shows the combination in question.
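
On regular VM-backed nodes this combination is honored; a minimal illustration, with names and the image placeholder of my choosing (a Linux-style preStop command, purely for the sake of example):

spec:
  terminationGracePeriodSeconds: 3600              # honored on VM-backed nodes; not on ACI virtual nodes
  containers:
  - name: worker                                   # illustrative name
    image: <your-image>
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 60"]   # drain in-flight work before SIGTERM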

Is there any timeline for implementing this? @macolso

asipras avatar May 03 '21 17:05 asipras

Does terminationGracePeriodSeconds work for AWS EKS pods on Fargate? Fargate nodes also look like a kind of virtual node.
rustlingwind avatar Jul 15 '21 00:07 rustlingwind

Any progress on this at all yet? It's been over 2 years since the last update.

Andycharalambous avatar Nov 10 '22 00:11 Andycharalambous

Hey @Andycharalambous, we will start working on it soon - no ETA yet.

helayoty avatar Nov 10 '22 05:11 helayoty