AKS

Windows pod stuck in Terminating state with subPath in volumeMounts

Status: Open. nimblenitin opened this issue 1 year ago • 6 comments

Describe the bug

On a Windows node, a pod gets stuck in the Terminating state when it is deleted or replaced, whenever it has two volume mounts that use subPath. This does not happen with a single subPath volume mount.

To Reproduce

Steps to reproduce the behavior:

  1. Create a Deployment with the volumes and volumeMounts below. Make sure to create the ConfigMaps as well.
          volumeMounts:
            - name: configs
              mountPath: C:\inetpub\wwwroot\web.config
              subPath: web.config
            - name: cloudconfig
              mountPath: C:\CloudConfig
              subPath: credentials
      volumes:
        - name: configs
          configMap:
            name: test-report-executor-cm
        - name: cloudconfig
          configMap:
            name: cloud-config
  2. Run the following command: kubectl replace -n namespace -f filename.yaml --force
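The steps above can be condensed into one self-contained manifest for quicker reproduction. This is a sketch, not the reporter's exact setup: the names subpath-repro and repro-cm and the file contents are hypothetical placeholders, the image is the ltsc2022 IIS image used in a later comment (so it assumes Windows Server 2022 nodes), and, as in that comment, a single ConfigMap is mounted twice with subPath instead of two separate ConfigMaps.

```yaml
# Minimal repro sketch. Names and file contents are hypothetical placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: repro-cm
data:
  web.config: "<configuration />"   # placeholder content
  credentials: "placeholder"        # placeholder content
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: subpath-repro
spec:
  replicas: 1
  selector:
    matchLabels:
      app: subpath-repro
  template:
    metadata:
      labels:
        app: subpath-repro
    spec:
      nodeSelector:
        kubernetes.io/os: windows   # schedule onto a Windows node
      containers:
        - name: iis
          image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
          volumeMounts:
            # Two subPath mounts: the combination reported to hang termination
            - name: configs
              mountPath: C:\inetpub\wwwroot\web.config
              subPath: web.config
            - name: configs
              mountPath: C:\CloudConfig
              subPath: credentials
      volumes:
        - name: configs
          configMap:
            name: repro-cm
```

Per the reports in this thread, running kubectl replace --force -f on this file (or simply deleting the pod) should leave the old pod in Terminating on an affected cluster.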

Expected behavior

The pod should not get stuck in the Terminating state. Currently, the pod has been stuck in Terminating for more than 5 days.

Environment:

  • CLI Version - v1.27.7
  • Kubernetes version - 1.27.7

Additional context

kubelet logs:

E0612 15:37:32.338453 4496 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/X17e892a03e1c-configs podName:X nodeName:}" failed. No retries permitted until 2024-06-12 15:37:33.3384537 +0000 UTC m=+1476.528238001 (durationBeforeRetry 1s). Error: UnmountVolume.TearDown failed for volume "configs" (UniqueName: "kubernetes.io/configmap/X-configs") pod "XXXX4744-X" (UID: "X") : remove c:\var\lib\kubelet\pods\X\volumes\kubernetes.io~configmap\configs..2024_06_12_15_34_06.2642743558\Web.config: The process cannot access the file because it is being used by another process.

nimblenitin, May 23 '24 13:05

We have the same issue (Sitecore Windows containers). Our analysis of the problem so far:

  • The container runs as user 'containeradministrator', and once the ConfigMaps for our certs are mounted, they can't be unmounted due to user permissions.
  • We also seem to have this issue with more than one volumeMount.

Rick-healy, May 27 '24 15:05

To add to the above: our solution was to use a single mount, with a script that reads environment variables and writes their contents into the target folder.

For example:

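    # Create the target folder (-Force: no error if it already exists)
    New-Item -ItemType Directory -Force -Path C:\inetpub\wwwroot\certs | Out-Null

    # Write each certificate from its environment variable to a file.
    # Explicit ASCII encoding: in Windows PowerShell 5.1, '>' writes UTF-16LE.
    Set-Content -Path C:\inetpub\wwwroot\certs\test1.crt -Value $env:test1_crt -Encoding Ascii
    Set-Content -Path C:\inetpub\wwwroot\certs\test2.crt -Value $env:test2_crt -Encoding Ascii

    try {
        foreach ($file in Get-ChildItem -Path C:\inetpub\wwwroot\certs\*.crt) {
            # -ErrorAction Stop makes import failures terminating, so catch sees them
            Import-Certificate -FilePath $file.FullName -CertStoreLocation Cert:\LocalMachine\Root -ErrorAction Stop
            Import-Certificate -FilePath $file.FullName -CertStoreLocation Cert:\LocalMachine\CA -ErrorAction Stop
        }
    }
    catch {
        Write-Host "An error occurred:"
        Write-Host $_
    }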

By doing so, the pods would terminate as expected and our end result was still achieved.
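The script above expects the certificate contents in the environment variables test1_crt and test2_crt. One way to populate them without any extra volume mount is a configMapKeyRef, sketched below as a pod spec fragment. This is an assumption about the setup, not necessarily what was used here: the container name web, the ConfigMaps cert-bundle and import-certs-script, and the C:\scripts path are hypothetical.

```yaml
# Pod spec fragment. Hypothetical names: web, cert-bundle, import-certs-script,
# C:\scripts. Only the env var names (test1_crt, test2_crt) come from the script.
containers:
  - name: web
    image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
    env:
      # Each env var is filled from one ConfigMap key; no volume is involved
      - name: test1_crt
        valueFrom:
          configMapKeyRef:
            name: cert-bundle
            key: test1.crt
      - name: test2_crt
        valueFrom:
          configMapKeyRef:
            name: cert-bundle
            key: test2.crt
    volumeMounts:
      # The single mount: the whole script ConfigMap as a directory, no subPath
      - name: scripts
        mountPath: C:\scripts
volumes:
  - name: scripts
    configMap:
      name: import-certs-script
```

If the script is baked into the image instead, the volumes and volumeMounts sections can be dropped entirely.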

mo-martin, Jun 19 '24 12:06

I can confirm this issue.

If you mount two volumes with a subpath (in our case one of them is a configmap and the other one is an Azure Files share), the pod stays endlessly in Terminating state.

I wanted to pinpoint what subsystem was responsible for the issue, so I grabbed the default IIS image from MS:

mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022

I tried mounting a ConfigMap with a subPath, and everything was OK. Then I also mounted the same ConfigMap with the same subPath at another location, and the pod could not terminate.

          volume_mount {
            name       = "env-config"
            mount_path = "c:/env.json"
            sub_path   = "env.json"
          }

          volume_mount {
            name       = "env-config"
            mount_path = "c:/env2.json"
            sub_path   = "env.json"
          }

Kubelet logs:

E0712 07:36:38.858534    6376 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/9ebd5403-2617-4795-b3fe-7a483a723448-env-config podName:9ebd5403-2617-4795-b3fe-7a483a723448 nodeName:}" failed. No retries permitted until 2024-07-12 07:36:39.3577023 +0000 UTC m=+1705.940224101 (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "env-config" (UniqueName: "kubernetes.io/configmap/9ebd5403-2617-4795-b3fe-7a483a723448-env-config") pod "9ebd5403-2617-4795-b3fe-7a483a723448" (UID: "9ebd5403-2617-4795-b3fe-7a483a723448") : remove c:\var\lib\kubelet\pods\9ebd5403-2617-4795-b3fe-7a483a723448\volumes\kubernetes.io~configmap\env-config\..2024_07_12_07_11_11.3010365152\env.json: The process cannot access the file because it is being used by another process.

In this example I am using the same ConfigMap volume twice, but originally this was happening with a ConfigMap plus an Azure Files share, which are totally unrelated storage types.

This issue seems to only affect Windows Containers, as I tested the same setup with a Linux container and could not reproduce.

david-garcia-garcia, Jul 12 '24 09:07

This seems to be the same issue as https://github.com/kubernetes/kubernetes/issues/112630

AbelHu, Jul 23 '24 07:07

It is expected to be fixed in kubelet

AbelHu, Jul 23 '24 07:07

cc @jsturtevant

kiashok, Jul 26 '24 17:07

There is another workaround for anyone who wants to import certs: mount them into separate folders with volumeMounts that do not use subPath:

    volumeMounts:
      - name: certs-path-1
        mountPath: /certs/1
      - name: certs-path-2
        mountPath: /certs/2

with volumes like:

    volumes:
      - name: certs-path-1
        configMap:
          name: trust-bundle-ex
          items:
            - key: certs-ex.pem
              path: certs-ex.pem
      - name: certs-path-2
        configMap:
          name: trust-bundle
          items:
            - key: certs.pem
              path: certs.pem

Be aware that the files created in those locations (e.g. /certs/1/certs-ex.pem) are not regular files, even though you can read their content with Get-Content or other tools. They are symlinks, so a command like the following does NOT work on them:

    Import-Certificate -FilePath /certs/1/certs-ex.pem -CertStoreLocation Cert:\LocalMachine\Root

Instead, you have to use the real file behind the symlink:

    Import-Certificate -FilePath /certs/1/..data/certs-ex.pem -CertStoreLocation Cert:\LocalMachine\Root
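A possible variation, not tested by anyone in this thread: a projected volume can combine both ConfigMaps into one directory, so a single volumeMount (still without subPath) is enough. The volume name certs and the /certs mount path are hypothetical; the ConfigMap names and keys are taken from the workaround above. Projected volumes are written with the same ..data symlink layout as configMap volumes, so the same caveat should apply.

```yaml
# Container spec fragment: one mount, no subPath (hypothetical volume name: certs)
volumeMounts:
  - name: certs
    mountPath: /certs

# Pod spec fragment: both ConfigMaps projected into the same directory
volumes:
  - name: certs
    projected:
      sources:
        - configMap:
            name: trust-bundle-ex
            items:
              - key: certs-ex.pem
                path: certs-ex.pem
        - configMap:
            name: trust-bundle
            items:
              - key: certs.pem
                path: certs.pem
```

With this layout the import would go through the ..data path, e.g. Import-Certificate -FilePath /certs/..data/certs.pem -CertStoreLocation Cert:\LocalMachine\Root.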

lesio999, Oct 08 '24 15:10

I don't understand how this could be closed (after 30 days) when the issue was confirmed and work has been done or is ongoing; as @AbelHu wrote, "It is expected to be fixed in kubelet".

lesio999, Feb 19 '25 20:02

@jsturtevant or @kiashok can you please provide an update.

sjwaight, Feb 19 '25 22:02

This is still an issue and should not be closed yet.

chrko, Apr 15 '25 11:04

Hello @AbelHu and @kiashok, can you please provide an update?

sjwaight, Apr 16 '25 02:04

Adding the "fixing" tag to keep this from being marked stale and closed. The fix appears to depend on upstream fixes in Kubernetes, per the earlier comment from @AbelHu.

sjwaight, Apr 16 '25 02:04

Hey @AbelHu @kiashok, are there any updates on this issue? Is it still in the 'fixing' phase?

julia-yin, Oct 30 '25 20:10