AKS
Windows pod stuck in Terminating state with subPath in volumeMounts
Describe the bug
A pod on a Windows-based node is stuck in the Terminating state during deletion/replacement of the pod whenever I add two volume mounts with subPath. This does not happen if I add a single volume mount with subPath.
To Reproduce
Steps to reproduce the behavior:
- Create a deployment with the volume and volumeMounts below. Make sure to create the ConfigMaps as well.
volumeMounts:
- name: configs
  mountPath: C:\inetpub\wwwroot\web.config
  subPath: web.config
- name: cloudconfig
  mountPath: C:\CloudConfig
  subPath: credentials
volumes:
- name: configs
  configMap:
    name: test-report-executor-cm
- name: cloudconfig
  configMap:
    name: cloud-config
- Run the following command: kubectl replace -n namespace -f filename.yaml --force
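For reference, here is a minimal Deployment sketch that puts the fragments above together. The deployment name is a placeholder, the image is just the stock IIS image mentioned later in this thread, and both ConfigMaps must exist beforehand:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: subpath-repro            # placeholder name
  namespace: namespace           # matches the kubectl replace command above
spec:
  replicas: 1
  selector:
    matchLabels:
      app: subpath-repro
  template:
    metadata:
      labels:
        app: subpath-repro
    spec:
      nodeSelector:
        kubernetes.io/os: windows
      containers:
      - name: web
        image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
        volumeMounts:
        - name: configs
          mountPath: C:\inetpub\wwwroot\web.config
          subPath: web.config
        - name: cloudconfig
          mountPath: C:\CloudConfig
          subPath: credentials
      volumes:
      - name: configs
        configMap:
          name: test-report-executor-cm
      - name: cloudconfig
        configMap:
          name: cloud-config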
Expected behavior
The pod should not be stuck in the Terminating state. The pod has now been stuck in the Terminating state for more than 5 days.
Environment (please complete the following information):
- CLI Version - v1.27.7
- Kubernetes version - 1.27.7
Additional context
kubelet logs:
E0612 15:37:32.338453 4496 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/X17e892a03e1c-configs podName:X nodeName:}" failed. No retries permitted until 2024-06-12 15:37:33.3384537 +0000 UTC m=+1476.528238001 (durationBeforeRetry 1s). Error: UnmountVolume.TearDown failed for volume "configs" (UniqueName: "kubernetes.io/configmap/X-configs") pod "XXXX4744-X" (UID: "X") : remove c:\var\lib\kubelet\pods\X\volumes\kubernetes.io~configmap\configs..2024_06_12_15_34_06.2642743558\Web.config: The process cannot access the file because it is being used by another process.
We have the same issue (Sitecore Windows containers). Our analysis of the problem so far:
- The container runs as user 'containeradministrator', and once the ConfigMaps for our certs are mounted, they can't be dismounted due to user permissions.
- We also only seem to hit this issue with more than one volumeMount.
To add to the above: our solution was to have one single mount plus a script that reads environment variables and writes the contents to the target location.
For example:
# Write the cert contents from environment variables, then import them into the machine stores
mkdir c:\inetpub\wwwroot\certs\
echo "$env:test1_crt" > c:\inetpub\wwwroot\certs\test1.crt
echo "$env:test2_crt" > c:\inetpub\wwwroot\certs\test2.crt
try {
    ForEach ($file in Get-ChildItem -Path c:\inetpub\wwwroot\certs\*.crt) {
        Import-Certificate -FilePath $file.FullName -CertStoreLocation Cert:\LocalMachine\Root
        Import-Certificate -FilePath $file.FullName -CertStoreLocation Cert:\LocalMachine\Ca
    }
}
catch {
    Write-Host "An error occurred:"
    Write-Host $_
}
By doing so, the pods would terminate as expected and our end result was still achieved.
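For completeness, one way to feed those environment variables is via env/valueFrom on the container spec. The comment above does not say where the values come from, so the ConfigMap name and keys below are assumptions:
env:
- name: test1_crt
  valueFrom:
    configMapKeyRef:
      name: certs-cm        # hypothetical ConfigMap holding the certificate text
      key: test1.crt
- name: test2_crt
  valueFrom:
    configMapKeyRef:
      name: certs-cm        # hypothetical
      key: test2.crt
Because the cert content reaches the container as environment variables rather than as a mounted subPath, no second subPath volume mount is needed.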
I can confirm this issue.
If you mount two volumes with a subPath (in our case one of them is a ConfigMap and the other one is an Azure Files share), the pod stays in the Terminating state indefinitely.
I wanted to pinpoint which subsystem was responsible for the issue, so I grabbed the default IIS image from MS:
mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
I tried mounting a ConfigMap with a subPath, and everything was OK. Then I mounted the same ConfigMap with a subPath but at another location, and the pod could not terminate.
volume_mount {
  name       = "env-config"
  mount_path = "c:/env.json"
  sub_path   = "env.json"
}
volume_mount {
  name       = "env-config"
  mount_path = "c:/env2.json"
  sub_path   = "env.json"
}
Kubelet logs:
E0712 07:36:38.858534 6376 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/configmap/9ebd5403-2617-4795-b3fe-7a483a723448-env-config podName:9ebd5403-2617-4795-b3fe-7a483a723448 nodeName:}" failed. No retries permitted until 2024-07-12 07:36:39.3577023 +0000 UTC m=+1705.940224101 (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "env-config" (UniqueName: "kubernetes.io/configmap/9ebd5403-2617-4795-b3fe-7a483a723448-env-config") pod "9ebd5403-2617-4795-b3fe-7a483a723448" (UID: "9ebd5403-2617-4795-b3fe-7a483a723448") : remove c:\var\lib\kubelet\pods\9ebd5403-2617-4795-b3fe-7a483a723448\volumes\kubernetes.io~configmap\env-config\..2024_07_12_07_11_11.3010365152\env.json: The process cannot access the file because it is being used by another process.
In this example I am using the same ConfigMap volume twice, but originally this was happening with a ConfigMap + an Azure File Share, which are totally unrelated storage types.
This issue seems to only affect Windows containers, as I tested the same setup with a Linux container and could not reproduce it.
It seems to be the same as https://github.com/kubernetes/kubernetes/issues/112630.
It is expected to be fixed in kubelet.
cc @jsturtevant
There is another workaround for anyone who wants to use it to import certs: mount the volumes to separate folders without subPath.
volumeMounts:
- name: certs-path-1
  mountPath: /certs/1
- name: certs-path-2
  mountPath: /certs/2
with volumes like:
volumes:
- name: certs-path-1
  configMap:
    name: trust-bundle-ex
    items:
    - key: certs-ex.pem
      path: certs-ex.pem
- name: certs-path-2
  configMap:
    name: trust-bundle
    items:
    - key: certs.pem
      path: certs.pem
Be aware that the files created in those locations ("/certs/1/certs-ex.pem") are not regular files (even though you can read their content via Get-Content or other tools); they are symlinks, so something like "Import-Certificate -FilePath /certs/1/certs-ex.pem -CertStoreLocation Cert:\LocalMachine\Root" does NOT work on them.
You have to use the real file behind that symlink: Import-Certificate -FilePath /certs/1/..data/certs-ex.pem -CertStoreLocation Cert:\LocalMachine\Root
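As a sketch of that, assuming the two mounts and file names from the workaround above and that each .pem holds a single certificate, the import could target the real files under the ..data directories directly:
# Import the real files behind the ConfigMap symlinks (paths assume the mounts above)
try {
    ForEach ($file in @("C:\certs\1\..data\certs-ex.pem", "C:\certs\2\..data\certs.pem")) {
        Import-Certificate -FilePath $file -CertStoreLocation Cert:\LocalMachine\Root
    }
}
catch {
    Write-Host "An error occurred:"
    Write-Host $_
}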
I do not get it: how could this be closed (in 30 days) when the issue was confirmed and work is done or ongoing, per @AbelHu's "It is expected to be fixed in kubelet"?
@jsturtevant or @kiashok, can you please provide an update?
This is still an issue and is not yet ready to be closed.
Hello @AbelHu and @kiashok, can you please provide an update?
Adding the "fixing" tag to block stale/close. This fix looks to be dependent on upstream fixes in Kubernetes, as per the earlier comment from @AbelHu.
Hey @AbelHu @kiashok, are there any updates on this issue? Is it still in the 'fixing' phase?