Failed artifact streaming pull due to AKS node out of disk

Open maneeshcdls opened this issue 1 year ago • 1 comments

Describe the bug: When Artifact Streaming is enabled on a Linux node in our Kubernetes cluster, image pulls fail. Specifically, we see "Failed to pull image" errors during deployments. In addition, the node's disk fills up over time, which leads to the eviction of all pods.

Observations:

With Artifact Streaming enabled on the node:

  • Image pulls fail during deployments.
  • Disk space gradually fills up over time.
  • All pods are eventually evicted due to the lack of available disk space.

With Artifact Streaming disabled on the node:

  • Deployments function as expected.
  • Images are pulled correctly without errors.
  • No significant disk space issues are observed.
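To turn the "disk fills up over time" observation into numbers that can be attached to the issue, a small sampler like the one below could be run on the affected node. This is only a sketch: the function names are hypothetical, and the path to watch is an assumption (the issue does not say which directory grows, so `/` is used as a placeholder and should be swapped for the filesystem that backs the container runtime's data).

```python
import shutil
import time
from typing import List, Tuple


def used_percent(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def sample_usage(path: str, samples: int, interval_s: float) -> List[Tuple[float, float]]:
    """Collect `samples` readings of (unix timestamp, used percent) for `path`."""
    readings = []
    for i in range(samples):
        readings.append((time.time(), used_percent(path)))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


def is_growing(readings: List[Tuple[float, float]], min_delta: float = 1.0) -> bool:
    """True if usage rose by at least `min_delta` percentage points from first to last reading."""
    if len(readings) < 2:
        return False
    return readings[-1][1] - readings[0][1] >= min_delta


if __name__ == "__main__":
    # "/" is a placeholder path (assumption); point this at the disk that fills up.
    readings = sample_usage("/", samples=3, interval_s=0.5)
    for ts, pct in readings:
        print(f"{ts:.0f} {pct:.1f}% used")
    print("growing:", is_growing(readings))
```

Running it once with Artifact Streaming enabled and once with it disabled, over the same deployment sequence and with a longer interval, would give a side-by-side record of how quickly usage climbs in each case.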

error: Failed to pull image ".azurecr.io/products/api:master": rpc error: code = Canceled desc = failed to pull and unpack image ".azurecr.io/products/api:master": failed to resolve reference "**.azurecr.io/products/api:master": failed to do request: Head "https://localhost:8578/v2/products/api/manifests/master?ns=.azurecr.io": context cancel
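One detail in the error above is that the failing `Head` request targets `localhost:8578` and carries the registry host in an `ns` query parameter, rather than going to the registry directly. That suggests, although the issue does not confirm it, that the pull is routed through a local component on the node. The sketch below pulls those fields out of such a message so that several failures can be compared; the helper names are hypothetical, and the sample string is a shortened copy of the error as posted (including its redacted registry name).

```python
import re
from typing import Dict, List, Optional, Union
from urllib.parse import parse_qs, urlsplit


def find_request_urls(message: str) -> List[str]:
    """Return every http(s) URL found in an error message."""
    return re.findall(r'https?://[^\s"]+', message)


def describe_request(url: str) -> Dict[str, Union[str, int, bool, None]]:
    """Split a request URL into the parts that matter for triage."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    ns: Optional[str] = query.get("ns", [None])[0]
    return {
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "ns": ns,  # registry host passed along in the query string, if any
        "is_local": parts.hostname in ("localhost", "127.0.0.1", "::1"),
    }


if __name__ == "__main__":
    # Shortened copy of the error text from this issue, redaction left as posted.
    sample = (
        'failed to do request: Head '
        '"https://localhost:8578/v2/products/api/manifests/master?ns=.azurecr.io": '
        'context cancel'
    )
    for url in find_request_urls(sample):
        print(describe_request(url))
```

If `is_local` is true for the failing requests only on nodes with Artifact Streaming enabled, that narrows the problem to whatever is listening on that local port rather than to the registry itself.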

To Reproduce:

  1. Enable artifact streaming on ACR.
  2. Enable artifact streaming on the node.
  3. Deploy pods that use an image from ACR.

Expected behavior: Pods should be deployed.

Any relevant environment information

  • OS: Ubuntu 20.04
  • AKS / nodePools version is 1.28.3

maneeshcdls, Mar 27 '24 08:03

@maneeshcdls are you able to provide more detailed repro steps? It's been a while, but we can't pinpoint which parts of overlaybd's garbage collection on nodes need improvement unless you can provide the exact sequence for reproducing the artifact streaming pulls on the node.

johnsonshi, Apr 22 '25 18:04

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot], Aug 11 '25 02:08

This issue was closed because it has been stalled for 30 days with no activity.

github-actions[bot], Sep 11 '25 02:09