Uploading images to geo-replicated ACRs can cause intermittent MANIFEST_BLOB_UNKNOWN errors (due to DNS bouncing)
Describe the bug
I run an automated script that uploads eight images to ACR in parallel. Because a failure in any one upload fails the entire script, the ~10% per-upload failure rate results in a >60% failure rate for the script as a whole.
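The jump from ~10% per upload to a much higher script-level rate is what you would expect when every upload must succeed. The sketch below assumes the eight uploads fail independently with the same probability. That is only an assumption, since failures driven by the same DNS answer may be correlated. `script_failure_probability` is a hypothetical helper name.

```python
def script_failure_probability(per_upload_failure: float, uploads: int = 8) -> float:
    """Probability that at least one of `uploads` independent uploads fails."""
    # The script succeeds only if every upload succeeds, which happens with
    # probability (1 - p) ** n; the script fails in all other cases.
    return 1.0 - (1.0 - per_upload_failure) ** uploads
```

With exactly 10% per upload this gives roughly 57%. A per-upload rate of 11% already pushes it to roughly 61%, so a script-level rate above 60% is consistent with a per-upload rate of "about 10%".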
The bug appears to be the following:
During an image upload to a geo-replicated ACR, DNS sometimes resolves to chinaeast2 and sometimes to chinanorth2. I have heard this referred to as "DNS bouncing". Layers may not yet have replicated between the two regions. If the manifest PUT lands on a different region from the one that received the layers, the upload fails.
These uploads fail with the following error:
Error: PUT https://[REDACTED].azurecr.cn/v2/[REDACTED]/manifests/[REDACTED]: MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:[REDACTED]
As a user, I find this error very difficult to troubleshoot.
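One way to check whether a client is seeing DNS bouncing is to resolve the registry's login server repeatedly and count which addresses come back. The sketch below is a minimal diagnostic. It assumes the system resolver gives the same answers that crane would see. `observe_resolved_ips` is a hypothetical helper, not part of any ACR tooling.

```python
import socket
import time


def observe_resolved_ips(hostname: str, attempts: int = 20, delay_s: float = 0.5,
                         port: int = 443) -> dict:
    """Resolve `hostname` repeatedly and count how often each IP address is returned."""
    counts = {}
    for i in range(attempts):
        # getaddrinfo uses the system resolver, so OS or local-resolver caching
        # can hide changes until the cached record expires.
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
        # info[4] is the sockaddr tuple; its first element is the IP address.
        for ip in sorted({info[4][0] for info in infos}):
            counts[ip] = counts.get(ip, 0) + 1
        if i < attempts - 1:
            time.sleep(delay_s)
    return counts
```

For example, run `observe_resolved_ips("myregistry.azurecr.cn", attempts=60, delay_s=5.0)` with a hypothetical registry name over a few minutes. Seeing more than one address is consistent with requests being routed to different replicas, but it does not prove it.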
To Reproduce
Steps to reproduce the behavior:
- Create a geo-replicated ACR in chinaeast2 and chinanorth2
- Upload images repeatedly to the ACR
- About 10% of uploads fail with the following error, which is very difficult to troubleshoot:
Error: PUT https://[REDACTED].azurecr.cn/v2/[REDACTED]/manifests/[REDACTED]: MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:[REDACTED]
Expected behavior
Ideally, ACR would fully mitigate this issue.
Alternatively, the following change to the MANIFEST_BLOB_UNKNOWN error message could save users hours of troubleshooting:
If a registry is about to return a MANIFEST_BLOB_UNKNOWN error AND geo-replication is enabled, add a link to a troubleshooting guide (TSG). The TSG would describe the issue and explain how to mitigate DNS bouncing locally by making DNS resolution deterministic on the client.
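The proposal can be sketched as follows. This assumes the registry builds its error response in the Docker/OCI distribution error format, `{"errors": [{"code", "message", "detail"}]}`. The error line quoted above appears to show the code, the message ("blob unknown to registry"), and the digest as the detail. The function name and the TSG URL are hypothetical placeholders, not an existing ACR API or document.

```python
import json

# Hypothetical placeholder; no such TSG URL is published today.
HYPOTHETICAL_TSG_URL = "https://example.com/acr-geo-replication-manifest-blob-unknown"


def manifest_blob_unknown_body(digest: str, geo_replication_enabled: bool,
                               tsg_url: str = HYPOTHETICAL_TSG_URL) -> str:
    """Build a MANIFEST_BLOB_UNKNOWN error body, adding a TSG link for geo-replicated registries."""
    message = "blob unknown to registry"
    if geo_replication_enabled:
        # Only geo-replicated registries get the hint, since only they can
        # route the manifest PUT to a replica that has not received the layers.
        message += ("; this registry is geo-replicated and the referenced blob may "
                    "not have replicated yet, see " + tsg_url)
    body = {"errors": [{"code": "MANIFEST_BLOB_UNKNOWN", "message": message, "detail": digest}]}
    return json.dumps(body)
```

Putting the link in `message` rather than `detail` is a judgment call. It keeps `detail` as the bare digest, as in the error above.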
Any relevant environment information
- OS: Ubuntu
- Azure CLI/PowerShell/SDK version: N/A
- Docker version: N/A (I'm using crane)
- Datetime (UTC) when the issue occurred
- Registry and image names (internal, but feel free to reach out on Teams)
Additional context
Also, for anyone else who stumbles across this issue, there are two mitigations. One is to ensure that DNS queries for the ACR resolve deterministically to the same IP every time. The other is to disable geo-replication for the ACR.
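For the first mitigation, one option is to resolve the login server once and pin that answer in the local hosts file (for example `/etc/hosts` on Ubuntu) for the duration of the upload job. The sketch below only produces the hosts-file line. `hosts_pin_line` is a hypothetical helper, and writing the file (which needs root) and removing the entry afterwards are left to the caller.

```python
import socket


def hosts_pin_line(hostname: str, port: int = 443) -> str:
    """Resolve `hostname` once and return a hosts-file line pinning it to one IPv4 address."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET,
                               proto=socket.IPPROTO_TCP)
    # Sort so the chosen address does not depend on the resolver's ordering.
    addresses = sorted({info[4][0] for info in infos})
    if not addresses:
        raise RuntimeError(f"no IPv4 address found for {hostname}")
    # Hosts-file format: "<ip>\t<hostname>".
    return f"{addresses[0]}\t{hostname}"
```

Because the client still connects by hostname, TLS certificate validation is unaffected. There are two caveats. The pinned address can go stale if the registry's endpoint IP changes. Any other hostnames the client contacts are not pinned.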
Closing as this has been inactive for over three months. Please open a support ticket with our team for assistance.