[bug] The image distribution process fails at 99% on the first attempt every time, but it succeeds upon retry.
What happened?
Image distribution always fails at 99% on the first attempt, but it succeeds when I retry manually. Configuring the retry policy in the scheduling strategy of the workflow task doesn't help: the task still fails.
Install Methods: Helm
Versions Used: zadig 3.2.0, kubernetes 1.30
Environment
Cloud Provider: AWS EKS
Resources: 4 Cores / 8 GB RAM
OS:
Services Status
kubectl version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.7-eks-56e63d8
kubectl get po -n zadig
NAME                               READY   STATUS    RESTARTS      AGE
aslan-68679fb7b4-h5gp7             1/1     Running   0             6d
cron-684789745c-mzlvw              1/1     Running   0             6d6h
dind-0                             1/1     Running   0             18d
discovery-b7c54849-qrlg2           1/1     Running   1 (18d ago)   18d
gateway-d67cbd584-mnrsz            1/1     Running   1 (18d ago)   18d
gateway-proxy-679bbfdd4b-7d2jx     1/1     Running   0             18d
gloo-6bc8898b45-hm572              1/1     Running   2 (18d ago)   18d
hub-server-6898d9767b-fwj6m        1/1     Running   0             6d6h
kr-minio-5cc697f4c-nwd62           1/1     Running   0             18d
kr-redis-6bd58ffc5b-4bv54          1/1     Running   0             18d
plutus-vendor-78fd88577-hvcnc      1/1     Running   1 (18d ago)   18d
time-nlp-7bfbd8bc7b-dx2bd          1/1     Running   0             18d
user-5f5db6896d-pc8x2              1/1     Running   0             18d
vendor-portal-6c6cdf69b5-bf72b     1/1     Running   0             18d
zadig-portal-84bbc89cc5-2rsnw      1/1     Running   0             6d6h
zadig-zadig-dex-5b6d988c8d-f5fmz   1/1     Running   0             6d
- Please check whether the corresponding image registry is correctly integrated and authorized successfully.
- Can you use this image registry normally in the environment? Are you able to select tags from this registry as expected?
- Also, please check if there are any error logs in the Aslan pod.
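For the log check, here is a minimal sketch. It assumes the Deployment is named `aslan` in the `zadig` namespace (inferred from the pod name aslan-68679fb7b4-h5gp7 in the listing above). The `filter_errors` helper and its keyword list are hypothetical, not something Zadig provides.

```shell
# Hypothetical helper: keep only log lines that look like failures.
# The keyword list is a guess; adjust it to what Aslan actually logs.
filter_errors() {
  grep -iE 'error|fail|denied|unauthorized' || true
}

# Usage (assumes Deployment "aslan" in namespace "zadig"):
#   kubectl logs -n zadig deploy/aslan --since=1h --timestamps | filter_errors
```

Running it right after a failed first attempt keeps the time window small and the output easier to read.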
- Authorization appears to be working, because a manual retry after the failure distributes the image successfully.
- The image registry can be used normally in the environment, and tags can be selected during image deployment.
- There are no error logs during image distribution, as shown in the screenshot.
There were no errors during login, but errors occurred when pulling the image. Please check the authorization for the image registry.
Also, have you tried running only the image distribution task to isolate the issue?
I created a separate workflow with only the image distribution task. After selecting a previously built service component, I noticed there was no original image version.
You mentioned that errors occur during the pull step and suggested checking the image registry authorization.
However, when I retry manually, it succeeds. The automatic retry configured in the workflow's scheduling policy does not work: the task still reports errors until I retry manually.
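One way to check whether the first-attempt failure also happens outside Zadig is to pull the same image by hand with a small retry wrapper. This is only a sketch: `retry` is a hypothetical helper, `$IMAGE` is a placeholder for the tag being distributed, and it assumes `docker` is available and logged in to the registry on the machine where it runs.

```shell
# Hypothetical helper: run a command up to N times, waiting DELAY seconds
# between attempts, and report which attempt succeeded.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    n=$((n + 1))
    sleep "$delay"
  done
  echo "succeeded on attempt $n" >&2
}

# Usage (IMAGE is a placeholder for the image being distributed):
#   retry 3 5 docker pull "$IMAGE"
```

If the plain pull also fails once and then succeeds, the problem is more likely on the registry or credential side than in the workflow's retry policy.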
Since there has been no response from the user for an extended period, we will proceed to close this issue. If you have any further questions or need assistance, feel free to reopen or create a new issue.
Have you solved this problem?