zadig [bug] The image distribution process fails at 99% on the first attempt every time, but it succeeds upon retry.

What happened?

Image distribution always fails at 99% on the first attempt but succeeds after a manual retry. Configuring the retry policy in the scheduling strategy of the workflow task doesn't help: the task still fails.

Install Methods: Helm

Versions Used

zadig: 3.2.0
kubernetes: 1.30

Environment

Cloud Provider: AWS EKS
Resources: 4 Cores / 8 GB RAM
OS:

Services Status

kubectl version

Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.7-eks-56e63d8

kubectl get po -n zadig


NAME                               READY   STATUS    RESTARTS      AGE
aslan-68679fb7b4-h5gp7             1/1     Running   0             6d
cron-684789745c-mzlvw              1/1     Running   0             6d6h
dind-0                             1/1     Running   0             18d
discovery-b7c54849-qrlg2           1/1     Running   1 (18d ago)   18d
gateway-d67cbd584-mnrsz            1/1     Running   1 (18d ago)   18d
gateway-proxy-679bbfdd4b-7d2jx     1/1     Running   0             18d
gloo-6bc8898b45-hm572              1/1     Running   2 (18d ago)   18d
hub-server-6898d9767b-fwj6m        1/1     Running   0             6d6h
kr-minio-5cc697f4c-nwd62           1/1     Running   0             18d
kr-redis-6bd58ffc5b-4bv54          1/1     Running   0             18d
plutus-vendor-78fd88577-hvcnc      1/1     Running   1 (18d ago)   18d
time-nlp-7bfbd8bc7b-dx2bd          1/1     Running   0             18d
user-5f5db6896d-pc8x2              1/1     Running   0             18d
vendor-portal-6c6cdf69b5-bf72b     1/1     Running   0             18d
zadig-portal-84bbc89cc5-2rsnw      1/1     Running   0             6d6h
zadig-zadig-dex-5b6d988c8d-f5fmz   1/1     Running   0             6d

Dec 30 '24 15:12 krisxia0506

Please check whether the corresponding image registry is correctly integrated and authorized successfully.
Can you use this image registry normally in the environment? Are you able to select tags from this registry as expected?
Also, please check if there are any error logs in the Aslan pod.
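
The log check above could be sketched as follows. This is a minimal sketch, not an official Zadig procedure: the deployment name `aslan` is an assumption inferred from the pod name `aslan-68679fb7b4-h5gp7` in the listing, and the keyword list in `filter_errors` (a hypothetical helper) is only a guess at what a registry failure might look like in the logs.

```shell
# Keep only log lines that look like errors.
# The keyword list is an assumption, not taken from actual Aslan log output.
filter_errors() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -i -E 'error|fail|unauthorized|denied' || true
}

# Usage, run shortly after a failed distribution attempt
# (assumes a deployment named "aslan" in the "zadig" namespace):
#   kubectl logs -n zadig deployment/aslan --since=10m | filter_errors
```

Limiting the output with `--since` keeps the check focused on the failed attempt rather than the pod's whole history.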

Jan 07 '25 07:01 PetrusZ

Authorization must be working, because a manual retry after the failure distributes the image successfully.
The image registry can be used normally in the environment, and tags can be selected during image deployment.
There are no error logs during image distribution, as shown in the screenshot.

[Screenshot: CleanShot 2025-01-14 at 15 43 47@2x]

Jan 14 '25 07:01 krisxia0506

There were no errors during login, but errors occurred when pulling the image. Please check the authorization for the image registry.

Also, have you tried running only the image distribution task to isolate the issue?
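
One way to isolate the pull step outside Zadig is to run the same pull twice by hand and record the outcome of each attempt. The sketch below assumes a `docker` CLI is available on a machine that can reach the registry; `attempt` is a hypothetical helper, and the image reference in the usage line is a placeholder.

```shell
# Run a command up to N times and print the outcome of every attempt.
# Returns 0 as soon as one attempt succeeds, 1 if all attempts fail.
attempt() {
  max=$1
  shift
  i=1
  while [ "$i" -le "$max" ]; do
    if "$@"; then
      echo "attempt $i: ok"
      return 0
    fi
    echo "attempt $i: failed"
    i=$((i + 1))
  done
  return 1
}

# Usage (placeholder image reference; substitute the one from the failing task):
#   attempt 2 docker pull registry.example.com/myapp:tag
```

If the first attempt fails and the second succeeds here as well, the problem likely lies with the registry or its authorization rather than with the workflow task.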

Jan 16 '25 06:01 PetrusZ

[Screenshot: CleanShot 2025-01-16 at 15 30 05@2x]

I created a separate workflow with only the image distribution task. After selecting a previously built service component, I noticed there was no original image version.

You mentioned errors occur during the pull step and suggested checking the image registry authorization.

However, a manual retry succeeds. The failure retry mechanism in the workflow's scheduling policy doesn't work: the task still reports errors unless I retry manually.

Jan 16 '25 07:01 krisxia0506

Since there has been no response from the user for an extended period, we will proceed to close this issue. If you have any further questions or need assistance, feel free to reopen or create a new issue.

Jun 18 '25 03:06 leozhang2018

Have you solved this problem?

Jun 18 '25 03:06 krisxia0506