
[SURE-10390] Incorporate a Retry of the jobs that pull from the repositories when it fails

Open kkaempf opened this issue 4 months ago • 2 comments

SURE-10390

Request description:

Fleet's gitjobs, which pull from the repositories referenced by the GitRepos, do not retry. My customer sees failures often because they push Fleet to its limits, and sometimes there is a connection timeout or an etcd request timeout.

The customer would like the gitjob to retry before failing, as this would alleviate the issues.

Actual behavior:

The gitjob fails and errors out.

Expected behavior:

When the gitjob fails, it retries instead of erroring out right away.

Workaround:

Is a workaround available and implemented? Yes.

What is the workaround: Retry manually, although this is cumbersome due to the number of bundles and resources.

Additional notes: See https://github.com/rancher/fleet/pull/3407, #3067

kkaempf, Aug 20 '25 08:08

As discussed in backlog refinement, this is still not fully resolved.

kkaempf, Aug 20 '25 08:08

Do we need to document FLEET_APPLY_CONFLICT_RETRIES?

manno, Sep 01 '25 09:09