fleet [SURE-10390] Incorporate a retry for the jobs that pull from the repositories when they fail

SURE-10390

Request description:

The Fleet gitjobs that pull from the repositories referenced by the gitRepos do not retry on failure. My customer sees this happen often because they push Fleet to its limits, and sometimes there is a connection timeout or an etcd request timeout.

The customer would like the gitjob to retry before failing, as this would alleviate the issue.

Actual behavior:

The gitjob fails and errors out.

Expected behavior:

The gitjob fails but retries.
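
To make the request concrete, here is a minimal sketch of the "retry before failing" behavior with exponential backoff. It is not Fleet's implementation: the function name `retry`, its parameters, and the choice of which errors count as transient (`TimeoutError`, `ConnectionError`) are all assumptions made for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures before giving up.

    Hypothetical helper: names and defaults are illustrative assumptions,
    not taken from Fleet.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With `attempts=3` and `base_delay=1.0`, an operation that keeps timing out is tried three times, with waits of 1 s and 2 s in between, and only then reports the failure. Errors outside `retry_on` are raised immediately, so a genuine misconfiguration would not be retried.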

Workaround:

Is a workaround available and implemented? Yes

What is the workaround: Retry manually, although this is cumbersome due to the number of bundles and resources.

Additional notes: See https://github.com/rancher/fleet/pull/3407, #3067

Aug 20 '25 08:08 kkaempf

As discussed in backlog refinement, this is still not fully resolved.

Aug 20 '25 08:08 kkaempf

Do we need to document FLEET_APPLY_CONFLICT_RETRIES?

Sep 01 '25 09:09 manno