CLI and CRD have inconsistent behavior with respect to 'days' interval

Open kingdonb opened this issue 2 years ago • 1 comments

It is possible to get a GitRepository source into a state where the CLI can no longer list it (flux get sources git) by setting the sync interval to a number of days.

Discussed in https://github.com/fluxcd/flux2/discussions/2190

Originally posted by dafstone December 7, 2021

The problem:

  • We attempted, foolishly, to push a change to a GitRepository (not flux) that changed the interval to '1d' -- an illegal setting.
  • This caused the reconciliation of the flux-system GitRepository to fail
  • We pushed a change, correcting this error; however the GitRepository remains stuck

We get an error at the top of flux get all and in the Flux logs that clearly points to the error:

✗ v1beta1.GitRepositoryList.Items: []v1beta1.GitRepository: v1beta1.GitRepository.Spec: v1beta1.GitRepositorySpec.Reference: Interval: unmarshalerDecoder: time: unknown unit "d" in duration "1d"
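The `time:` prefix in that message suggests it comes from Go's standard duration parser, which has no unit for days. A minimal sketch, assuming the interval string ends up in `time.ParseDuration` (the helper name `checkInterval` is hypothetical and not part of Flux):

```go
package main

import (
	"fmt"
	"time"
)

// checkInterval is a hypothetical helper: it returns the error, if any,
// that Go's time.ParseDuration reports for the given interval string.
func checkInterval(s string) error {
	_, err := time.ParseDuration(s)
	return err
}

func main() {
	for _, s := range []string{"1d", "24h", "1h30m"} {
		if err := checkInterval(s); err != nil {
			// For "1d" this is the same "unknown unit" text as in the error above.
			fmt.Printf("%-6s rejected: %v\n", s, err)
			continue
		}
		fmt.Printf("%-6s accepted\n", s)
	}
}
```

The valid units for `time.ParseDuration` are ns, us (or µs), ms, s, m and h, so any interval longer than an hour has to be written in hours.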

However, at the same time we can see that the GitRepository has moved past the hash with the bad code in it, but it still won't reconcile, and we consistently get context deadline exceeded errors.

We are looking for a way to:

  • Fully reset the source controller to pick up the latest changes on flux-system (we've tried a bootstrap, this just made things worse)
  • Force the source controller to get the latest version of our flux-system repo as opposed to the bad version it's still loading
  • Another solution?

Any help or thoughts?

kingdonb, Dec 15 '21 13:12

Very late to the party, but I managed to paint myself into a similar corner in flux 0.31.3. The source controller pod would never go live and logged this:

k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1beta2.HelmRepository: failed to list *v1beta2.HelmRepository: time: unknown unit "d" in duration "1d"

Because of this, it would never pick up on any new changes committed to the repo.

I was able to work around the issue by doing the following:

  1. Get a list of helmrepositories by running: kubectl get helmrepositories.source.toolkit.fluxcd.io -A
  2. For each affected helmrepository, run kubectl edit helmrepositories.source.toolkit.fluxcd.io <helmrepository_name> -n <namespace>
  3. Under spec, look for interval and change it to a value that Go's time duration string format accepts. Days are not a valid unit there, so convert to multiples of 24 hours in this case (e.g. 1d becomes 24h)
  4. Save the edit and quit the editor (in vi, type :wq!)
  5. Repeat for each affected helmrepository
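The conversion in step 3 can be sketched in Go. This is only an illustration of the arithmetic, not something Flux provides: the helper name `daysToHours` is hypothetical, and it only handles whole-number day values such as "1d" or "7d".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// daysToHours is a hypothetical helper: it rewrites a whole-number day
// interval such as "1d" as hours ("24h"). Values that do not end in "d"
// are returned unchanged.
func daysToHours(interval string) (string, error) {
	if !strings.HasSuffix(interval, "d") {
		return interval, nil
	}
	days, err := strconv.Atoi(strings.TrimSuffix(interval, "d"))
	if err != nil {
		return "", fmt.Errorf("unsupported interval %q: %w", interval, err)
	}
	return strconv.Itoa(days*24) + "h", nil
}

func main() {
	for _, in := range []string{"1d", "7d", "30m"} {
		out, err := daysToHours(in)
		if err != nil {
			fmt.Println(err)
			continue
		}
		// Confirm the converted value is one that time.ParseDuration accepts.
		if _, err := time.ParseDuration(out); err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s\n", in, out)
	}
}
```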

Hope that helps someone else in the future.

cjchand, Jul 24 '22 22:07

This is an upstream issue, which was fixed in Flux by improving the CRD validation across the controllers (e.g. https://github.com/fluxcd/source-controller/pull/903).
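To illustrate what pattern-based validation of the interval field can look like, here is a minimal sketch. The regular expression below is an assumption made for this example and is not quoted from the linked PR; the point is that a pattern restricted to the units ms, s, m and h rejects "1d" before the object is ever stored.

```go
package main

import (
	"fmt"
	"regexp"
)

// intervalPattern is an ASSUMED validation pattern, used for illustration
// only. It accepts one or more number+unit groups with units ms, s, m or h.
var intervalPattern = regexp.MustCompile(`^([0-9]+(\.[0-9]+)?(ms|s|m|h))+$`)

func main() {
	for _, s := range []string{"1d", "24h", "1h30m"} {
		fmt.Printf("%-6s valid=%t\n", s, intervalPattern.MatchString(s))
	}
}
```

With validation of this kind in the CRD schema, the API server would refuse an invalid interval at admission time, so the controllers and the CLI would never have to decode it.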

Closing for now as there is nothing left to be done.

pjbgf, Nov 14 '22 18:11