source-controller
Flux Source Controller Fails to List Remotes
Describe the bug
Source controller intermittently fails to list revisions from the remote (GitLab in this case), leading to these errors:
{"level":"error","ts":"2023-06-20T12:09:39.735Z","msg":"failed to checkout and determine revision: unable to list remote for 'https://gitlab/sre/gitops/sre-flux': stream error: stream ID 3; INTERNAL_ERROR; received from peer","controller":"gitrepository","controllerGroup":"source.toolkit.fluxcd.io","controllerKind":"GitRepository","GitRepository":{"name":"flux-system","namespace":"flux-system"},"namespace":"flux-system","name":"flux-system","reconcileID":"e258ec4f-35e2-48e5-9af2-f7715f7c4cb4","error":"failed to checkout and determine revision: unable to list remote for 'https://gitlab/sre/gitops/sre-flux': stream error: stream ID 3; INTERNAL_ERROR; received from peer"}
{"level":"error","ts":"2023-06-20T12:09:39.766Z","msg":"Reconciler error","controller":"gitrepository","controllerGroup":"source.toolkit.fluxcd.io","controllerKind":"GitRepository","GitRepository":{"name":"flux-system","namespace":"flux-system"},"namespace":"flux-system","name":"flux-system","reconcileID":"e258ec4f-35e2-48e5-9af2-f7715f7c4cb4","error":"failed to checkout and determine revision: unable to list remote for 'https://gitlab/sre/gitops/sre-flux': stream error: stream ID 3; INTERNAL_ERROR; received from peer"}
The endpoint it calls is up and has no connection issues we can see during this period. We suspect it is a bug in net/http due to this ticket: https://github.com/golang/go/issues/51323
Steps to reproduce
- add a source
- check the logs and see the intermittent failures
Expected behavior
Source controller should work around this error, for example by retrying the request, instead of failing the reconciliation.
Screenshots and recordings
No response
OS / Distro
Kubernetes 1.24.x
Flux version
v0.38.3
Flux check
► checking prerequisites
✗ flux 0.38.3 <2.0.0-rc.5 (new version is available, please upgrade)
✔ Kubernetes 1.24.12-gke.500 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.34.1
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.34.1
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.28.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.0.0-rc.4
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.0.0-rc.4
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.0.0-rc.5
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta2
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta2
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed
Git provider
GitLab
Container Registry provider
Harbor
Additional context
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
According to this comment, the internal error message you're seeing is coming from the server, so it is most likely to be an upstream issue.
@devopstagon Did you manage to solve this issue? I have started seeing this error on my cluster, coming from source-controller. I'm unsure why it's having a problem.
We've been experiencing the same for a couple of days with GitHub as the source.
Flux retries when the connection fails; there's not much we can do if GitHub has connectivity issues. See https://www.githubstatus.com/incidents/r3x7x31k7nn1