[SURE-8809] Fleet deployment fails when Helm chart repo uses custom CA / missing error
Issue description:
When a git repo is added to Fleet which has a fleet.yaml referencing an external Helm chart,
and
the server serving the Helm chart uses a custom CA,
then:
- the GitRepo is added and marked with state "active"
- the "Clusters Ready" status remains at "0/0" indefinitely
- no error is thrown or visible in the Fleet UI
Expected behavior:
- Fleet should honor the custom CA configured in the Rancher global settings, or the custom CA configured in the GitRepo resource (spec.caBundle), when downloading resources such as Helm charts.
- Fleet should display an error message in the UI indicating that there was a problem downloading the Helm chart.
Business impact:
The customer can't use any Helm charts from internet sources, because their corporate firewall performs TLS interception, replacing all TLS certificates with ones signed by their own CA.
We should add support to Fleet for falling back to a default value unless it is overridden in the resource. This is most likely a Fleet install option; I think Fleet already gets re-installed when the Rancher CA changes.
However, there are multiple clients in Fleet which would need to support this:
- git monitor (lsremote)
- git cloner
- chart downloader
- image scan (no custom CA yet)
Those with custom CA support all use a PEM block for the CA in their spec instead. Specifying the CA in the resource directly has the advantage that we don't need to watch another resource for changes, e.g. to redeploy on certificate rotation. That is not possible when we rely on a global setting. Does re-installing Fleet re-render all bundles with the new CA, or do we need to implement this?
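The precedence described above (a resource-level CA wins, an install-wide default applies otherwise) could look roughly like this in Go. effectiveCABundle and its parameters are hypothetical; the sketch only covers picking a bundle, not watching for or re-rendering on CA rotation.

```go
package main

import (
	"bytes"
	"fmt"
)

// effectiveCABundle picks the CA bundle a client (git monitor, git cloner,
// chart downloader) should trust. Hypothetical helper:
//   - specCABundle: PEM data set on the resource itself (e.g. GitRepo spec.caBundle)
//   - defaultCABundle: an install-wide default, e.g. derived from Rancher's CA settings
//
// A resource-level bundle always wins; whitespace-only values count as unset.
func effectiveCABundle(specCABundle, defaultCABundle []byte) []byte {
	if len(bytes.TrimSpace(specCABundle)) > 0 {
		return specCABundle
	}
	return defaultCABundle
}

func main() {
	defaultCA := []byte("<PEM from install-time default>")
	repoCA := []byte("<PEM from GitRepo spec>")

	fmt.Printf("%s\n", effectiveCABundle(nil, defaultCA))    // falls back to the default
	fmt.Printf("%s\n", effectiveCABundle(repoCA, defaultCA)) // resource override wins
}
```

Treating whitespace-only values as unset is a design choice here, so that an empty YAML block scalar does not silently disable the fallback.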
I've faced the same issue.
Additional QA
Problem
Fleet fails to deploy a workload pointing to a Helm registry using a custom CA.
Solution
Fleet takes Rancher-configured CA bundle secrets into account, enabling those to be used by default if a GitRepo does not provide a CA bundle. This happens in interactions with both git and Helm.
See new Fleet docs.
Testing
Engineering Testing
See description of this PR.
Manual Testing
N/A: manual testing ended up being automated.
Automated Testing
- Unit tests for:
  - extracting CA bundle data from Rancher secrets, following the Rancher docs linked above on their expected names and structures
  - using that data in the git latest-commit fetcher
- Integration tests for using that data when creating the jobs which clone git repositories and create bundles
- End-to-end tests for deploying workloads using GitRepos pointing to a git server that uses a custom CA bundle
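A rough Go sketch of the first unit-tested step listed above, extracting CA bundle data from Rancher secrets. The secret names and keys (tls-ca / cacerts.pem, tls-ca-additional / ca-additional.pem) are assumptions taken from Rancher's private CA documentation, not from this PR; secrets are passed as plain maps to avoid a client-go dependency, and rancherCABundle is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
)

// Assumed secret names and data keys (from Rancher's private CA docs);
// verify against the docs linked in the PR before relying on them.
var rancherCASources = []struct{ secret, key string }{
	{"tls-ca", "cacerts.pem"},
	{"tls-ca-additional", "ca-additional.pem"},
}

// rancherCABundle concatenates the PEM data found under the expected keys of
// the given secrets (keyed by secret name, as read from the cattle-system
// namespace). Missing secrets or keys are skipped; nil means "no Rancher CA".
func rancherCABundle(secrets map[string]map[string][]byte) []byte {
	var buf bytes.Buffer
	for _, src := range rancherCASources {
		data := bytes.TrimSpace(secrets[src.secret][src.key]) // nil-safe map lookups
		if len(data) == 0 {
			continue
		}
		buf.Write(data)
		buf.WriteByte('\n')
	}
	if buf.Len() == 0 {
		return nil
	}
	return buf.Bytes()
}

func main() {
	secrets := map[string]map[string][]byte{
		"tls-ca": {"cacerts.pem": []byte("-----BEGIN CERTIFICATE-----\nPRIVATE\n-----END CERTIFICATE-----\n")},
	}
	fmt.Printf("%q\n", rancherCABundle(secrets))
	fmt.Println(rancherCABundle(nil) == nil)
}
```

The result could then serve as the default bundle whenever a GitRepo sets none; returning nil rather than an empty slice makes "not configured" easy to test for.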
QA Testing Considerations
This should be tested with an actual Rancher install, having set up secrets as outlined in the docs linked in the Solution paragraph above:
- Is Fleet able to deploy a GitRepo pointing to a git repository that uses a custom CA bundle?
  - Prerequisites:
    - creating a git server using a custom CA (e.g. GitLab installed through Helm, or the homemade git server from rancher/fleet)
    - creating a git repository on that server, for instance with simple contents from this directory
    - creating a GitRepo referencing that git repository, and checking that it is eventually ready
- Is Fleet able to deploy a Helm chart from a GitRepo whose monitored git repository points to a Helm registry using a custom CA bundle?
  - This requires:
    - setting up a Helm or OCI registry using a custom CA, then pushing a chart or artifacts to it
    - using a GitRepo referencing a git repository (which could live on the git server mentioned above, or on e.g. GitHub), with a fleet.yaml pointing to the Helm or OCI registry using the custom CA
- HelmOps is not covered by this feature; neither is ImageScan.
HelmOps is not covered by this feature; neither is ImageScan.
Regression Considerations
If no Rancher-configured CA bundle secrets exist, is Fleet (within Rancher) still able to deploy a regular workload...
- ... from a git repository hosted on a server using a well-known CA (e.g. GitHub)?
- ... with Helm charts themselves hosted in a registry also using a well-known CA?
After initial testing with Rancher 2.11 and Fleet 106.0.0-up0.12.0-rc.3, it seems to work with a local git server for simple-chart, but fails when testing with OCI hosted in ChartMuseum.
To test, I used the dev scripts from https://github.com/rancher/fleet/tree/main/dev to deploy a local git server, deploy ChartMuseum, pass custom CA secrets, push local commits, and later install Rancher on a multi-node k3d cluster (thanks @weyfonk for helping set this up).
When using this example for OCI, e2e/assets/helm/repo/http-with-auth-repo-path/fleet.yaml, and passing credentials for both git and Helm, the job fails with this error:
Job Failed. failed: 1/1
time="2025-03-19T12:16:02Z" level=fatal msg="failed to process bundle: failed to resolve URL of repo=https://chartmuseum-service.default.svc.cluster.local:8081 chart=sleeper-chart version=0.1.0: Get \"https://chartmuseum-service.default.svc.cluster.local:8081/index.yaml\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
After outstanding help from @weyfonk: he was also able to reproduce this on his side and is taking a look.
Verified in Rancher v2.11-7e38c1ca2052319fa408bc5fde6651675765d0ae-head with Fleet fleet:106.0.0+up0.12.0-rc.4
Environment preparation
- Prepare a local git server.
- Clone 2 private git repos using a custom CA:
  - the first one is a simple chart, requiring login
  - the second one is a Helm chart (the sleeper chart from the tests) requiring Helm auth; ChartMuseum is spun up for this purpose
- Ensure Rancher's TLS Certificate Validation setting requires a valid certificate.
More detailed steps here
Test validation
- Verified simple-chart with custom CA works
- Verified git repo with custom CA / private Helm chart with custom CA works as well
- Verified that the repos do not work when the custom CA is removed
- Verified git repo with custom CA / private Helm chart with custom CA works after a force update and credential changes. This is due to the error described above.