[SURE-8809] Fleet deployment fails when Helm chart repo uses custom CA / missing error

Open kkaempf opened this issue 1 year ago • 2 comments

SURE-8809

Issue description:

When adding a git repo to Fleet that has a fleet.yaml referencing an external Helm chart, and the server serving the Helm chart uses a custom CA, then:

  • the GitRepo is added and marked with State "active"
  • the "Clusters Ready" status remains at "0/0" indefinitely
  • no error is thrown or visible in the Fleet UI

Expected behavior:

  1. Fleet should honor the custom CA configured in the Rancher global settings, or the custom CA configured in the GitRepo resource (spec.caBundle), when downloading resources such as Helm charts.

  2. Fleet should display an error message in the UI, indicating that there was a problem downloading the Helm chart
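
For illustration, honoring a custom CA in a Go HTTP client generally means adding the PEM bundle to the pool of trusted roots before any download. The following is a minimal sketch using only the Go standard library; `newHTTPClient` is a hypothetical helper, not Fleet's actual chart downloader code.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
)

// newHTTPClient is a hypothetical helper (not part of Fleet). It returns an
// HTTP client that trusts the system roots plus the given PEM-encoded CA
// bundle. An empty bundle leaves only the system roots in place.
func newHTTPClient(caBundlePEM []byte) (*http.Client, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		// Fall back to an empty pool if the system pool is unavailable.
		pool = x509.NewCertPool()
	}
	if len(caBundlePEM) > 0 && !pool.AppendCertsFromPEM(caBundlePEM) {
		// Surface the problem instead of silently ignoring a bad bundle.
		return nil, errors.New("CA bundle contains no valid PEM certificates")
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
		},
	}, nil
}

func main() {
	if _, err := newHTTPClient([]byte("not a certificate")); err != nil {
		fmt.Println("error:", err)
	}
}
```

Returning an error for an unparsable bundle, rather than continuing with system roots only, is one way to make sure the failure can be reported to the user, which is the second expectation above.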

Business impact:

The customer can't use any Helm charts from internet sources, because their corporate firewall performs TLS interception, re-signing all TLS certificates with their own CA.

kkaempf, Aug 16 '24 15:08

We should add support to Fleet to fall back to a default value, unless overridden in the resource. This is most likely a Fleet install option; I think Fleet already gets re-installed when the Rancher CA changes.

However, there are multiple clients in Fleet (listed below) that would need to support this. Currently, they all take a PEM block for the CA in their spec instead. Specifying the CA in the resource directly has the advantage that we don't need to watch another resource for changes, e.g. to redeploy on certificate rotation. That's not possible when we rely on a global setting. Does re-installing Fleet re-render all bundles with the new CA, or do we need to implement this?

  • git monitor (lsremote)
  • git cloner
  • chart downloader
  • image scan (no custom CA yet)

manno, Aug 20 '24 10:08

I've faced the same issue.

marthydavid, Sep 10 '24 12:09

Additional QA

Problem

Fleet fails to deploy a workload pointing to a Helm registry using a custom CA.

Solution

Fleet takes Rancher-configured CA bundle secrets into account, enabling those to be used by default if a GitRepo does not provide a CA bundle. This happens in interactions with both git and Helm.
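
The precedence described here (a CA bundle on the GitRepo wins, the Rancher-configured bundle is the default) can be sketched as follows. This is only an illustration of the rule; `effectiveCABundle` and the sample values are hypothetical and do not come from the Fleet code base.

```go
package main

import "fmt"

// effectiveCABundle is a hypothetical helper illustrating the fallback rule:
// a bundle set on the resource takes precedence, otherwise the globally
// configured default is used. A nil result means "system roots only".
func effectiveCABundle(resourceCA, defaultCA []byte) []byte {
	if len(resourceCA) > 0 {
		return resourceCA
	}
	return defaultCA
}

func main() {
	defaultCA := []byte("rancher-ca-pem") // placeholder, not a real certificate
	repoCA := []byte("gitrepo-ca-pem")    // placeholder, not a real certificate

	fmt.Printf("%s\n", effectiveCABundle(nil, defaultCA))    // falls back to the default
	fmt.Printf("%s\n", effectiveCABundle(repoCA, defaultCA)) // resource-level override
}
```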

See new Fleet docs.

Testing

Engineering Testing

See description of this PR.

Manual Testing

N/A: manual testing ended up being automated.

Automated Testing

  • Unit tests for:
    • extracting CA bundle data from Rancher secrets, following the above linked Rancher docs on their expected names and structures
    • using them in the git latest commit fetcher
  • Integration tests for use of that data when creating jobs used to clone git repositories and to create bundles
  • End-to-end tests for deploying workloads using GitRepos pointing to a git server using a custom CA bundle

QA Testing Considerations

This should be tested with an actual Rancher install, having set up secrets as outlined in docs linked in the Solution paragraph above:

  • Is Fleet able to deploy a GitRepo pointing to a git repository using a custom CA bundle?

  • Is Fleet able to deploy a Helm chart from a GitRepo whose monitored git repository points to a Helm registry using a custom CA bundle?

    • This requires:
      • setting up a Helm or OCI registry using a custom CA, then pushing a chart or artifacts to it
      • using a GitRepo referencing a git repository (which could live in the git server mentioned above, or on e.g. GitHub), with a fleet.yaml pointing to the Helm or OCI registry using the custom CA.
  • HelmOps is not covered by this feature; neither is ImageScan.

Regressions Considerations

If no Rancher-configured CA bundle secrets exist, is Fleet (within Rancher) still able to deploy a regular workload...

  • ... from a git repository hosted on a server using a known CA (e.g. GitHub)?
  • ... with Helm charts themselves hosted in a registry also using a known CA?

weyfonk, Feb 18 '25 12:02

After initial testing with Rancher 2.11 and Fleet 106.0.0-up0.12.0-rc.3, this seems to work with a local git server for a simple chart, but it failed when testing with OCI hosted in ChartMuseum.

To test, I used the dev scripts from https://github.com/rancher/fleet/tree/main/dev to deploy a local git server, deploy ChartMuseum, pass custom CA secrets, push local commits and later install Rancher on a multi-node k3d cluster (thanks @weyfonk for the help setting this up).

When trying to use this example for OCI: e2e/assets/helm/repo/http-with-auth-repo-path/fleet.yaml, passing the credentials for both git and Helm, it fails with the following error:

Job Failed. failed: 1/1time="2025-03-19T12:16:02Z" level=fatal msg="failed to process bundle: failed to resolve URL of repo=https://chartmuseum-service.default.svc.cluster.local:8081 chart=sleeper-chart version=0.1.0: Get \"https://chartmuseum-service.default.svc.cluster.local:8081/index.yaml\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
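
For readers unfamiliar with this x509 error: a Go TLS client reports it when the server certificate does not chain up to any root it trusts. The sketch below reproduces that situation with the standard library's test server. The names `fetch`, `isUnknownAuthority` and `demo` are made up for this example, and nothing here is taken from Fleet itself.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetch issues a GET with a client that trusts only the given root pool.
func fetch(url string, roots *x509.CertPool) error {
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig:   &tls.Config{RootCAs: roots},
		DisableKeepAlives: true,
	}}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// isUnknownAuthority reports whether err is caused by an untrusted issuer.
func isUnknownAuthority(err error) bool {
	var ua x509.UnknownAuthorityError
	return errors.As(err, &ua)
}

// demo starts a TLS server with a self-signed certificate, then fetches from
// it once with an empty trust store and once with the certificate trusted.
func demo() (withoutCA, withCA error) {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "apiVersion: v1")
	}))
	defer srv.Close()

	trusted := x509.NewCertPool()
	trusted.AddCert(srv.Certificate())

	url := srv.URL + "/index.yaml"
	return fetch(url, x509.NewCertPool()), fetch(url, trusted)
}

func main() {
	withoutCA, withCA := demo()
	fmt.Println("empty trust store, unknown authority:", isUnknownAuthority(withoutCA))
	fmt.Println("certificate trusted, error:", withCA)
}
```

The same request succeeds once the issuing certificate is in the client's root pool, which is why the chart download depends on the CA bundle reaching the Helm client and not only the git client.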

With outstanding help from @weyfonk, this could also be reproduced on his side, and he is taking a look.

mmartin24, Mar 19 '25 12:03

Verified in Rancher v2.11-7e38c1ca2052319fa408bc5fde6651675765d0ae-head with Fleet fleet:106.0.0+up0.12.0-rc.4


Environment preparation

  • Prepare a local git server.
  • Clone 2 private git repos using a custom CA:
    • The first one is a simple chart, requiring login.
    • The second one is a Helm chart (the sleeper chart from the tests) requiring Helm auth. ChartMuseum is set up for this purpose.
  • Ensure Rancher TLS certificate validation requires a valid certificate.

More detailed steps here


Test validation

  • Verified that simple-chart with a custom CA works

  • Verified that a git repo with a custom CA, combined with a private Helm chart using a custom CA, works as well

  • Verified that the repos do not work once the custom CA is removed

Verified that a git repo with a custom CA, combined with a private Helm chart using a custom CA, still works after a force update and credential changes.

This is due to the error described above


mmartin24, Mar 21 '25 09:03