Chart from OCI helm repo fails to deploy in AKS cluster

Open kdelaney-oqc opened this issue 9 months ago • 2 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues

Current Behavior

Rancher: v2.11.0 Helm: v2.16.8-rancher2 fleet-agent: v0.12.0

We have a single gitrepo that is targeting 4 AKS clusters and 1 k3s. In our fleet.yaml, we use targetCustomizations to define the chart and values files for each cluster.

For our k3s cluster, we have specified an oci path in the 'chart' option, and for our AKS clusters we simply use the local checkout as the chart source. This is a sanitised version of our fleet config:

- name: aks-cluster1
  helm:
    chart: .
    releaseName: release1
    valuesFiles:
      - filepath.yaml
  clusterSelector:
    matchLabels:
      management.cattle.io/cluster-display-name: aks-cluster1

- name: k3s-cluster1
  helm:
    chart: oci://imagereg.azurecr.io/reponame/chartname
    version: 0.1.5
    releaseName: release1
    valuesFiles:
      - filepath.yaml
  clusterSelector:
    matchLabels:
      management.cattle.io/cluster-display-name: k3s-cluster1

The config above is applied successfully in all clusters; using the Rancher UI we can see that the k3s cluster has a different version of the chart. We have configured Helm Authentication in the gitrepo settings, and in the container registry logs we can see the chart is successfully pulled using the credentials we provide in Rancher.

If we update the config for the AKS cluster to match the k3s target, we see the error below in the gitrepo. The clusters are not updated and the gitrepo remains in a red "Git Updating" status.

Job Failed. failed: 1/1
time="2025-04-16T13:59:26Z" level=debug msg="hostDir: /etc/docker/certs.d/imagereg.azurecr.io"
time="2025-04-16T13:59:26Z" level=debug msg="hostDir: /etc/docker/certs.d/imagereg.azurecr.io"
time="2025-04-16T13:59:26Z" level=debug msg="attempting v2 login to registry endpoint https://imagereg.azurecr.io/v2/"
time="2025-04-16T13:59:26Z" level=debug msg="attempting v2 login to registry endpoint https://imagereg.azurecr.io/v2/"
time="2025-04-16T13:59:26Z" level=fatal msg="failed to process bundle: loading directory .chart/61e8f5710ce4317933af58f5a5084731a612e1463c5a00763913085d4783c482, charts/chartname: not logged in"

It looks as though authentication using the credentials supplied in the gitrepo settings is not being applied to both clusters.
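For readers unfamiliar with how Helm credentials are attached to a gitrepo, the following is a minimal sketch of the usual arrangement: a secret with `username` and `password` keys in the same namespace as the GitRepo, referenced through `helmSecretName`. The secret name, GitRepo name, namespace, repository URL and path are all hypothetical placeholders and are not taken from the setup described in this issue.

```yaml
# Sketch only: every name and URL below is a hypothetical placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: helm-oci-creds          # hypothetical secret name
  namespace: fleet-default      # must match the GitRepo's namespace
type: Opaque
stringData:
  username: "<registry-username>"
  password: "<registry-password>"
---
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: example-repo            # hypothetical GitRepo name
  namespace: fleet-default
spec:
  repo: https://git.example.com/org/fleet-config.git   # hypothetical URL
  branch: main
  helmSecretName: helm-oci-creds  # credentials used when fetching Helm charts
  paths:
    - charts/chartname
```

With this arrangement, the same secret would be expected to cover every target in the gitrepo that pulls from the registry, which is why the per-target difference reported here is surprising.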

Expected Behavior

Given that the clusters are managed by the same gitrepo and one cluster works well, the second cluster should also work and apply the chart pulled from the oci repository.

Steps To Reproduce

  1. Set up two clusters, one hosted in Azure, and one using k3s.
  2. Register both in Rancher
  3. Set up single gitrepo to manage both clusters.
  4. Use fleet.yaml to define an oci path to a helm repo chart
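As an illustration of step 4, the failing state can be sketched as a fleet.yaml in which both targets point at the OCI chart. The values reuse the sanitised names from the config above; the enclosing `targetCustomizations` key is assumed from the description and is not shown in the original snippet.

```yaml
# Sketch of the failing configuration: both targets use the OCI chart.
# Names reuse the sanitised values from the config shown earlier.
targetCustomizations:
  - name: aks-cluster1
    helm:
      chart: oci://imagereg.azurecr.io/reponame/chartname
      version: 0.1.5
      releaseName: release1
      valuesFiles:
        - filepath.yaml
    clusterSelector:
      matchLabels:
        management.cattle.io/cluster-display-name: aks-cluster1

  - name: k3s-cluster1
    helm:
      chart: oci://imagereg.azurecr.io/reponame/chartname
      version: 0.1.5
      releaseName: release1
      valuesFiles:
        - filepath.yaml
    clusterSelector:
      matchLabels:
        management.cattle.io/cluster-display-name: k3s-cluster1
```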

Environment

- Architecture: arm64
- Fleet Version: v0.12.0
- Cluster:
  - Provider: AKS & k3s
  - Options: 
  - Kubernetes Version: 1.30

Logs

From the gitjob pod:

time="2025-04-16T13:59:26Z" level=debug msg="hostDir: /etc/docker/certs.d/imagereg.azurecr.io"
time="2025-04-16T13:59:26Z" level=debug msg="hostDir: /etc/docker/certs.d/imagereg.azurecr.io"
time="2025-04-16T13:59:26Z" level=debug msg="attempting v2 login to registry endpoint https://imagereg.azurecr.io/v2/"
time="2025-04-16T13:59:26Z" level=debug msg="attempting v2 login to registry endpoint https://imagereg.azurecr.io/v2/"
time="2025-04-16T13:59:26Z" level=fatal msg="failed to process bundle: loading directory .chart/61e8f5710ce4317933af58f5a5084731a612e1463c5a00763913085d4783c482, charts/chartname: not logged in"

Anything else?

No response

kdelaney-oqc, Apr 16 '25 15:04

Hi, we have also faced the same issue with one of our repositories.

namespace: example
targetCustomizations:
- name: example_prd
  helm:
    chart: oci://ghcr.io/org/chartname
    version: "4.9.6"
    valuesFiles:
      - example.yaml
  clusterSelector:
    matchLabels:
      env: prd
      region: EU

time="2025-04-30T16:45:50Z" level=fatal msg="failed to process bundle: loading directory .chart/426bf4b18aaa9cf7aada2b01e8416919d4435fbf9274615decccc1cce79e7102, chartname: not logged in"

Rancher version: v2.11.0 Fleet version: v0.12.0

Seems to me the issue is with oci://

gersangreal, Apr 30 '25 17:04

We also faced the issue. fleet.yaml redacted:

helm:
  releaseName: external-dns-erf
  chart: oci://registry-1-docker-io.mynexus.io/bitnamicharts/external-dns
  version: "8.8.0"

gitrepo.yaml redacted:

spec:
  branch: main
  clientSecretName: gitrepo-auth-l6wm7
  helmSecretName: helm-repo
  paths:
  - tools
  repo: ssh://[email protected]:7999/myorg/fleet.git
  targets:
  - clusterSelector:
      matchLabels:
        env: dev
        provider.cattle.io: rke2
    name: rke2-dev

This worked with the previous Fleet version. With Rancher v2.11.1 and Fleet v0.12.2 it no longer works.

marthydavid, May 07 '25 08:05

Closing as duplicate of #3915.

weyfonk, Nov 17 '25 11:11