[SURE-8882] extend our testing to ssh helm chart downloads with keys
Acceptance Criteria
- [ ] write a test for Helm chart downloads via SSH as in https://github.com/rancher/fleet/blob/43cf0a41330c57c3d1b853e00ab66ab2c1899d6a/internal/bundlereader/loaddirectory.go#L242C2-L270C3
SURE-8882
Issue description:
The customer upgraded from Rancher 2.8.2 to Rancher 2.8.5 and some of their upstream fleet jobs are getting this error:
time=2024-07-30 15:16:18.000000 level=fatal msg="error downloading 'ssh://[email protected]/xxxxx/fleet-platform.git?sshkey=redacted': /usr/bin/git exited with 128: Cloning into '/tmp/getter624252719/temp'...\nNo user exists for uid 1000\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"
Troubleshooting steps:
The customer tried changing the credentials and still gets the same error. They are able to clone the repository locally using the same credentials supplied to Fleet. This also happens on most of the configured repositories, not just one or two git repos. They are able to exec into the gitjob pod and manually clone the repo successfully. Checked from inside the GitJob pod:
> kubectl exec -n cattle-fleet-system gitjob-7889c69f49-5kq8r -it -- cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
sshd:x:499:486:SSH daemon:/var/lib/sshd:/usr/sbin/nologin
gitjob:x:1000:1000::/home/gitjob:/bin/bash
I reviewed two of their GitRepo manifests (working/not-working) and they are literally pointing to the same repo; the only difference is the path used. The customer downgraded from 0.9.5 to 0.9.0 and the problem repos started to sync again.
Repro steps:
unable to repro in-house
Workaround:
Is a workaround available and implemented? yes/no What is the workaround: Downgrade fleet
Actual behavior:
After upgrade, some gitrepos fail with error:
time=2024-07-30 15:16:18.000000 level=fatal msg="error downloading 'ssh://[email protected]/xxxxx/fleet-platform.git?sshkey=redacted': /usr/bin/git exited with 128: Cloning into '/tmp/getter624252719/temp'...\nNo user exists for uid 1000\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n"
Expected behavior: All gitrepos continue to sync with no error
Files, logs, traces:
Additional notes:
We're facing the same issue here. SSH doesn't work at all when pulling helm charts. It might be related to https://github.com/rancher/fleet/commit/326ad93d73c70a48b4ed2e1bee4fe955d83491f4, a quick Google search suggests that OpenSSH depends on the entry existing at /etc/passwd for some obscure reason.
Amusingly, using SSH for the GitRepo itself does work (the initcontainer runs fleet gitcloner and exits with success). It only seems to be an issue when running fleet apply.
We're using Rancher 2.9.3 with fleet 0.10.4
If downloading a helm chart via ssh fails, that would be a separate issue. The fleet apply CLI uses go-getter to download charts.
Cloning the git repository is done by fleet gitcloner and it uses go-git.
I looked up the source code for go-getter, and its support for fetching from Git repos indeed relies on running git commands directly, which would trigger the OpenSSH error related to the missing /etc/passwd entry.
From what I could gather, go-git has a Git implementation of its own and uses crypto/ssh for transport instead of OpenSSH. I'm pretty sure it does not perform this validation, which seems to be the culprit behind this weird requirement.
That's probably why this problem happens when fleet downloads a chart through Git+SSH but not when it fetches a GitRepo using Git+SSH - in our case, even when using the same repo and same credentials.
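The difference described above can be checked from inside whichever container runs the download. This is a minimal sketch, assuming the user database there is the plain /etc/passwd file (OpenSSH itself asks the system's user database). has_passwd_entry is a hypothetical helper and not part of Fleet.

```shell
# has_passwd_entry UID [FILE]
# Hypothetical helper: succeeds if FILE (default /etc/passwd) has an entry
# whose third field (the numeric uid) equals UID.
has_passwd_entry() {
    awk -F: -v uid="$1" '$3 == uid { found = 1 } END { exit !found }' "${2:-/etc/passwd}"
}

# Report on the uid the current process runs as.
uid="$(id -u)"
if has_passwd_entry "$uid"; then
    echo "uid $uid has a passwd entry"
else
    echo "uid $uid has no passwd entry, so an ssh started by git would be expected to fail"
fi
```

Running it in both the container that clones the GitRepo and the one that runs fleet apply should show whether the two differ in this respect.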
(I apologize if any of my messages come across as confusing or maybe rude, as English is not my native language)
Hello, any update on this issue?? We are getting the same error in Fleet version v0.10.4
/backport v2.11.4
QA Template
Reproducing the issue
- Create a private git repository that can be accessed using an SSH key and can contain any helm chart to deploy.
- Create a secret that contains the key to access this repository.
  In this example, we've created an SSH key named ssh:
      kubectl -n fleet-local create secret generic ssh --type "kubernetes.io/ssh-auth" \
        --from-file=ssh-privatekey=./../../keys/deploy
- Create a public git repository that contains a fleet.yaml file. The fleet.yaml has to reference the private git repository like so:
      helm:
        chart: git::ssh://[email protected]/p-se/fleet-devel.git//issues/SURE-8882/foo?ref=master
  where //issues/SURE-8882/foo?ref=master is optional and provides a path inside the git repository and the branch to be used to clone from.
- Have the GitRepo resource point to the public git repository with the fleet.yaml file. For example:
      kind: GitRepo
      apiVersion: fleet.cattle.io/v1alpha1
      metadata:
        name: sure-8882
        namespace: fleet-local
      spec:
        repo: [email protected]:p-se/fleet-devel.git
        branch: master
        paths:
        - issues/SURE-8882/repro
        clientSecretName: ssh
- Apply the GitRepo resource onto the cluster.
- You should see the No user exists for uid 1000 message in the status of the GitRepo.
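The chart reference in fleet.yaml follows the go-getter URL layout: a repository, an optional `//` subdirectory, and an optional `ref` query parameter. Here is a small sketch of how the pieces fit together. chart_url is a hypothetical helper and git@example.com/org/repo.git is a placeholder; neither comes from Fleet.

```shell
# chart_url REPO [SUBDIR] [REF]
# Hypothetical helper: prints a go-getter style git-over-SSH chart URL.
# SUBDIR (path inside the repository) and REF (branch or tag) are optional.
chart_url() {
    url="git::ssh://$1"
    if [ -n "${2:-}" ]; then url="$url//$2"; fi
    if [ -n "${3:-}" ]; then url="$url?ref=$3"; fi
    printf '%s\n' "$url"
}

# Placeholder values for illustration only.
chart_url "git@example.com/org/repo.git" "charts/foo" "main"
# prints: git::ssh://git@example.com/org/repo.git//charts/foo?ref=main
```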
Notes
The public repository can also be private, but doesn't have to be. If both are private, you can use a single git repository for testing by pointing the resources to the appropriate paths. That's where the provided examples come from.
Verified on Rancher v2.12-6d2a9d53b44f309eac233e0f21fd9d5e806b056d-head with Fleet 0.13.0-beta.3
Added test case on Fleet case: https://app.qase.io/case/FLEET-184
Reproduced issue before fix:
Testing steps:
- Set up 2 repos: 1 public, added with the ssh secret and pointing to the private one, and 1 private with the actual chart
Public repo: https://github.com/fleetqa/fleet-qa-examples-public
kind: GitRepo
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: sure-8882
  namespace: fleet-local
spec:
  repo: [email protected]:fleetqa/fleet-qa-examples-public.git
  branch: main
  paths:
  - sure-8882/reproducer
  clientSecretName: ssh
and fleet.yaml pointing to the ssh chart:
helm:
  chart: git::ssh://[email protected]/fleetqa/fleet-qa-examples.git//helmchart-configmap?ref=main
Private repo with the actual chart (configmap chart): https://github.com/fleetqa/fleet-qa-examples/tree/main/helmchart-configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: foo
  namespace: default
data:
  key: value
apiVersion: v2
name: foo
description: A Helm chart for Kubernetes
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.1.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "1.16.0"
- Create Deploy Key on private repository
- Upload secret via UI (passing deploy key as ssh)
apiVersion: v1
data:
  ssh-privatekey: " add private secret"
kind: Secret
metadata:
  creationTimestamp: null
  name: ssh
  namespace: fleet-local
type: kubernetes.io/ssh-auth
- Add repo: https://github.com/fleetqa/fleet-qa-examples-public.git, branch main, path sure-8882/reproducer, helm ssh key: ssh
- Gitrepo is deployed:
Does this work for a private repo for the GitRepo as well as the repo path for the helm chart in fleet.yaml? I was getting the uid 1000 error on Rancher 2.11.3. Now, retesting on Rancher 2.11.4 and following the above QA steps, I'm getting this error:
Bcrypt_pbkdf: empty password:Job Failed. failed: 1/1time="2025-08-04T19:42:42Z" level=fatal msg="failed to create auth from options for repo=\"[email protected]:myorg/myrepo.git\" branch=\"main\" revision=\"\" path=\"/workspace\": bcrypt_pbkdf: empty password"
Scratch that, it's working. Thanks!
In case others stumble on this, a reminder:
- ssh keys must be generated in PEM format.
- ssh keys must NOT use a passphrase
- ssh keys in PEM do NOT support ed25519 as recommended by github
ssh-keygen -t rsa -b 4096 -m pem -C "[email protected]"
- leave passphrase empty
- create kubernetes secret from private key
kubectl -n fleet-local create secret generic ssh --type "kubernetes.io/ssh-auth" \
  --from-file=ssh-privatekey=path/to/privatekey
cat path/to/publickey
- add to github.
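The reminders above can be turned into a quick pre-flight check on a private key file before it goes into the secret. This is a rough sketch that only looks at the key's text headers; it mirrors the reminders in this comment, not Fleet's actual validation. key_problem is a hypothetical helper name.

```shell
# key_problem FILE
# Hypothetical helper: prints "ok" if FILE looks like an unencrypted PEM
# private key, otherwise a short reason. Only the text headers are inspected.
key_problem() {
    header="$(head -n 1 "$1")"
    case "$header" in
        "-----BEGIN OPENSSH PRIVATE KEY-----")
            echo "not PEM (OpenSSH format)" ;;
        "-----BEGIN ENCRYPTED PRIVATE KEY-----")
            echo "passphrase-protected" ;;
        "-----BEGIN "*"PRIVATE KEY-----")
            # Traditional PEM keys mark encryption with a Proc-Type header.
            if grep -q '^Proc-Type: 4,ENCRYPTED' "$1"; then
                echo "passphrase-protected"
            else
                echo "ok"
            fi ;;
        *)
            echo "not a private key" ;;
    esac
}

# Usage (path is a placeholder): key_problem path/to/privatekey
```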
Thanks @krumware - do we need to fix our documentation ? 🤔
I think maybe doing a pass through that portion of the docs would be helpful. (Maybe it's covered elsewhere, but most troubleshooting and searching took me straight to the fleet and gitrepo reference pages.)
My struggles were:
- Finding a clear example of a workflow where both a gitrepo.yaml and a fleet.yaml are authenticated. The documentation tends to lean on at least one of these being unauthenticated. For gitops newbies it creates a challenge when internal repos will most likely always be private.
- The (necessary) re-use of the word "repository" creates confusion because the authentication for a git repository and a helm repository are different. I wish there was a way to more obviously separate the two.
- It was challenging to discover that the fleet yaml's authentication method is inherited from the method used by the gitrepo yaml
- It's still unclear whether different authentication mechanisms can be used for a gitrepo yaml versus a fleet yaml. (It seems, for example, that a user would need to use the same ssh key for potentially physically separate repository servers, an area where enhancement might be coming with the upstream secrets support.)
- Despite warning sections indicating that certain legacy algorithms are required, and that passwords aren't supported, those notes didn't feel cohesive with the rest of the documentation and were not obvious.
- It was pointed out in the Rancher slack (thank you!) about the go-getter urls being used. But that really did not stand out in the documentation which made things feel a little bit too automagic, so there was lots of experimentation needed to get to a working configuration.
- The error messages in the Rancher CI UI, such as bcrypt_pbkdf empty password, were not found in any troubleshooting documentation. I was using a password-protected key, but could not find any way to pass that in, so it embarrassingly took me way too long to find the remark about passwords not being supported.
- Finally, documentation on Github recommends against using deprecated algorithms which are required for fleet (at this point). That unfortunately adds to the complexity for adopters, who then need to troubleshoot why their generated key for a private github repository doesn't work.
I hope that's helpful!
@kakabisht 👆🏻