Unable to use `lookup` in Fleet-managed Helm template
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
If I create a Fleet Bundle with an embedded Helm chart, and one of the templates in that Helm chart uses Helm's lookup feature, it does not work. The lookup function returns nothing, and the template does not render as expected.
Expected Behavior
Fleet Bundles with Helm charts that use lookup should function as expected.
Steps To Reproduce
Create a simple Helm bundle:
$ cat Chart.yaml
apiVersion: v2
name: lookup-test
version: 1.0.0

$ cat fleet.yaml
helm:
  chart: .

$ cat templates/config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-test
  namespace: kube-system
data:
  lookup-output: {{ len (lookup "v1" "Pods" "kube-system" "").items }}
When deployed to a cluster, the resulting ConfigMap should have a number (the count of Pods in kube-system) as the value of lookup-output. Instead, we get an error from Fleet indicating that it was unable to render the template:
message: 'ErrApplied(10) [Bundle lookup-test-test: template: test/templates/config.yaml:7:20:
executing "test/templates/config.yaml" at <len (lookup "v1" "Pods"
"kube-system" "").items>: error calling len: len of nil pointer]'
The fact that it complains of a nil pointer is important -- that means that the lookup function either returned an empty dict, or a nil pointer itself. If it had executed properly, even if there were no resources matching the query, items would have been an empty list, and len would have returned 0.
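For reference, a guarded variant of the same template that tolerates an empty or nil lookup result might look like the sketch below. This is untested and only illustrates the defensive pattern; note that Helm's documentation uses the singular kind name ("Pod"), and the count is quoted because ConfigMap data values must be strings.

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-test
  namespace: kube-system
data:
  # Fall back to an empty dict/list so len never receives a nil pointer.
  {{- $pods := lookup "v1" "Pod" "kube-system" "" | default dict }}
  lookup-output: {{ len ($pods.items | default list) | quote }}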
Environment
- Architecture: amd64
- Fleet Version: 0.7.0
- Cluster:
  - Provider: RKE
  - Options:
  - Kubernetes Version: 1.26
Logs
No response
Anything else?
No response
I have the same problem. Is there any solution at the moment?
I have not seen any solution or workaround for this.
Same problem here. Rancher: 2.7.9, Fleet: 0.8.1.
I have managed to get some forms of lookup to work. For example, this works for me in a template that is rendered by Fleet:
{{- range $ns := (lookup "v1" "Namespace" "" "").items -}}
{{- $prjId := dig "metadata" "annotations" "field.cattle.io/projectId" "" $ns }}
{{- if has $prjId $excludedProjectIds -}}
{{- $excludedNamespaces = append $excludedNamespaces $ns.metadata.name -}}
{{- end -}}
{{- end -}}
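(For anyone trying to reproduce this, the fragment assumes two variables defined earlier in the template; a hypothetical initialization with placeholder project IDs would be:)

{{- /* Hypothetical setup assumed by the fragment above; the project IDs are placeholders. */ -}}
{{- $excludedProjectIds := list "p-aaaaa" "p-bbbbb" -}}
{{- $excludedNamespaces := list -}}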
And this one works too:
#{{- if gt (len (lookup "rbac.authorization.k8s.io/v1" "ClusterRole" "" "")) 0 -}}
...
...
#{{- end -}}
So it's not a simple case of "lookup doesn't work" -- it's something more subtle than that. What the two working examples above have in common is that the last two arguments to lookup are empty strings, while the one that doesn't work (in the ticket description) has the third argument (the namespace) defined. Maybe it has something to do with using lookup to return only resources in a certain namespace?
Debugging lookup statements in Helm is horrifically tricky though, because you can't use helm template to render the chart to debug it. helm template replaces lookup with an inert function, so it always returns an empty dict.
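One way around that is to let Fleet itself render the lookup results into a throwaway object and inspect it on the cluster. A minimal diagnostic sketch (the file name templates/lookup-debug.yaml is hypothetical, and every call is guarded so rendering never fails) that would also exercise the namespaced-versus-cluster-wide hypothesis above:

apiVersion: v1
kind: ConfigMap
metadata:
  name: lookup-debug
data:
  # Counts are quoted because ConfigMap data values must be strings.
  namespaced-pod-count: {{ len ((lookup "v1" "Pod" "kube-system" "" | default dict).items | default list) | quote }}
  all-namespace-count: {{ len ((lookup "v1" "Namespace" "" "" | default dict).items | default list) | quote }}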
I think I found a possible explanation:
When the Fleet controller creates a bundle, the Helm chart is pre-rendered. Because that rendering is not run against the target cluster, the Kubernetes API lookup returns an empty result. So if the template doesn't handle a nil/empty result, it throws an error at the templating stage.
Here is an example that actually shows a clue:
The following template will create a good-secret Secret object if the lookup succeeds, and a failed-secret object if it fails.
apiVersion: v1
kind: Secret
{{- $secret := (lookup "v1" "Secret" "default" "system-secret") }}
{{- if $secret }}
metadata:
  name: good-secret
data:
  pass: {{ index $secret "data" "secretContent" }}
{{- else }}
metadata:
  name: failed-secret
data:
  error: "Pre-provisioned system-secret was not found in the cluster."
{{- end }}
In the bundle view, the result fails.
But in the target cluster, the deployment succeeds, indicating that the lookup executed successfully there.
So for now, don't throw an exception/error (for example, via the fail function in Helm templating) in charts that are deployed through Fleet.
An error in the "helm template" code on the local/upstream cluster should no longer stop the deployment: https://github.com/rancher/fleet/issues/1101
However, when lookup returns nil, that might be an issue afterwards. We have something similar here: https://github.com/rancher/fleet/issues/1700
Are you able to get more information about where fleet fails? E.g. by
- looking at the "fleet apply" job output with a tool like stern:
  stern -n cattle-fleet-system -l "app=fleet-job"
- looking at the status of several resources, like the gitjob, gitrepo, bundle, bundledeployment? I would expect the bundle to have the best information about where the fleet code failed.
- Anything in the fleet-controller logs, maybe when debug is enabled?
  kubectl logs -n cattle-fleet-system -l "app=fleet-controller" -f

Is this also happening on v0.9?
An error in the "helm template" code on the local/upstream cluster should no longer stop the deployment: #1101
However, when lookup returns nil, that might be an issue afterwards. We have something similar here: #1700
Are you able to get more information about where fleet fails? E.g. by
- looking at the "fleet apply" job output with a tool like stern:
stern -n cattle-fleet-system -l "app=fleet-job"- looking at the status of several resources, like the gitjob, gitrepo, bundle, bundledeployment? I would expect the bundle to have the best information about where the fleet code failed.
- Anything in the fleet-controller logs, maybe when debug is enabled?
kubectl logs -n cattle-fleet-system -l "app=fleet-controller" -f- Is this also happening on v0.9?
In summary, in my case we intentionally fail the Helm installation when a pre-condition is not met. We would love this mechanism to also be available when deploying from Fleet, so that we can identify the problem early in the deployment stage.
Further explanation: @manno, I have included a fail function call in my original Helm chart:
apiVersion: v1
kind: Secret # Maxwell user secret
metadata:
  name: local-data
data:
{{- $secret := (lookup "v1" "Secret" "default" "local-data") }}
{{- if $secret }}
  key: {{ index $secret "data" "key" }}
{{- else }}
{{ fail "Pre-provisioned local-data was not found in the cluster." }}
{{- end }}
The goal was to make sure the Helm chart can only be deployed to a cluster that meets our shared environment requirements. The use of fail worked perfectly for normal deployments, such as via Rancher → Apps or the CLI, but it causes problems when using Fleet, with the following error in the GitRepo view:
Execution error at (test/templates/secret.yaml:10:5): Pre-provisioned local-data was not found in the cluster.
Log from fleet-agent in target cluster:
handler bundle-deploy: execution error at (test/templates/secret.yaml:10:5): Pre-provisioned local-data was not found in the cluster., requeuing
Log from fleet-controller in the Rancher (local) cluster:
While calculating status.ResourceKey, error running helm template for bundle all-helm-test with target options from : execution error at (test/templates/secret.yaml:10:5): Pre-provisioned local-data was not found in the cluster.
We are not able to test this in Rancher 2.8.1, as we are using Azure DevOps for our Git repository and there is a pending issue with Fleet 0.9 accessing Azure DevOps.
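Until there is a proper mechanism in Fleet, one interim pattern (only a sketch; the strictPreflight value name is an assumption, not something the chart or Fleet defines) is to gate the hard failure behind a chart value, so Fleet targets can opt out of the fail while CLI / Rancher Apps installs keep it:

# Sketch only: .Values.strictPreflight is a hypothetical value, defaulted to true in
# values.yaml and set to false for Fleet targets (e.g. via helm.values in fleet.yaml).
apiVersion: v1
kind: Secret # Maxwell user secret
metadata:
  name: local-data
data:
{{- $secret := lookup "v1" "Secret" "default" "local-data" }}
{{- if $secret }}
  key: {{ index $secret "data" "key" }}
{{- else if .Values.strictPreflight }}
{{ fail "Pre-provisioned local-data was not found in the cluster." }}
{{- end }}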
@manno, we've upgraded to Rancher 2.8.3 / Fleet 0.9.2, and the error still exists.
With the following yaml:
apiVersion: v1
kind: ConfigMap # Maxwell user secret
metadata:
  name: local-data
data:
{{- $data := (lookup "v1" "ConfigMap" "default" "local-data") }}
{{- if $data }}
  foo: {{ index $data "data" "foo" }}
{{- else }}
  foo: {{ .Capabilities.KubeVersion.Version }}
{{ fail .Capabilities.KubeVersion.Version }}
{{- end }}
We are able to see the error message from the fleet-agent at the downstream cluster:
time="2024-04-01T02:51:43Z" level=info msg="getting history for release exp-bad-fleet"
time="2024-04-01T02:51:43Z" level=error msg="error syncing 'cluster-fleet-default-c-xxxxx-814acbc24fcf/exp-bad-fleet': handler bundle-deploy: execution error at (bad-chart/templates/cofnigMap.yaml:11:5): v1.25.16+k3s4, requeuing"
2024-04-01T02:51:43.182794473Z time="2024-04-01T02:51:43Z" level=info msg="Deploying bundle cluster-fleet-default-c-xxxxx-814acbc24fcf/exp-bad-fleet"
time="2024-04-01T02:51:43Z" level=info msg="preparing upgrade for exp-bad-fleet"
time="2024-04-01T02:51:43Z" level=info msg="getting history for release exp-bad-fleet"
time="2024-04-01T02:51:43Z" level=error msg="error syncing 'cluster-fleet-default-c-xxxxx-814acbc24fcf/exp-bad-fleet': handler bundle-deploy: execution error at (bad-chart/templates/cofnigMap.yaml:11:5): v1.25.16+k3s4, requeuing"
From the log, we can see that the downstream cluster's k3s version (v1.25.16+k3s4) was correctly acquired by the Helm dry run, as it appears in the error message output. Does this mean the dry run / templating was performed by the fleet-agent on the downstream cluster side?
If this is the case, can the fleet-agent perform the dry run in server mode? https://helm.sh/docs/chart_template_guide/functions_and_pipelines/#using-the-lookup-function
Keep in mind that Helm is not supposed to contact the Kubernetes API Server during a helm template|install|upgrade|delete|rollback --dry-run operation. To test lookup against a running cluster, helm template|install|upgrade|delete|rollback --dry-run=server should be used instead to allow cluster connection.
Looks like there is no configuration option DryRunOption in the Install func.
I tried adding it as u.DryRunOption = "server" and built a dev version. The lookup function works well with it.
Hi, is there currently a way to make Fleet use Helm's dry-run flag in server mode as described above? Charts depending on bitnami/common that offer the option to provide passwords via an existing secret (like bitnami/postgresql or bitnami/mysql) currently seem to fail on upgrades via Fleet (Fleet 0.10.2).
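For those charts, a possible mitigation (a sketch only; the secret name is hypothetical and the value keys should be checked against the chart version in use) is to point the chart at a pre-created credentials secret from fleet.yaml, so the chart does not have to look up a previously generated password on upgrade:

# Sketch of a fleet.yaml using bitnami/postgresql's auth.existingSecret value;
# "pg-credentials" is a pre-created Secret in the target namespace.
defaultNamespace: databases
helm:
  repo: https://charts.bitnami.com/bitnami
  chart: postgresql
  values:
    auth:
      existingSecret: pg-credentials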
I have the same problem.
The bitnami/postgresql deployment fails due to a lookup error; is there a solution?
If there is, I haven't found it. I just stopped using Fleet for this Helm chart.
/backport v2.13.1