harvester
[BUG] registration of Harvester cluster in Rancher prime fails
Describe the bug
Registration process of a Harvester cluster does not complete successfully in Rancher Prime
To Reproduce
- Install Harvester cluster (e.g. v1.2.2-rc1)
- Install Rancher in a prime-only version (e.g. v2.7.12)
- Try to register Harvester cluster in Rancher
Expected behavior
The registration process should complete without error and the Harvester cluster should be available in the Rancher UI for management.
Environment
- Harvester ISO version: v1.2.2-rc1
- Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): bare-metal
Additional context
Log excerpt from the rancher-agent cleanup job:
ErrImagePull (rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/rancher/rancher-agent:v2.7.12": failed to resolve reference "docker.io/rancher/rancher-agent:v2.7.12": docker.io/rancher/rancher-agent:v2.7.12: not found)
The problem seems to be a misconfiguration of Rancher's registration job: with a prime-only version of Rancher, the image is not available from docker.io and needs to be pulled from the rancher-prime registry instead.
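To illustrate why the pull in the log goes to docker.io, here is a minimal sketch of the usual rule by which a container runtime qualifies an image reference that carries no registry host. It is a simplification, not containerd's or Rancher's actual code; `qualify_image` is a hypothetical helper and `registry.example.com` is a placeholder host, not the real rancher-prime registry.

```python
def qualify_image(image: str, default_registry: str = "docker.io") -> str:
    """Return the image reference with an explicit registry host.

    Simplified rule (assumption): the first path component is treated as a
    registry host only if it contains '.' or ':' or equals 'localhost'.
    Otherwise the default registry is prepended.
    """
    first, sep, _rest = image.partition("/")
    looks_like_host = "." in first or ":" in first or first == "localhost"
    if sep and looks_like_host:
        return image  # already qualified with a registry host
    return f"{default_registry}/{image}"


# An unqualified agent image ends up on docker.io, as seen in the log.
print(qualify_image("rancher/rancher-agent:v2.7.12"))
# A reference that names a registry host (placeholder) is left unchanged.
print(qualify_image("registry.example.com/rancher/rancher-agent:v2.7.12"))
```

If the job's image reference is emitted without a registry prefix, the runtime falls back to docker.io, which would match the `not found` error for a tag that only exists in another registry.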
cc @bk201 @FrankYang0529
I can't reproduce this. cattle-cluster-agent pods run with prime images.
The cattle-cleanup pod runs with a non-existent community image after deleting the Harvester cluster. It might be worth filing an issue in Rancher.
The manifest for the agent is generated from this template: https://github.com/rancher/rancher/blob/release/v2.7/pkg/systemtemplate/template.go#L3
That seems to be generated from here: https://github.com/rancher/rancher/blob/release/v2.7/pkg/systemtemplate/import.go#L161
It might be useful to check what the cluster spec and the setting are pointing to.
FYR, I tried to reproduce this on my ecm machine, but in vain.
- Fresh install of harvester-v1.2.2-rc1
- SSH to node-0, download rancher-vcluster.yaml, and also a copy, rancher-vcluster.yaml.orig
  - Ref. https://docs.harvesterhci.io/v1.2/advanced/addons/rancher-vcluster/
- Edit rancher-vcluster.yaml with rancher-prime values and apply
- Follow https://docs.harvesterhci.io/v1.2/advanced/addons/rancher-vcluster/ to enable rancher-vcluster
- Import Harvester to Rancher
I did some more debugging yesterday and found that the image pull error and the failing cluster import are unrelated.
The image pull error originates from a cleanup job, which is spawned by the Rancher agent when the Harvester cluster's cluster-registration-url is set to empty.
The failing cluster import is a bug in Rancher's handling of user input. If a user configures Rancher with a server URL that has a trailing / character and Rancher is running behind an Nginx ingress proxy, the import process fails because the Rancher agent tries to connect to a wrong API path.
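As a rough illustration of this class of bug (not Rancher's actual code), the sketch below shows how appending an API path to a server URL with a trailing / yields a double slash, and how normalizing the input avoids it. The function names, the host `rancher.example.com`, and the path `/v3/connect` are assumptions chosen for the example.

```python
def join_naive(server_url: str, path: str) -> str:
    # Plain concatenation: a trailing "/" in the user-supplied URL
    # produces "//" in the resulting path.
    return server_url + path


def join_normalized(server_url: str, path: str) -> str:
    # Strip the trailing "/" from the base and the leading "/" from the
    # path, then join them with exactly one separator.
    return server_url.rstrip("/") + "/" + path.lstrip("/")


server_url = "https://rancher.example.com/"  # placeholder URL with trailing slash
print(join_naive(server_url, "/v3/connect"))
print(join_normalized(server_url, "/v3/connect"))
```

Whether a proxy rejects, rewrites, or redirects the double-slash path depends on its configuration, so the naive form can work in one deployment and fail in another.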
I've filed two separate bugs against Rancher:
https://github.com/rancher/rancher/issues/45404
https://github.com/rancher/rancher/issues/45403
Feel free to close this issue, as I believe there isn't anything more to be done for the Harvester team here.
Thanks @m-ildefons, closing this as per your debugging. Also, we have seen passing test results with Rancher Prime.