
[BUG] Registration of Harvester cluster in Rancher Prime fails

m-ildefons opened this issue.

Describe the bug

The registration process of a Harvester cluster does not complete successfully in Rancher Prime.

To Reproduce

  1. Install Harvester cluster (e.g. v1.2.2-rc1)
  2. Install Rancher in a prime-only version (e.g. v2.7.12)
  3. Try to register Harvester cluster in Rancher

Expected behavior

The registration process should complete without errors, and the Harvester cluster should be available in the Rancher UI for management.

Environment

  • Harvester ISO version: v1.2.2-rc1
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630): bare-metal

Additional context

Log excerpt from the rancher-agent cleanup job:

 ErrImagePull (rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/rancher/rancher-agent:v2.7.12": failed to resolve reference "docker.io/rancher/rancher-agent:v2.7.12": docker.io/rancher/rancher-agent:v2.7.12: not found) 

The problem appears to be a misconfiguration of Rancher's registration job: for a prime-only version of Rancher, the image is not available from docker.io and must be pulled from the Rancher Prime registry instead.
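The failure mode described above can be illustrated with a minimal Go sketch. It assumes the agent image reference is prefixed with a registry only when one is configured; an unqualified reference such as rancher/rancher-agent:v2.7.12 is resolved against docker.io by the container runtime, which matches the docker.io/rancher/rancher-agent:v2.7.12 reference in the log. `qualifyImage` is a hypothetical helper, not Rancher's code, and `registry.rancher.com` is used only as an assumed Prime registry host.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifyImage is a hypothetical helper. If registry is non-empty it is
// prepended to the image reference; otherwise the reference is returned
// unchanged, and a container runtime would resolve it against docker.io.
func qualifyImage(registry, image string) string {
	if registry == "" {
		return image
	}
	return strings.TrimSuffix(registry, "/") + "/" + image
}

func main() {
	image := "rancher/rancher-agent:v2.7.12"

	// No registry configured: the runtime falls back to docker.io,
	// which does not host prime-only tags.
	fmt.Println(qualifyImage("", image))

	// Assumed Prime registry host, for illustration only.
	fmt.Println(qualifyImage("registry.rancher.com", image))
}
```

In this sketch, whichever component builds the cleanup job would need the registry value passed through; if it is dropped, the job silently falls back to the community image.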

m-ildefons, May 02 '24 07:05

cc @bk201 @FrankYang0529

innobead, May 02 '24 07:05

I can't reproduce this. The cattle-cluster-agent pods run with Prime images. The cattle-cleanup pod, however, runs with a non-existent community image after the Harvester cluster is deleted; it might be worth filing an issue against Rancher.

bk201, May 06 '24 03:05

The manifest for the agent is generated from this template: https://github.com/rancher/rancher/blob/release/v2.7/pkg/systemtemplate/template.go#L3

The manifest itself appears to be generated here: https://github.com/rancher/rancher/blob/release/v2.7/pkg/systemtemplate/import.go#L161

It might be useful to check what the cluster spec and the setting are pointing to.

ibrokethecloud, May 06 '24 03:05

FYR, I tried to reproduce this on my ecm machine, without success.

  1. Fresh install harvester-v1.2.2-rc1

  2. SSH to node-0, download rancher-vcluster.yaml, and keep a copy as rancher-vcluster.yaml.orig

    • Ref. https://docs.harvesterhci.io/v1.2/advanced/addons/rancher-vcluster/
  3. Edit rancher-vcluster.yaml with the rancher-prime values and apply it

  4. Follow https://docs.harvesterhci.io/v1.2/advanced/addons/rancher-vcluster/ to enable rancher-vcluster

  5. Import Harvester into Rancher

albinsun, May 06 '24 05:05

I did some more debugging yesterday and found that the image pull error and the failing cluster import are unrelated.

The image pull error originates from a cleanup job, which the Rancher agent spawns when the Harvester cluster's cluster-registration-url is set to empty.

The failing cluster import is a bug in Rancher's handling of user input: if a user configures Rancher with a server URL that has a trailing / character, and Rancher is running behind an Nginx ingress proxy, the import fails because the Rancher agent tries to connect using the wrong API path.

I've filed two separate bugs against Rancher:

  • https://github.com/rancher/rancher/issues/45404
  • https://github.com/rancher/rancher/issues/45403

Feel free to close this issue, as I believe there is nothing more for the Harvester team to do here.
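The trailing-slash problem can be sketched in Go, assuming the agent builds its endpoint by appending an API path to the configured server URL. The path `/v3/connect` and both helper functions are assumptions for illustration, not taken from Rancher's code. Plain concatenation yields a doubled slash, which a proxy that does not normalize paths may route differently; trimming trailing slashes first avoids that.

```go
package main

import (
	"fmt"
	"strings"
)

// connectPath is an assumed agent API path, used only for illustration.
const connectPath = "/v3/connect"

// naiveURL joins by plain concatenation, which produces "//" when the
// configured server URL ends in "/".
func naiveURL(serverURL string) string {
	return serverURL + connectPath
}

// normalizedURL strips any trailing slashes from the user-supplied server
// URL before appending the path.
func normalizedURL(serverURL string) string {
	return strings.TrimRight(serverURL, "/") + connectPath
}

func main() {
	server := "https://rancher.example.com/"
	fmt.Println(naiveURL(server))      // https://rancher.example.com//v3/connect
	fmt.Println(normalizedURL(server)) // https://rancher.example.com/v3/connect
}
```

Normalizing the URL once, where the user input is accepted, is the simpler design choice here: every consumer of the setting then sees the same canonical value.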

m-ildefons, May 07 '24 06:05

Thanks @m-ildefons; closing this based on @m-ildefons' debugging. We have also seen passing test results with Rancher Prime.

khushboo-rancher, May 07 '24 22:05