[Forwardport] air gap RKE2 downstream cluster fails to pull images if the registry mirrors endpoint does not contain a schema
Forward port of https://github.com/rancher/rancher/issues/46090
Rancher Server Setup
- Rancher version: v2.8.3
- Installation option (Docker install/Helm Chart):
  - If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):
- Proxy/Cert Details:
Information about the Cluster
- Kubernetes version: v1.28.9-rke2r1
- Cluster Type (Local/Downstream): Downstream
  - If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):
User Information
- What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
  - If custom, define the set of permissions:
Describe the bug
When provisioning an air-gapped RKE2 cluster, if the user sets the mirror endpoint to their private registry, but without the https:// scheme, the cluster will fail to provision with the following error message observed in the rancher-system-agent logs:
Jun 17 13:23:56 qchcon201 rancher-system-agent[20349]: time="2024-06-17T13:23:56+02:00" level=info msg="Pulling image registry.rancher.com/rancher/system-agent-installer-rke2:v1.28.9-rke2r1"
Jun 17 13:23:56 qchcon201 rancher-system-agent[20349]: time="2024-06-17T13:23:56+02:00" level=warning msg="Ignoring relative endpoint URL for registry registry.rancher.com: "registry123.sample.com""
Jun 17 13:23:56 qchcon201 rancher-system-agent[20349]: time="2024-06-17T13:23:56+02:00" level=warning msg="Failed to get image from endpoint: Get "https://registry.rancher.com/v2/": dial TCP
Create a downstream (DS) RKE2 cluster and configure the registries similar to the following:
registries:
  configs: {}
  mirrors:
    registry.rancher.com:
      endpoint:
        - registry123.sample.com
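To spot this configuration before provisioning, the registries section can be checked for endpoints that lack a scheme. The following is a minimal Python sketch, not part of Rancher or rancher-system-agent; the function name `endpoints_missing_scheme` is hypothetical, and it assumes the registries block has already been loaded into a plain dictionary with the same shape as the YAML above.

```python
def endpoints_missing_scheme(registries):
    """Return (mirror, endpoint) pairs whose endpoint has no URL scheme.

    Hypothetical helper for illustration only. `registries` is assumed to be
    a dict shaped like the `registries:` block of the cluster spec.
    """
    missing = []
    mirrors = registries.get("mirrors") or {}
    for mirror, settings in mirrors.items():
        for endpoint in (settings or {}).get("endpoint") or []:
            # A plain substring check is used on purpose: generic URL parsers
            # can misread "host:port" as "scheme:path".
            if "://" not in endpoint:
                missing.append((mirror, endpoint))
    return missing


registries = {
    "configs": {},
    "mirrors": {
        "registry.rancher.com": {"endpoint": ["registry123.sample.com"]},
    },
}

for mirror, endpoint in endpoints_missing_scheme(registries):
    print(f"mirror {mirror}: endpoint {endpoint!r} has no scheme")
```

On affected versions, any endpoint reported by such a check would need an explicit `https://` (or `http://`) prefix for the mirror to be used.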
Result
Cluster provisioning fails, and the warning and error messages described above appear in the rancher-system-agent logs.
Expected Result
Cluster provisioning should succeed, and the cluster should be able to pull images from the mirror.
Screenshots
Additional context
The most recent rancher-system-agent release (v0.3.6) uses wharfie v0.6.2 as the image pulling library. wharfie v0.6.2 still requires an absolute mirror endpoint URL including the scheme (https://github.com/rancher/wharfie/blob/v0.6.2/pkg/registries/registries.go#L167). The most recent release of wharfie (v0.6.6) drops this requirement, supporting endpoints without a scheme (https://github.com/rancher/wharfie/commit/506419fe098e1da1aa44e5ce5e99e608ce318c21). A PR to update wharfie in the system-agent remains open at https://github.com/rancher/system-agent/pull/143.
SURE-8588
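For illustration, the difference between the two behaviors can be sketched as follows. This is a Python approximation under stated assumptions, not wharfie's actual Go code: the function name `with_default_scheme` is hypothetical, and defaulting to `https` is an assumption based on the scenarios discussed in this issue.

```python
def with_default_scheme(endpoint, default="https"):
    """Return an absolute endpoint URL, adding a scheme when none is given.

    Hypothetical sketch: endpoints that already carry a scheme are returned
    unchanged; bare hosts such as "registry123.sample.com" get the assumed
    default scheme prepended instead of being ignored as relative URLs.
    """
    endpoint = endpoint.strip()
    if "://" in endpoint:
        return endpoint
    return f"{default}://{endpoint}"


for raw in ("registry123.sample.com", "https://registry123.sample.com"):
    print(raw, "->", with_default_scheme(raw))
```

With a rule like this, both forms of the mirror endpoint resolve to the same absolute URL, which is the outcome described under Expected Result.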
The PR was merged; now waiting for a new rancher/rancher image.
This issue needs an RC/alpha in order to properly test.
Validated that this is addressed on v2.10-alpha2. See details below:
ENVIRONMENT DETAILS
- Rancher install: Helm
- Rancher version: v2.10-alpha2
TEST RESULT
| # | Scenario | Result |
|---|---|---|
| 1 | Provision RKE2 air-gap downstream cluster with https:// scheme in mirror endpoint | :white_check_mark: |
| 2 | Provision RKE2 air-gap downstream cluster without https:// scheme in mirror endpoint | :white_check_mark: |
VALIDATION STEPS
Scenario 1
- Provisioned downstream RKE2 v1.30.5+rke2r1 node driver cluster (1 etcd, 1 cp, 1 worker).
  - Ensured that the private registry had the https:// scheme present in the endpoint.
- Validated that the cluster and nodes came up Active.
Scenario 2
- Repeated scenario 1, but did not use the https:// scheme in the registry endpoint.