
[Forwardport] air gap RKE2 downstream cluster fails to pull images if the registry mirrors endpoint does not contain a schema

Open Sahota1225 opened this issue 1 year ago • 1 comments

Forwardport: https://github.com/rancher/rancher/issues/46090

Rancher Server Setup

  • Rancher version: v2.8.3
  • Installation option (Docker install/Helm Chart):
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):
  • Proxy/Cert Details:

Information about the Cluster

  • Kubernetes version: v1.28.9-rke2r1
  • Cluster Type (Local/Downstream): Downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
    • If custom, define the set of permissions:

Describe the bug

When provisioning an air-gapped RKE2 cluster, if the user sets the mirror endpoint to their private registry, but without the https:// scheme, the cluster will fail to provision with the following error message observed in the rancher-system-agent logs:

Jun 17 13:23:56 qchcon201 rancher-system-agent[20349]: time="2024-06-17T13:23:56+02:00" level=info msg="Pulling image registry.rancher.com/rancher/system-agent-installer-rke2:v1.28.9-rke2r1"
Jun 17 13:23:56 qchcon201 rancher-system-agent[20349]: time="2024-06-17T13:23:56+02:00" level=warning msg="Ignoring relative endpoint URL for registry registry.rancher.com: "registry123.sample.com""
Jun 17 13:23:56 qchcon201 rancher-system-agent[20349]: time="2024-06-17T13:23:56+02:00" level=warning msg="Failed to get image from endpoint: Get "https://registry.rancher.com/v2/": dial TCP :443: connect: connection refused"

To Reproduce

Create a downstream (DS) RKE2 cluster and configure the registries similar to the following:

registries:
  configs: {}
  mirrors:
    registry.rancher.com:
      endpoint:
        - registry123.sample.com
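
The endpoint in the configuration above has no scheme, so a URL parser sees it as a relative reference rather than an absolute URL, which matches the "Ignoring relative endpoint URL" warning in the logs. The following is a minimal pre-flight sketch in Python, not the code rancher-system-agent or wharfie actually runs: the helper name `relative_endpoints` is hypothetical, and the `registries` dict is simply the YAML above written out by hand.

```python
from urllib.parse import urlsplit

# The reproduction config from this issue, expressed as a Python dict.
registries = {
    "configs": {},
    "mirrors": {
        "registry.rancher.com": {"endpoint": ["registry123.sample.com"]},
    },
}


def relative_endpoints(registries):
    """Return (registry, endpoint) pairs whose endpoint lacks a scheme or host.

    Hypothetical helper for illustration only; it approximates the
    "absolute URL" requirement described in this issue.
    """
    flagged = []
    for registry, mirror in registries.get("mirrors", {}).items():
        for endpoint in mirror.get("endpoint", []):
            parts = urlsplit(endpoint)
            # Without "scheme://", the whole string is parsed as a path,
            # leaving both scheme and netloc empty.
            if not parts.scheme or not parts.netloc:
                flagged.append((registry, endpoint))
    return flagged


print(relative_endpoints(registries))
```

Running a check like this before provisioning would flag `registry123.sample.com`, while `https://registry123.sample.com` would pass.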

Result

Cluster provisioning fails, and the warning messages shown above appear in the rancher-system-agent logs.

Expected Result

Cluster provisioning should succeed, and the cluster should be able to pull images from the mirror.

Screenshots

Additional context

The most recent rancher-system-agent release (v0.3.6) uses wharfie v0.6.2 as its image-pulling library. wharfie v0.6.2 still requires an absolute mirror endpoint URL, including the scheme (https://github.com/rancher/wharfie/blob/v0.6.2/pkg/registries/registries.go#L167). The most recent wharfie release (v0.6.6) drops this requirement and supports endpoints without a scheme (https://github.com/rancher/wharfie/commit/506419fe098e1da1aa44e5ce5e99e608ce318c21). A PR to update wharfie in the system-agent remains open at https://github.com/rancher/system-agent/pull/143.

SURE-8588
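
For illustration, accepting endpoints without a scheme can be thought of as a normalization step applied before the endpoint is used. The sketch below is written in Python under stated assumptions and is not the wharfie implementation: the function name `with_default_scheme` is hypothetical, and defaulting to `https` is an assumption based on the `https://` scheme this issue refers to.

```python
def with_default_scheme(endpoint, default="https"):
    """Return the endpoint unchanged if it has a scheme, else prepend one.

    Hypothetical helper; the "https" default is an assumption.
    """
    if "://" in endpoint:
        return endpoint
    return f"{default}://{endpoint}"


print(with_default_scheme("registry123.sample.com"))
print(with_default_scheme("https://registry123.sample.com"))
```

With such a step in place, both forms of the mirror endpoint resolve to the same absolute URL, which is the behavior the two validation scenarios in this thread exercise.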

Sahota1225, Aug 01 '24 04:08

The PR was merged; now waiting for a new rancher/rancher image.

jiaqiluo, Aug 02 '24 19:08

This issue needs an RC/alpha in order to properly test.

markusewalker, Aug 21 '24 15:08

Validated that this is addressed on v2.10-alpha2. See details below:

ENVIRONMENT DETAILS

  • Rancher install: Helm
  • Rancher version: v2.10-alpha2

TEST RESULT

| # | Scenario | Result |
|---|----------|--------|
| 1 | Provision RKE2 air-gap downstream cluster with https:// schema in mirror endpoint | :white_check_mark: |
| 2 | Provision RKE2 air-gap downstream cluster without https:// schema in mirror endpoint | :white_check_mark: |

VALIDATION STEPS

Scenario 1

  1. Provisioned downstream RKE2 v1.30.5+rke2r1 node driver cluster (1 etcd, 1 cp, 1 worker).
    • Ensured that the private registry had https:// schema present in the endpoint.
  2. Validated that the cluster and nodes came up Active.

Scenario 2

  1. Repeated scenario 1, but did not use https:// schema in the registry endpoint.

markusewalker, Oct 03 '24 16:10