
[Enhancement] Air-gap operation: Support using a private registry for Rancher agent image

Open janeczku opened this issue 2 years ago • 7 comments

Is your feature request related to a problem? Please describe. A Harvester cluster is deployed in an air-gapped network with a private container registry. When registering that cluster in Rancher, Harvester attempts (and fails) to download the cattle-cluster-agent image from Docker Hub.

Rancher allows specifying a "system-default-registry" to override, inter alia, the registry used to pull the agent images. But this is a global setting and applies to all Harvester clusters, which may not share a common registry.
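For context, system-default-registry is a single management-plane setting in Rancher, which is why it cannot vary per Harvester cluster. A sketch of its shape as a Setting object (the registry value here is a made-up placeholder):

```yaml
apiVersion: management.cattle.io/v3
kind: Setting
metadata:
  name: system-default-registry
value: "registry.company.internal:5000"   # placeholder; applies to ALL managed clusters
```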

Expected behavior

When registering a Harvester cluster in Rancher, the user may specify a private registry URL for pulling all required (agent) images.

janeczku avatar Apr 22 '22 13:04 janeczku

A possible workaround: download/migrate the image with Zarf. Assuming you have control over the CAs, you can spoof the image/registry/domain name for the setup. https://github.com/defenseunicorns/zarf

deskpil0t avatar Apr 22 '22 19:04 deskpil0t

This issue seems to be a duplicate of https://github.com/harvester/harvester/issues/2175? cc @FrankYang0529

guangbochen avatar May 25 '22 09:05 guangbochen

I think it's different. #2175 is about a CA that cannot be applied to the host. This issue is about setting a private registry for a Harvester cluster. For example, users may use one Rancher to manage multiple Harvester clusters, but the private registry for each Harvester cluster may be different.

FrankYang0529 avatar May 25 '22 09:05 FrankYang0529

Got it, thanks. Then we will need a setting that allows users to configure a private registry and upload a custom CA cert for the Harvester cluster.

guangbochen avatar May 25 '22 13:05 guangbochen

We have to ensure that this feature allows full configuration of the Harvester-integrated RKE2 registries.yaml, including:

  1. registry mirrors (multiple endpoints, since containerd supports fault tolerance / failover between the registries provided)
  2. registry namespace mapping (i.e. an image stored as registry01.customer.domain:5000/production/docker.io/rancher/rancher-agent:v2.6.6)
  3. authentication (username and password)
  4. certificate authority bundle management (one global bundle, i.e. /etc/ssl/ca-bundle.pem, or separate ones per registry)
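For illustration, a registries.yaml covering all four points might look like the following. The hostnames, namespace prefix, and credentials are made-up placeholders; consult the RKE2 private-registry documentation for the authoritative schema:

```yaml
mirrors:
  docker.io:
    endpoint:                       # multiple endpoints -> containerd failover
      - "https://registry01.customer.domain:5000"
      - "https://registry02.customer.domain:5000"
    rewrite:                        # namespace mapping inside the mirror registry
      "^rancher/(.*)": "production/docker.io/rancher/$1"
configs:
  "registry01.customer.domain:5000":
    auth:
      username: pull-user           # placeholder credentials
      password: example-password
    tls:
      ca_file: /etc/ssl/ca-bundle.pem   # global bundle, or a per-registry CA file
```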

I assume this should be similar to the way Rancher allows configuring this during RKE2 deployment:

(screenshot: Rancher RKE2 registry configuration UI)

Martin-Weiss avatar Jul 01 '22 10:07 Martin-Weiss

Pre Ready-For-Testing Checklist

  • [ ] ~If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted?~

  • [X] Where are the test steps documented? The test steps are at: https://github.com/harvester/tests/issues/476

  • [ ] ~Is there a workaround for the issue? If so, where is it documented?~

  • [X] Has the backend code been merged (harvester, harvester-installer, etc.) (including backport-needed/*)? The PR is at: https://github.com/harvester/harvester/pull/2713

    • [X] Does the PR include the explanation for the fix or the feature? Yes

    • [X] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart? The PR for the YAML/Chart change is at: https://github.com/harvester/harvester/pull/2713

  • [ ] If labeled: area/ui Has the UI issue filed or ready to be merged? The UI issue/PR is at:

  • [ ] ~If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged?~

  • [ ] If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?

    • The automation skeleton PR is at:
    • The automation test case PR is at:
  • [ ] ~If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?~

Automation e2e test issue: harvester/tests#476

Verified fixed on master-6e8e21b2-head (10/04).

Result

Scenario 1

  • [x] Update the Harvester containerd-registry setting with a private registry; confirm the image can be pulled and the nginx service deploys correctly from the private registry


Scenario 2

  • [x] Change the Harvester containerd-registry setting back to the default value; the nginx 1.22 image can no longer be deployed since no private registry is assigned


Scenario 3

  • [x] Add the Harvester containerd-registry setting back to the private registry; confirm the image can be pulled and the nginx service deploys correctly from the private registry


Test Information

  • Test Environment: 1 node Harvester on local kvm machine
  • Harvester version: master-6e8e21b2-head (10/04)

Verify Steps

Environment Setup

Follow the environment preparation steps in https://github.com/harvester/tests/issues/476#issue-1355132277

  • Copy the certs/domain.crt content from the myregistry VM and paste it into the additional-ca setting.
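For reference, one plausible way that certs/domain.crt could have been produced is a self-signed certificate for the registry host. The hostname myregistry.local and the certs/ paths are assumptions matching the names used in this thread; the authoritative setup steps are in the linked test issue:

```shell
# Generate a self-signed certificate for the private registry host
# (hostname and paths are assumptions, not the official test steps).
mkdir -p certs
openssl req -x509 -newkey rsa:2048 -nodes -sha256 \
  -keyout certs/domain.key -out certs/domain.crt -days 365 \
  -subj "/CN=myregistry.local" \
  -addext "subjectAltName=DNS:myregistry.local"

# The content of certs/domain.crt is what gets pasted into additional-ca.
openssl x509 -in certs/domain.crt -noout -subject
```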

Test plan 1

  1. Update the Harvester containerd-registry setting to use the private registry:
{
"Mirrors": {
  "docker.io": {
    "Endpoints": [
      "https://myregistry.local:5000"
    ],
    "Rewrites": null
  }
},
"Configs": {
  "myregistry.local:5000": {
    "Auth": null,
    "TLS": {
      "CAFile": "",
      "CertFile": "",
      "KeyFile": "",
      "InsecureSkipVerify": false
    }
  }
},
"Auths": null
}
  2. Open K9s -> search -> secrets -> containerd -> y
  3. Check that the content of the cattle-system/harvester-containerd-registry secret has changed.
  4. Search jobs -> containerd -> l -> tail
  5. Check that new jobs run to automatically apply the new containerd-registry setting.
  6. Apply the following YAML from a file:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myregistry-nginx
    spec:
      selector:
        matchLabels:
          image: myregistry-nginx
      template:
        metadata:
          labels:
            image: myregistry-nginx
        spec:
          containers:
            - imagePullPolicy: Always
              image: myregistry.local:5000/nginx:latest
              name: nginx
    
    harvester-node-0:~ # kubectl apply -f deploy.yaml
    deployment.apps/myregistry-nginx created
    
  7. Search deployments -> default namespace -> myregistry-nginx -> describe -> shift + g
  8. Check that nginx is deployed.

Test plan 2

  1. Click Use default value to update the Harvester containerd-registry setting back to the default (empty).
  2. In K9s, check that the content of the cattle-system/harvester-containerd-registry secret has changed.
  3. Check that new jobs run to automatically apply the new containerd-registry setting.
  4. Search secrets -> containerd -> e
  5. Clear the registries.yaml data field:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  registries.yaml: ""
kind: Secret
metadata:
  creationTimestamp: "2022-10-04T08:51:47Z"
  name: harvester-containerd-registry
  namespace: cattle-system
  resourceVersion: "42088"
  uid: 8445547f-430a-474e-9783-dc6b16914eb7
type: Opaque
  6. Search -> deployments -> find the nginx 1.22 deployment
  7. Check that the nginx 1.22 image can no longer be pulled.

Test plan 3

  1. Add the Harvester containerd-registry setting back to the private registry again.
  2. Check that the content of the cattle-system/harvester-containerd-registry secret has changed.
  3. Check that new jobs run to automatically apply the new containerd-registry setting.
  4. Check that the nginx 1.22 deployment automatically comes back to running.

TachunLin avatar Oct 04 '22 11:10 TachunLin

There is an issue that needs further checking, found while verifying test plan 2:

  1. With a valid setting already in place in Harvester containerd-registry, click Use default value and save.
  2. We found that the status field of the containerd-registry setting is also cleaned up to empty in K9s,
  3. which causes no new apply-sync-containerd job to be triggered to update the containerd setting.


Since the original test plan 2 can still be achieved by manually clearing the registries.yaml data field, I suggest closing this issue and tracking the newly found issue in https://github.com/harvester/harvester/issues/2873

TachunLin avatar Oct 05 '22 06:10 TachunLin