
[RFE] Option to provide data-dir for RKE2 provisioned by Rancher

Open snasovich opened this issue 2 years ago • 4 comments

Is your feature request related to a problem? Please describe.

Add support for configuring the data directories of all provisioning v2 related components:

  • RKE2/K3s data-dir
  • CAPR var directory (where various planner related information is stored for provisioning)
  • system-agent data (where plans are stored on disk)

Describe the solution you'd like

Support configuring each data directory at the per-cluster level. These fields will be configurable only at cluster creation; on update, the webhook will reject any change to them.
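A minimal sketch of the create-only rule described above, assuming the three directory settings are plain string fields on the cluster config. The field names are the ones mentioned later in this thread; the function name, the dict layout, and the error text are hypothetical and are not Rancher's actual webhook code.

```python
from typing import Dict, List, Optional

# Field names as mentioned in this thread; their exact location in the
# cluster spec is an assumption made for illustration only.
IMMUTABLE_DIR_FIELDS = ("systemAgentVarDir", "caprVarDir", "dataDir")


def check_data_dirs_unchanged(
    old: Dict[str, Optional[str]],
    new: Dict[str, Optional[str]],
) -> List[str]:
    """Return one error message per data-directory field that differs
    between the existing config (old) and the submitted update (new)."""
    errors = []
    for field in IMMUTABLE_DIR_FIELDS:
        # A field that is unset on both sides compares equal (None == None).
        if old.get(field) != new.get(field):
            errors.append(f"{field} cannot be changed after cluster creation")
    return errors


if __name__ == "__main__":
    existing = {"dataDir": "/data/rke2"}
    update = {"dataDir": "/other/rke2"}
    for message in check_data_dirs_unchanged(existing, update):
        print(message)
```

An update would be admitted only when the returned list is empty. Returning every violation, rather than stopping at the first, lets a user see all rejected fields in one response.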

Additional context

SURE-5886

snasovich avatar Apr 08 '24 15:04 snasovich

UI changes are tracked here and should be QA'd at the same time as these changes: https://github.com/rancher/dashboard/issues/10824

richard-cox avatar Apr 17 '24 08:04 richard-cox

@susesgartner @jakefhyde do you have any information about the specs of the systemAgentVarDir, caprVarDir and dataDir fields? Are they strings, arrays, objects, etc.? Do they have default values? Also, in which category of the cluster provisioning UI would they fit?

Adding a screenshot of the current UI for provisioning RKE2 clusters for context: [image: Screenshot 2024-05-13 at 12 31 36]

FYI @richard-cox

aalves08 avatar May 13 '24 11:05 aalves08

@jakefhyde FYI, UI work has been done and merged for this feature. Covered by unit tests. Assuming feature e2e testing will be done by backend QA.

nwmac avatar May 28 '24 08:05 nwmac

Just an update from my end: I have to wait for the May patches for RKE2/K3s to go out before I can start merging, and the changes won't be properly testable until the following June patches go out, so I have about a one-month window to merge everything.

jakefhyde avatar May 28 '24 15:05 jakefhyde

Regarding the release-note label: we will need to release-note not only the new feature but also the known limitation (https://github.com/rancher/rancher/issues/46066) that we're planning to fix in the next RKE2 release.

snasovich avatar Jul 10 '24 16:07 snasovich

Ticket #45038 - Test Results - ✅

| Scenario | Test Case | Result |
|---|---|---|
| 1 | Provision cluster with default data dirs | PASS |
| 2 | Provision cluster with different data dirs | PASS |
| 3 | Provision cluster with shared data dir | PASS |
| 4 | Upgrade Rancher cluster with CATTLE_AGENT_VAR_DIR set | PASS |
| 5 | Snapshot restore with default data dirs | PASS |
| 6 | Snapshot restore with different data dirs | FAIL |
| 7 | Snapshot restore with shared dirs | FAIL |

Verified with (HA Helm or Docker) on Rancher v2.9-b573c957ab78a1121888c6b7d73d787efc479d7f-head:

Scenario 1: Provision cluster with default data dirs

NOTE: Performed on RKE2 and K3s with custom and node driver clusters

  1. Create a 3 node split role cluster
  2. Wait for the cluster to finish provisioning.
  3. Create a test container on the cluster and verify it comes up active

Results: The cluster provisions successfully and all pods are in a good state.


Scenario 2: Provision cluster with different data dirs

NOTE: Performed on RKE2 and K3s with custom and node driver clusters

  1. Create a 3 node split role cluster with a custom dir for each of system-agent, provisioning and the k8s distro
  2. Wait for the cluster to finish provisioning.
  3. Create a test container on the cluster and verify it comes up active

Results: The cluster provisions successfully and all pods are in a good state.


Scenario 3: Provision cluster with shared data dir

NOTE: Performed on RKE2 and K3s with custom and node driver clusters

  1. Create a 3 node split role cluster and provide a shared directory in the cluster config
  2. Wait for the cluster to finish provisioning.
  3. Create a test container on the cluster and verify it comes up active

Results: The cluster provisions successfully and all pods are in a good state.
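The thread does not spell out what a "shared" directory looks like. One possible reading, shown here purely as an illustration, is that all three components are placed under a single parent directory. The helper name, the subdirectory names, and the example parent path are all hypothetical; only the three field names come from this thread.

```python
from pathlib import PurePosixPath
from typing import Dict


def dirs_under_shared_parent(parent: str, distro: str = "rke2") -> Dict[str, str]:
    """Build the three directory settings under one parent directory.

    Subdirectory names ("agent", "capr", and the distro name) are
    illustrative assumptions, not values confirmed by this thread.
    """
    base = PurePosixPath(parent)
    if not base.is_absolute():
        raise ValueError(f"parent directory must be an absolute path: {parent!r}")
    return {
        "systemAgentVarDir": str(base / "agent"),
        "caprVarDir": str(base / "capr"),
        "dataDir": str(base / distro),
    }


if __name__ == "__main__":
    for name, path in dirs_under_shared_parent("/data/rancher").items():
        print(f"{name}: {path}")
```

Scenario 2, by contrast, would correspond to supplying three unrelated paths directly instead of deriving them from one parent.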


Scenario 4: Upgrade Rancher cluster with CATTLE_AGENT_VAR_DIR set

NOTE: Performed on RKE2 and K3s with custom and node driver clusters

  1. Create a Rancher cluster on 2.8-head
  2. Create a downstream 3 node split role cluster and provide the CATTLE_AGENT_VAR_DIR env var
  3. Wait for the cluster to finish provisioning.
  4. Create a test container on the cluster and verify it comes up active
  5. Take a snapshot
  6. Upgrade the cluster to 2.9-head
  7. Restore the snapshot
  8. Verify the cluster remains in a healthy state and the CATTLE_AGENT_VAR_DIR is no longer present on the cluster

Results: The cluster remained in a good state after the upgrade and snapshot.


Scenario 5: Snapshot restore with default data dirs

NOTE: Performed on RKE2 and K3s with custom and node driver clusters

  1. Create a 3 node split role cluster
  2. Wait for the cluster to finish provisioning.
  3. Take a snapshot
  4. Upgrade the k8s version
  5. Restore the snapshot

Results: The snapshot successfully restores and all pods remain in a healthy state


Scenario 6: Snapshot restore with different data dirs

NOTE: Performed on RKE2 and K3s with custom and node driver clusters

  1. Create a 3 node split role cluster with a custom dir for each of system-agent, provisioning and the k8s distro
  2. Wait for the cluster to finish provisioning.
  3. Take a snapshot
  4. Upgrade the k8s version
  5. Restore the snapshot

Results: The cluster hangs and the snapshot fails to restore


Scenario 7: Snapshot restore with shared data dirs

NOTE: Performed on RKE2 and K3s with custom and node driver clusters

  1. Create a 3 node split role cluster and provide a shared directory in the cluster config
  2. Wait for the cluster to finish provisioning.
  3. Take a snapshot
  4. Upgrade the k8s version
  5. Restore the snapshot

Results: The cluster hangs and the snapshot fails to restore


Additional notes: The failures have been release noted and an issue has been opened

susesgartner avatar Jul 11 '24 03:07 susesgartner

Discussed offline. The issue found (https://github.com/rancher/rancher/issues/46066) will be fixed in the next release.

sowmyav27 avatar Jul 12 '24 00:07 sowmyav27