[RFE] Option to provide data-dir for RKE2 provisioned by Rancher
Is your feature request related to a problem? Please describe.
Add support for configuring the data directories of all provisioning v2 related components:
- RKE2/K3s data-dir
- CAPR var directory (where various planner related information is stored for provisioning)
- system-agent data (where plans are stored on disk)
Describe the solution you'd like
Support configuring each data directory at a per-cluster level. These fields will only be configurable on cluster creation; changes will be rejected by the webhook on update.
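As a sketch of how this could look on the provisioning v2 cluster object (the `dataDirectories` field names and paths below are illustrative and may not match the final merged API):

```yaml
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: example-cluster
  namespace: fleet-default
spec:
  kubernetesVersion: v1.28.9+rke2r1
  rkeConfig:
    # Illustrative per-cluster data directory overrides. All three are
    # plain string paths, set at creation time and immutable afterwards.
    dataDirectories:
      systemAgent: /opt/rancher/system-agent   # where system-agent stores plans
      provisioning: /opt/rancher/capr          # CAPR planner state
      k8sDistro: /opt/rancher/rke2             # RKE2/K3s data-dir
```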
Additional context
SURE-5886
UI changes are tracked here and should be QA'd at the same time as these changes: https://github.com/rancher/dashboard/issues/10824
@susesgartner @jakefhyde do you have any information about systemAgentVarDir, caprVarDir and dataDir fields in terms of specs? (are they strings, arrays, objects, etc)? Do they have default values? Also, in which category of the cluster provisioning UI would they fit?
Adding a screenshot of the current UI for provisioning RKE2 clusters for context:
FYI @richard-cox
@jakefhyde FYI, UI work has been done and merged for this feature. Covered by unit tests. Assuming feature e2e testing will be done by backend QA.
Just an update from my end: I have to wait for the May patches for RKE2/K3s to go out before I can start merging, and this won't be properly testable until the June patches go out, so there is roughly a month-long window to merge.
Regarding release-note label, we will need to not only release note the new feature but the known limitation (https://github.com/rancher/rancher/issues/46066) that we're planning to fix in the next RKE2 release.
Ticket #45038 - Test Results - ✅
| Scenario | Test Case | Result |
|---|---|---|
| 1. | Provision cluster with default data dirs | PASS |
| 2. | Provision cluster with different data dirs | PASS |
| 3. | Provision cluster with shared data dir | PASS |
| 4. | Upgrade rancher cluster with CATTLE_AGENT_VAR_DIR set | PASS |
| 5. | Snapshot restore with default data dirs | PASS |
| 6. | Snapshot restore with different data dirs | FAIL |
| 7. | Snapshot restore with shared dirs | FAIL |
Verified with (HA Helm or Docker) on Rancher v2.9-b573c957ab78a1121888c6b7d73d787efc479d7f-head:
Scenario 1: Provision cluster with default data dirs
NOTE: Performed on RKE2 and K3s with custom and node driver clusters
- Create a 3 node split role cluster
- Wait for the cluster to finish provisioning.
- Create a test container on the cluster and verify it comes up active
Results: The cluster provisions successfully and all pods are in a good state.
Scenario 2: Provision cluster with different data dirs
NOTE: Performed on RKE2 and K3s with custom and node driver clusters
- Create a 3 node split role cluster with custom directories for the system-agent, provisioning (CAPR), and k8s distro data
- Wait for the cluster to finish provisioning.
- Create a test container on the cluster and verify it comes up active
Results: The cluster provisions successfully and all pods are in a good state.
Scenario 3: Provision cluster with shared data dir
NOTE: Performed on RKE2 and K3s with custom and node driver clusters
- Create a 3 node split role cluster and provide a shared directory in the cluster config
- Wait for the cluster to finish provisioning.
- Create a test container on the cluster and verify it comes up active
Results: The cluster provisions successfully and all pods are in a good state.
Scenario 4: Upgrade rancher cluster with CATTLE_AGENT_VAR_DIR set.
NOTE: Performed on RKE2 and K3s with custom and node driver clusters
- Create a rancher cluster on 2.8-head
- Create a downstream 3 node split role cluster and provide the CATTLE_AGENT_VAR_DIR env var
- Wait for the cluster to finish provisioning.
- Create a test container on the cluster and verify it comes up active
- Take a snapshot
- Upgrade the cluster to 2.9-head
- Restore the snapshot
- Verify the cluster remains in a healthy state and the CATTLE_AGENT_VAR_DIR is no longer present on the cluster
Results: The cluster remained in a good state after the upgrade and snapshot restore.
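For reference, the pre-2.9 mechanism exercised in scenario 4 sets the system-agent var dir through an environment variable at node registration time. A sketch, with placeholder server URL and token (the actual registration command comes from the Rancher UI):

```shell
# Sketch: registering a custom-cluster node with a custom system-agent
# var dir via the legacy env var. Server URL and token are placeholders.
export CATTLE_AGENT_VAR_DIR=/opt/rancher/agent-var
echo "system-agent var dir: ${CATTLE_AGENT_VAR_DIR}"
# curl -fL https://<rancher-host>/system-agent-install.sh | sudo sh -s - \
#     --server https://<rancher-host> --token <token> --etcd --controlplane --worker
```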
Scenario 5: Snapshot restore with default data dirs
NOTE: Performed on RKE2 and K3s with custom and node driver clusters
- Create a 3 node split role cluster
- Wait for the cluster to finish provisioning.
- Take a snapshot
- Upgrade the k8s version
- Restore the snapshot
Results: The snapshot successfully restores and all pods remain in a healthy state
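The restores in scenarios 5-7 are driven through the cluster spec rather than the nodes; a sketch of the relevant fragment, assuming the `etcdSnapshotRestore` section of the provisioning v2 `rkeConfig` (the snapshot name is a placeholder):

```yaml
# Sketch of triggering an etcd snapshot restore on a provisioning v2 cluster.
spec:
  rkeConfig:
    etcdSnapshotRestore:
      name: example-cluster-etcd-snapshot-xyz  # placeholder snapshot name
      generation: 1           # bump to trigger a new restore attempt
      restoreRKEConfig: none  # restore etcd only, keep current cluster config
```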
Scenario 6: Snapshot restore with different data dirs
NOTE: Performed on RKE2 and K3s with custom and node driver clusters
- Create a 3 node split role cluster with custom directories for the system-agent, provisioning (CAPR), and k8s distro data
- Wait for the cluster to finish provisioning.
- Take a snapshot
- Upgrade the k8s version
- Restore the snapshot
Results: The cluster hangs and the snapshot fails to restore
Scenario 7: Snapshot restore with shared data dirs
NOTE: Performed on RKE2 and K3s with custom and node driver clusters
- Create a 3 node split role cluster and provide a shared directory in the cluster config
- Wait for the cluster to finish provisioning.
- Take a snapshot
- Upgrade the k8s version
- Restore the snapshot
Results: The cluster hangs and the snapshot fails to restore
Additional notes: The failures have been release noted and an issue has been opened.
Discussed offline. The issue found - https://github.com/rancher/rancher/issues/46066 will be fixed in the next release.