[IMPROVEMENT] Check kernel module `dm_crypt` on host machines
Is your improvement request related to a feature? Please describe (👍 if you like this request)
- Add the necessary `dm_crypt` kernel module for encrypted volumes in the official document
- Add the necessary `dm_crypt` kernel module check in the `node.status.conditions`
- Add the necessary `dm_crypt` kernel module check in the `longhornctl`
Describe the solution you'd like
Describe alternatives you've considered
Additional context
https://github.com/longhorn/longhorn/issues/9135
Tasks
- [ ] doc: The documentation for the necessary `dm_crypt` kernel module for encrypted volumes
- [ ] longhorn-manager: `dm_crypt` kernel module check in the `node.status.conditions`
- [x] cli: `dm_crypt` kernel module check
Pre Ready-For-Testing Checklist
- [ ] Where is the reproduce steps/test steps documented? The reproduce steps/test steps are at:
  - CLI:
    - The command `longhornctl check preflight` can check if the module `dm_crypt` is loaded or not
    - The command `longhornctl install preflight` can load the module `dm_crypt`
  - Longhorn node:
    - The node.Status.Condition `ModulesLoaded` will show whether the module `dm_crypt` is loaded or not.
- [ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart? The PR for the YAML change is at: The PR for the chart change is at: https://github.com/longhorn/longhorn/pull/9374
- [ ] Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including `backport-needed/*`)? The PR is at cli: https://github.com/longhorn/cli/pull/66 longhorn-manager: https://github.com/longhorn/longhorn-manager/pull/3065
- [ ] If labeled: require/doc Has the necessary document PR submitted or merged (including `backport-needed/*`)? The documentation issue/PR is at https://github.com/longhorn/website/pull/995 https://github.com/longhorn/website/pull/1017
Let's backport this.
We need to figure out how to check the built-in kernel module. Let's move it to v1.7.2 instead.
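One possible approach to the built-in case, sketched in Python: modules compiled into the kernel do not appear in `/proc/modules`, but they are listed in `/lib/modules/<kernel release>/modules.builtin` as paths such as `kernel/drivers/md/dm-crypt.ko`. The sketch assumes that file exists on the host and treats `-` and `_` as interchangeable in module names. `normalize` and `is_builtin` are hypothetical helper names, and this is not necessarily how Longhorn implements the check.

```python
import os
from pathlib import Path


def normalize(name: str) -> str:
    # '-' and '_' are treated as equivalent in kernel module names.
    return name.replace("-", "_")


def is_builtin(name: str, modules_builtin_text: str) -> bool:
    """Check a modules.builtin-style listing for the given module name."""
    wanted = normalize(name)
    for line in modules_builtin_text.splitlines():
        filename = line.strip().rsplit("/", 1)[-1]  # e.g. "dm-crypt.ko"
        if filename.endswith(".ko") and normalize(filename[:-3]) == wanted:
            return True
    return False


if __name__ == "__main__":
    # Assumed location of the built-in module list for the running kernel.
    path = Path("/lib/modules") / os.uname().release / "modules.builtin"
    text = path.read_text() if path.exists() else ""
    print("dm_crypt built in:", is_builtin("dm_crypt", text))
```

Taking the file contents as a string keeps the lookup easy to test without access to a real host.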
Verified on v1.8.0-dev-20240922 20231001
- longhorn v1.8.0-dev-20240922 https://github.com/longhorn/longhorn/commit/ddfda37472d9b4d877c2a60663bb176ae3828825
- cli v1.8.0-dev-20240922 https://github.com/longhorn/cli/commits/v1.8.0-dev-20240922/
The test steps https://github.com/longhorn/longhorn/issues/9153#issuecomment-2285974275
Result
- Output when `dm_crypt` is not loaded:
  warn:
  - multipathd.service is running. Please refer to https://longhorn.io/kb/troubleshooting-volume-with-multipath/ for more information.
ryao-demo-0926-w2-94f08d41-4tssf:
  error:
  - 'Module dm_crypt is not loaded: failed to execute: nsenter [--mount=/host/proc/1940366/ns/mnt --net=/host/proc/1940366/ns/net grep dm_crypt /proc/modules], output , stderr : exit status 1'
  info:
  - Service iscsid is running
  - NFS4 is supported
  - Package nfs-common is installed
  - Package open-iscsi is installed
  - Package cryptsetup is installed
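The error above comes from a `grep dm_crypt /proc/modules` that found no match. A minimal Python sketch of an equivalent check, assuming the usual `/proc/modules` layout in which the module name is the first whitespace-separated field on each line. `loaded_modules` and `is_module_loaded` are hypothetical helper names, not part of `longhornctl`.

```python
from __future__ import annotations

from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return the module names listed in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def is_module_loaded(name: str, proc_modules_text: str) -> bool:
    # Exact name match, unlike a plain grep, which also matches substrings.
    return name in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    text = path.read_text() if path.exists() else ""
    print("dm_crypt loaded:", is_module_loaded("dm_crypt", text))
```

Like the `grep` in the log, this only sees loadable modules that are currently loaded; a module compiled into the kernel is not listed in `/proc/modules`.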
Additionally, in the node's YAML:
- lastProbeTime: ""
  lastTransitionTime: "2024-09-30T02:55:13Z"
  message: Kernel modules [dm_crypt] are not loaded on node ryao-demo-0926-w3-57d499c4-2chfp
  reason: KernelModulesNotLoaded
  status: "False"
  type: KernelModulesLoaded
After running `longhornctl install preflight` and executing `longhornctl check preflight` again, you can see that `dm_crypt` has been successfully loaded.
- Module dm_crypt is loaded
The node's YAML is updated as well:
- lastProbeTime: ""
  lastTransitionTime: "2024-10-01T02:04:23Z"
  message: Kernel modules [dm_crypt] are loaded on node ryao-demo-0926-w3-57d499c4-2chfp
  reason: ""
  status: "True"
  type: KernelModulesLoaded
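A script can read the same condition from the node's `status.conditions` list. The following Python sketch assumes the conditions have already been parsed into a list of dictionaries shaped like the YAML above (for example, from the node object's JSON). `find_condition` and `kernel_modules_loaded` are hypothetical helper names.

```python
from typing import Optional


def find_condition(conditions: list, cond_type: str) -> Optional[dict]:
    """Return the first condition with the given type, or None."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond
    return None


def kernel_modules_loaded(conditions: list) -> bool:
    cond = find_condition(conditions, "KernelModulesLoaded")
    # A missing condition is treated as "not loaded" in this sketch.
    return cond is not None and cond.get("status") == "True"


# Sample data copied from the "not loaded" condition shown earlier.
conditions = [
    {
        "type": "KernelModulesLoaded",
        "status": "False",
        "reason": "KernelModulesNotLoaded",
        "message": "Kernel modules [dm_crypt] are not loaded on node "
                   "ryao-demo-0926-w3-57d499c4-2chfp",
    }
]
print(kernel_modules_loaded(conditions))  # False
```

Note that `status` is the string `"True"` or `"False"`, not a boolean, so the comparison is done against the string.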
Reopening this issue because we also need to update the document Root and Privileged Permission - Longhorn Manager to reflect https://github.com/longhorn/longhorn/pull/9374.
cc @derekbit
Closing this issue because https://github.com/longhorn/website/pull/1017 has been merged.
This happens on 1 of 3 workers, and instance-manager is being recreated all the time.
node info
@AkakievKD I don't think this is related to the dm_crypt node condition. It just records the status and won't stop any instance-manager pod. Please provide a support bundle for troubleshooting.