[IMPROVEMENT] Check kernel module `dm_crypt` on host machines

Open derekbit opened this issue 1 year ago • 3 comments

Is your improvement request related to a feature? Please describe (👍 if you like this request)

  1. Document the dm_crypt kernel module required for encrypted volumes in the official documentation
  2. Add a check for the required dm_crypt kernel module to node.status.conditions
  3. Add a check for the required dm_crypt kernel module to longhornctl

Describe the solution you'd like

Describe alternatives you've considered

Additional context

https://github.com/longhorn/longhorn/issues/9135

Tasks

  • [ ] doc: The documentation for the necessary dm_crypt kernel module for encrypted volumes
  • [ ] longhorn-manager: dm_crypt kernel module check in the node.status.conditions
  • [x] cli: dm_crypt kernel module check

derekbit avatar Aug 01 '24 15:08 derekbit

Pre Ready-For-Testing Checklist

  • [ ] Where are the reproduce steps/test steps documented? The reproduce steps/test steps are at:
      • CLI:
          1. The command longhornctl check preflight checks whether the module dm_crypt is loaded.
          2. The command longhornctl install preflight loads the module dm_crypt.
      • Longhorn node:
          1. The node.Status.Condition ModulesLoaded shows whether the module dm_crypt is loaded.
  • [ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart? The PR for the YAML change is at: The PR for the chart change is at: https://github.com/longhorn/longhorn/pull/9374

  • [ ] Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)? The PRs are at cli: https://github.com/longhorn/cli/pull/66 and longhorn-manager: https://github.com/longhorn/longhorn-manager/pull/3065

  • [ ] If labeled: require/doc Has the necessary document PR been submitted or merged (including backport-needed/*)? The documentation issues/PRs are at https://github.com/longhorn/website/pull/995 and https://github.com/longhorn/website/pull/1017

longhorn-io-github-bot avatar Aug 13 '24 11:08 longhorn-io-github-bot

Let's backport this.

innobead avatar Aug 26 '24 14:08 innobead

We need to figure out how to check the built-in kernel module. Let's move it to v1.7.2 instead.
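One possible approach to the built-in case, as a minimal sketch rather than what longhorn-manager or longhornctl actually does: a module compiled into the kernel never appears in /proc/modules, but it is normally listed in /lib/modules/<kernel release>/modules.builtin as a path such as kernel/drivers/md/dm-crypt.ko. The function names below are hypothetical, and the sketch assumes it runs in the host's mount namespace so that both files are readable.

```python
import platform
from pathlib import Path


def normalize(name: str) -> str:
    """The kernel treats '-' and '_' in module names as equivalent."""
    return name.replace("-", "_")


def loaded_modules(proc_modules_text: str) -> set:
    """Names from /proc/modules content: the first field of each line."""
    return {
        normalize(line.split()[0])
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def builtin_modules(modules_builtin_text: str) -> set:
    """Names from modules.builtin content: one .ko path per line."""
    names = set()
    for line in modules_builtin_text.splitlines():
        line = line.strip()
        if line:
            names.add(normalize(Path(line).name.removesuffix(".ko")))
    return names


def module_state(name: str, proc_modules_text: str, modules_builtin_text: str) -> str:
    """Return 'loaded', 'builtin', or 'missing' for the given module name."""
    name = normalize(name)
    if name in loaded_modules(proc_modules_text):
        return "loaded"
    if name in builtin_modules(modules_builtin_text):
        return "builtin"
    return "missing"


def read_text_or_empty(path: str) -> str:
    """Read a file, treating an unreadable or absent file as empty."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


def host_module_state(name: str) -> str:
    """Check the module against the running kernel's files (hypothetical helper)."""
    builtin_path = f"/lib/modules/{platform.release()}/modules.builtin"
    return module_state(
        name,
        read_text_or_empty("/proc/modules"),
        read_text_or_empty(builtin_path),
    )
```

Calling host_module_state("dm_crypt") on a node would then distinguish a loadable module that is loaded from one that is built in. Unlike a plain grep over /proc/modules, the comparison is on the exact module name, so a line that only mentions dm_crypt as a dependency of another module does not count as a match.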

derekbit avatar Aug 27 '24 02:08 derekbit

Verified on v1.8.0-dev-20240922 20231001

  • longhorn v1.8.0-dev-20240922 https://github.com/longhorn/longhorn/commit/ddfda37472d9b4d877c2a60663bb176ae3828825
  • cli v1.8.0-dev-20240922 https://github.com/longhorn/cli/commits/v1.8.0-dev-20240922/

The test steps https://github.com/longhorn/longhorn/issues/9153#issuecomment-2285974275

Result

  1. Output when dm_crypt is not loaded:
  warn:
  - multipathd.service is running. Please refer to https://longhorn.io/kb/troubleshooting-volume-with-multipath/ for more information.
ryao-demo-0926-w2-94f08d41-4tssf:
  error:
  - 'Module dm_crypt is not loaded: failed to execute: nsenter [--mount=/host/proc/1940366/ns/mnt --net=/host/proc/1940366/ns/net grep dm_crypt /proc/modules], output , stderr : exit status 1'
  info:
  - Service iscsid is running
  - NFS4 is supported
  - Package nfs-common is installed
  - Package open-iscsi is installed
  - Package cryptsetup is installed

Additionally, in the node's YAML:

    - lastProbeTime: ""
      lastTransitionTime: "2024-09-30T02:55:13Z"
      message: Kernel modules [dm_crypt] are not loaded on node ryao-demo-0926-w3-57d499c4-2chfp
      reason: KernelModulesNotLoaded
      status: "False"
      type: KernelModulesLoaded

After running longhornctl install preflight and executing longhornctl check preflight again, you can see that dm_crypt has been successfully loaded.

  - Module dm_crypt is loaded

The node's YAML is updated as well:

    - lastProbeTime: ""
      lastTransitionTime: "2024-10-01T02:04:23Z"
      message: Kernel modules [dm_crypt] are loaded on node ryao-demo-0926-w3-57d499c4-2chfp
      reason: ""
      status: "True"
      type: KernelModulesLoaded
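The condition shown in the node's YAML can also be read programmatically. The following is a minimal sketch, assuming the Longhorn node object has already been fetched and parsed into a Python dict (for example from the JSON output of kubectl for the nodes.longhorn.io resource; the namespace and retrieval method depend on the deployment and are not shown). The helper names are hypothetical; the condition type and fields are taken from the YAML above.

```python
from typing import Optional


def kernel_modules_condition(node: dict) -> Optional[dict]:
    """Return the KernelModulesLoaded condition from a node object, if present."""
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "KernelModulesLoaded":
            return condition
    return None


def kernel_modules_loaded(node: dict) -> bool:
    """True only when the condition exists and its status is the string "True"."""
    condition = kernel_modules_condition(node)
    return condition is not None and condition.get("status") == "True"


# Example node object mirroring the "not loaded" condition from the thread.
example_node = {
    "status": {
        "conditions": [
            {
                "lastProbeTime": "",
                "lastTransitionTime": "2024-09-30T02:55:13Z",
                "message": "Kernel modules [dm_crypt] are not loaded on node "
                           "ryao-demo-0926-w3-57d499c4-2chfp",
                "reason": "KernelModulesNotLoaded",
                "status": "False",
                "type": "KernelModulesLoaded",
            }
        ]
    }
}

if __name__ == "__main__":
    condition = kernel_modules_condition(example_node)
    print(kernel_modules_loaded(example_node), condition["reason"])
```

Note that the status field is a string ("True" or "False"), not a boolean, so it is compared as text.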

roger-ryao avatar Oct 01 '24 02:10 roger-ryao

Reopening this issue because we also need to update the documentation section "Root and Privileged Permission - Longhorn Manager" to reflect https://github.com/longhorn/longhorn/pull/9374.

cc @derekbit

c3y1huang avatar Dec 06 '24 03:12 c3y1huang

Closing this issue because https://github.com/longhorn/website/pull/1017 has been merged.

roger-ryao avatar Dec 10 '24 08:12 roger-ryao

Image

AkakievKD avatar Feb 12 '25 13:02 AkakievKD

This happens on 1 of 3 workers, and the instance-manager pod keeps being recreated.

AkakievKD avatar Feb 12 '25 13:02 AkakievKD

node info

Image

AkakievKD avatar Feb 12 '25 13:02 AkakievKD

@AkakievKD I don't think this is related to the dm_crypt node condition. It just records the status and won't stop any instance-manager pod. Please provide a support bundle for troubleshooting.

derekbit avatar Feb 12 '25 13:02 derekbit