longhorn-manager

fix: add nvme manual disk driver when auto detection failed to detect the nvme driver

Open Hugome opened this issue 5 months ago • 4 comments

Which issue(s) this PR fixes:

Issue longhorn/longhorn#11127

What this PR does / why we need it:

I had issues using NVMe drives on a Talos cluster with a recent kernel. After a bit of digging, I found that the driver name reported for "vfio_pci" does not match the name the auto-detection expects. Running the auto-detection driver commands manually:

nsenter --mount=/proc/1/ns/mnt --ipc=/proc/1/ns/ipc --net=/proc/1/ns/net env PCI_ALLOWED=0000:02:00.0 bash /usr/src/spdk/scripts/setup.sh
0000:01:00.0 (144d a80a): Skipping denied controller at 0000:01:00.0
0000:01:00.0 (144d a80a): Active devices: mount@nvme0n1:nvme0n1p5, so not binding PCI dev
0000:02:00.0 (144d a80a): nvme -> vfio-pci
INFO: Requested 1024 hugepages but 1024 already allocated 

nsenter --mount=/proc/1/ns/mnt --ipc=/proc/1/ns/ipc --net=/proc/1/ns/net env PCI_ALLOWED=0000:02:00.0 bash /usr/src/spdk/scripts/setup.sh disk-status 0000:02:00.0
0000:01:00.0 (144d a80a): Skipping denied controller at 0000:01:00.0
0000:01:00.0 (144d a80a): Active devices: mount@nvme0n1:nvme0n1p5, so not binding PCI dev
{"bdf":"0000:02:00.0","type":"NVMe","driver":"vfio-pci","vendor":"144d","numa":"unknown","device":"-","block_devices":"-"}

uname -a
Linux instance-manager-b2105364cf7e819d6e95b009ebbcd205 6.12.31-talos #1 SMP Tue Jun  3 10:47:32 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
instance-manager-b2105364cf7e819d6e95b009ebbcd205:/ # 

The issue comes from the module name that SPDK returns on the Talos kernel: it is vfio-pci, not the expected vfio_pci defined here: https://github.com/longhorn/longhorn-spdk-engine/blob/main/pkg/spdk/disk/types.go#L56 https://github.com/longhorn/go-common-libs/blob/main/types/file.go#L20

I solved this by editing the CRD manually and allowing the "nvme" disk driver to be set manually.

Special notes for your reviewer:

This may not be the best solution, and it doesn't fix the auto-detection code (maybe the check could accept both the - and _ versions of the name?), but it offers the ability to control the driver manually.
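For reference, here is a minimal sketch of what accepting both spellings in the auto-detection check could look like. It is not the actual longhorn-spdk-engine code: `expectedVfioDriver`, `diskStatus`, `normalizeDriverName`, and `isVfioPci` are hypothetical names used only for illustration, and the JSON line is the disk-status output shown above. It relies on the fact that Linux module tooling such as modprobe treats `-` and `_` in module names as interchangeable.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// expectedVfioDriver is a hypothetical constant standing in for the
// underscore-style name that the auto-detection compares against.
const expectedVfioDriver = "vfio_pci"

// diskStatus mirrors only the fields of the disk-status JSON used here.
type diskStatus struct {
	BDF    string `json:"bdf"`
	Type   string `json:"type"`
	Driver string `json:"driver"`
}

// normalizeDriverName trims whitespace and maps hyphens to underscores so
// that "vfio-pci" and "vfio_pci" compare equal.
func normalizeDriverName(name string) string {
	return strings.ReplaceAll(strings.TrimSpace(name), "-", "_")
}

// isVfioPci reports whether the reported driver is vfio-pci under either spelling.
func isVfioPci(reported string) bool {
	return normalizeDriverName(reported) == expectedVfioDriver
}

func main() {
	// Output of "setup.sh disk-status 0000:02:00.0" copied from this PR description.
	raw := `{"bdf":"0000:02:00.0","type":"NVMe","driver":"vfio-pci","vendor":"144d","numa":"unknown","device":"-","block_devices":"-"}`

	var st diskStatus
	if err := json.Unmarshal([]byte(raw), &st); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s driver=%q matches %s: %t\n", st.BDF, st.Driver, expectedVfioDriver, isVfioPci(st.Driver))
}
```

Normalizing before comparing keeps a single expected constant instead of adding a second branch to the if, but whether that fits the existing detection code is for the maintainers to judge.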

Additional documentation or context

Hugome avatar Jun 14 '25 12:06 Hugome

[!IMPORTANT]

Review skipped

Auto reviews are disabled on this repository.

coderabbitai[bot] avatar Jun 14 '25 12:06 coderabbitai[bot]

Thanks @Hugome for the contribution. Could you create a BUG ticket in https://github.com/longhorn/longhorn/issues and add the ticket number to https://github.com/longhorn/longhorn-manager/pull/3848#issue-3146102233? Thank you.

derekbit avatar Jun 17 '25 00:06 derekbit

cc @c3y1huang @shuo-wu

derekbit avatar Jun 17 '25 00:06 derekbit

Yes no problem, it is done :+1: Thanks

Hugome avatar Jun 17 '25 00:06 Hugome

Hello @Hugome Can you execute bash k8s/generate_code.sh and add the changed files to the PR? Thanks.

derekbit avatar Jul 09 '25 03:07 derekbit

Hello @derekbit

Done and it helped fix a typo issue :+1:

Thanks

Hugome avatar Jul 13 '25 11:07 Hugome

Thanks @Hugome!

derekbit avatar Jul 13 '25 15:07 derekbit