
fix(kernel-modules): detect hostonly block driver through /sys/mod

Open pfliu opened this issue 3 years ago • 8 comments

*** Issue ***

The topology of a kernel driver under sysfs can be complicated, as demonstrated by the following case on Power10. This makes it hard to detect the exact hostonly driver's module name. As a result, kdump fails to install nvme.ko for the dump target, which causes the vmcore dump to fail.

In this Power10 case, /sys/dev/block/major:minor links to /sys/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1, but traveling up along that path there is no "driver/module" file, so the "nvme" module cannot be detected.

In fact, there is a "driver/module" link pointing to "/sys/module/nvme" under "/sys/devices/pci0181:60/0181:60:00.0/", which is reachable through "/sys/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0". But there is no hint connecting "nvme0n1" to "nvme0", so figuring out the topology is a real challenge.
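The conventional detection resolves the /sys/dev/block link and walks up the sysfs path looking for a `driver/module` symlink. A minimal sketch of that walk (the helper name and the overridable sysfs root are invented for illustration, not dracut's actual code):

```shell
# Hypothetical sketch of the conventional lookup: resolve
# /sys/dev/block/MAJOR:MINOR and walk up the device path looking for a
# driver/module link. On the Power10 topology described above this walk
# finds nothing, because the namespace sits under
# /sys/devices/virtual/nvme-subsystem, far from the PCI device.
find_block_driver_module() {
    local devnum=$1          # MAJOR:MINOR of the block device
    local sysfs=${2:-/sys}   # sysfs root, overridable for testing
    local dev_path
    dev_path=$(readlink -f "$sysfs/dev/block/$devnum") || return 1
    while [ -n "$dev_path" ] && [ "$dev_path" != "$sysfs" ]; do
        if [ -e "$dev_path/driver/module" ]; then
            # the module name is the last component of the link target
            basename "$(readlink -f "$dev_path/driver/module")"
            return 0
        fi
        dev_path=${dev_path%/*}
    done
    return 1    # no driver/module anywhere on the path (the failing case)
}
```

For a device whose path runs through the PCI device this finds "nvme"; for the virtual nvme-subsystem path it fails, which is exactly the bug.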

*** The fix ***

To fix the detection of hostonly modules, instead of traveling and jumping among /sys/dev/*, /sys/module/<mod>/refcnt can be used to determine whether a module is hostonly or not.
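The refcnt-based check can be sketched as follows. This is a minimal illustration, not the actual patch: the helper name and the overridable sysfs root are assumptions, and it relies on the module being loadable (built-in modules expose no refcnt file under /sys/module).

```shell
# Hypothetical sketch: a module whose reference count is greater than
# zero has active users on this host, so it is a hostonly candidate
# regardless of how convoluted the sysfs device topology is.
module_in_use() {
    local mod=$1
    local sysfs=${2:-/sys}   # sysfs root, overridable for testing
    local refcnt
    # refcnt only exists for loaded, loadable modules
    refcnt=$(cat "$sysfs/module/$mod/refcnt" 2>/dev/null) || return 1
    [ "$refcnt" -gt 0 ]
}
```

The appeal of this approach is that it sidesteps the device topology entirely: it asks the module itself whether anything on the running system is using it.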

*** The topology of nvme on a Power10 for reference ***

```
[root@raplp91 ~]# ls -lrt /dev/block/*
lrwxrwxrwx 1 root root 10 Dec 21 09:37 /dev/block/259:1 -> ../nvme0n1
lrwxrwxrwx 1 root root 12 Dec 21 09:37 /dev/block/259:2 -> ../nvme0n1p1
lrwxrwxrwx 1 root root  7 Dec 21 09:37 /dev/block/253:1 -> ../dm-1
lrwxrwxrwx 1 root root 12 Dec 21 09:37 /dev/block/259:3 -> ../nvme0n1p2
lrwxrwxrwx 1 root root  7 Dec 21 09:37 /dev/block/253:0 -> ../dm-0
lrwxrwxrwx 1 root root 12 Dec 21 09:37 /dev/block/259:4 -> ../nvme0n1p3
lrwxrwxrwx 1 root root  7 Dec 21 09:37 /dev/block/253:2 -> ../dm-2
```

```
[root@raplp91 ~]# ls -lrt /sys/dev/block/*
lrwxrwxrwx 1 root root 0 Dec 21 09:37 /sys/dev/block/259:1 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
lrwxrwxrwx 1 root root 0 Dec 21 09:37 /sys/dev/block/259:4 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1/nvme0n1p3
lrwxrwxrwx 1 root root 0 Dec 21 09:37 /sys/dev/block/259:3 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1/nvme0n1p2
lrwxrwxrwx 1 root root 0 Dec 21 09:37 /sys/dev/block/259:2 -> ../../devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1/nvme0n1p1
lrwxrwxrwx 1 root root 0 Dec 21 09:37 /sys/dev/block/253:0 -> ../../devices/virtual/block/dm-0
lrwxrwxrwx 1 root root 0 Dec 21 09:37 /sys/dev/block/253:1 -> ../../devices/virtual/block/dm-1
lrwxrwxrwx 1 root root 0 Dec 21 09:37 /sys/dev/block/253:2 -> ../../devices/virtual/block/dm-2
```

/sys/devices/virtual/nvme-subsystem tree structure:

```
├── nvme-subsys0
│   ├── firmware_rev
│   ├── iopolicy
│   ├── model
│   ├── ng0n1
│   │   ├── dev
│   │   ├── device -> ../../nvme-subsys0
│   │   ├── power
│   │   │   ├── autosuspend_delay_ms
│   │   │   ├── control
│   │   │   ├── runtime_active_time
│   │   │   ├── runtime_status
│   │   │   └── runtime_suspended_time
│   │   ├── subsystem -> ../../../../../class/nvme-generic
│   │   └── uevent
│   ├── ng0n2 ...
│   ├── nvme0 -> ../../../pci0181:60/0181:60:00.0/nvme/nvme0
│   ├── nvme0n1
│   │   ├── alignment_offset
│   │   ├── bdi -> ../../../bdi/259:6
│   │   ├── capability
│   │   ├── dev
│   │   ├── device -> ../../nvme-subsys0
│   │   ├── discard_alignment
│   │   ├── events
│   │   ├── events_async
│   │   ├── events_poll_msecs
│   │   ├── ext_range
│   │   ├── hidden
│   │   ├── holders
│   │   ├── inflight
│   │   ├── integrity
│   │   │   ├── device_is_integrity_capable
│   │   │   ├── format
│   │   │   ├── protection_interval_bytes
│   │   │   ├── read_verify
│   │   │   ├── tag_size
│   │   │   └── write_generate
│   │   ├── nguid
│   │   ├── nsid
│   │   ├── nvme0n1p1
│   │   │   ├── alignment_offset
│   │   │   ├── dev
│   │   │   ├── discard_alignment
│   │   │   ├── holders
│   │   │   ├── inflight
│   │   │   ├── partition
│   │   │   ├── power
│   │   │   │   ├── autosuspend_delay_ms
│   │   │   │   ├── control
│   │   │   │   ├── runtime_active_time
│   │   │   │   ├── runtime_status
│   │   │   │   └── runtime_suspended_time
│   │   │   ├── ro
│   │   │   ├── size
│   │   │   ├── start
│   │   │   ├── stat
│   │   │   ├── subsystem -> ../../../../../../class/block
│   │   │   ├── trace
│   │   │   │   ├── act_mask
│   │   │   │   ├── enable
│   │   │   │   ├── end_lba
│   │   │   │   ├── pid
│   │   │   │   └── start_lba
│   │   │   └── uevent
│   │   ├── nvme0n1p2 ...
```

Signed-off-by: Pingfan Liu [email protected]

This pull request changes...

Changes

Checklist

  • [x] I have tested it locally
  • [x] I have reviewed and updated any documentation if relevant
  • [ ] I am providing new code and test(s) for it

Fixes #

pfliu avatar Apr 22 '22 04:04 pfliu

@haraldh @johannbg @danimo

Could you give a review? Thanks.

pfliu avatar Apr 28 '22 01:04 pfliu

> But there is no hint to bring "nvme0n1" to "nvme0".

Worth fixing this in the kernel and making a quirk for this in dracut, instead of implementing the second-best option with the refcnt.

haraldh avatar Apr 28 '22 06:04 haraldh

> his in the kernel and make a quirk for this in dracut

pfliu avatar Apr 29 '22 03:04 pfliu

> But there is no hint to bring "nvme0n1" to "nvme0".

> Worth fixing this in the kernel and making a quirk for this in dracut, instead of implementing the second-best option with the refcnt.

The special topology on the Power10 system is due to "Native NVMe multipathing" instead of "DM Multipath": https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_storage_devices/enabling-multipathing-on-nvme-devices_managing-storage-devices

So from the viewpoint of the kernel, the topology is valid.
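Whether native NVMe multipathing is active on a host can be read from the `nvme_core` module parameter in sysfs; a hedged sketch (the helper name and the path override are assumptions for illustration):

```shell
# Hypothetical sketch: native NVMe multipathing is controlled by the
# nvme_core.multipath module parameter, exposed in sysfs as "Y" or "N".
# When it is "Y", namespaces appear under the virtual nvme-subsystem
# topology described above instead of directly under the PCI device.
nvme_native_multipath_enabled() {
    local param=${1:-/sys/module/nvme_core/parameters/multipath}
    [ -r "$param" ] && [ "$(cat "$param")" = "Y" ]
}
```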

Thanks,

Pingfan

pfliu avatar Apr 29 '22 03:04 pfliu

> his in the kernel and make a quirk for this in dracut

Sorry, I mistakenly posted a partial comment and closed the request. I have re-opened it.

pfliu avatar Apr 29 '22 03:04 pfliu

> But there is no hint to bring "nvme0n1" to "nvme0".

> Worth fixing this in the kernel and making a quirk for this in dracut, instead of implementing the second-best option with the refcnt.


And the kernel code is drivers/nvme/host/multipath.c:537: `rc = device_add_disk(&head->subsys->dev, head->disk,`

As we can see, the disk's parent is `head->subsys->dev`, which is what happens on the Power10.

So there is no persistent topology for the nvme device driver, and it is better to find another way.

pfliu avatar Apr 29 '22 03:04 pfliu

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.

stale[bot] avatar May 30 '22 21:05 stale[bot]
