dracut icon indicating copy to clipboard operation
dracut copied to clipboard

fix(nvmf): move /etc/nvme/host{nqn,id} requirement to hostonly

Open tbzatek opened this issue 2 years ago • 13 comments

This pull request changes the 95nvmf module /etc/nvme/host{nqn,id} files inclusion.

Changes

When creating initramfs for universal boot image such as an installer, we can't include any machine-specific IDs. Let's move the check for /etc/nvme/hostnqn and /etc/nvme/hostid files presence to the hostonly section to avoid unsatisfied requirements.

Checklist

  • [x] I have tested it locally
  • [x] I have reviewed and updated any documentation if relevant
  • [ ] ~~I am providing new code and test(s) for it~~

tbzatek avatar Sep 21 '23 12:09 tbzatek

Cc: @mwilck @johnmeneghini @lnykryn

This is my attempt to fix the universal-image scenario. We're facing this issue specifically in the installer, may apply to any kind of pre-build images (e.g. cloud) or layered images with immutable base.

For obvious reasons we can't include any machine-specific IDs and that's where we're facing unsatisfied requirements, leading to the nvmf module not being included.

For the NBFT case where HostNQN and HostID is configured, no files are needed. For the rest of the cases we'll need to rely on nvme-cli/libnvme auto-generated IDs from e.g. DMI records. An alternative would be to generate new IDs before making first connection attempt, subject to discussion.

tbzatek avatar Sep 21 '23 12:09 tbzatek

LGTM.

pvalena avatar Sep 21 '23 15:09 pvalena

Then both files should be marked as hostonly, i.e.,inst_simple -H <file> on the following lines:

https://github.com/dracutdevs/dracut/blob/6acfecae572fb457115b276b5b64d9424ad5187b/modules.d/95nvmf/module-setup.sh#L133-L134

Done.

I haven't tested this change yet, we'll be doing extensive testing in coming weeks with different combinations of Fabrics tech and HostNQN/HostID (non)presence. For now I just wanted to have more eyes to evaluate the changes.

tbzatek avatar Sep 22 '23 12:09 tbzatek

This causes both files to be written to the internal /usr/lib/dracut/hostonly-files when the initrd is built in hostonly mode, so these files (and all files that are only needed in a hostonly initrd) can be removed at boot by passing the rd.hostonly=0 option in the kernel command line (see man dracut.cmdline.7).

That doesn't make much sense for NVMe hostnqn and hostid. Without a hostnqn/hostid, no NVMe-oF connections are possible. If the user wants to set different values, she can do so today by using rd.nvmf.hostnqn= and rd.nvmf.hostid= parameters.

mwilck avatar Sep 22 '23 13:09 mwilck

On 9/22/23 09:33, Martin Wilck wrote:

That doesn't make much sense for NVMe hostnqn and hostid. Without a hostnqn/hostid, no NVMe connections are possible. If the user wants to set different values, she can do so today by using |rd.nvmf.hostnqn=| and |rd.nvmf.hostid=| parameters.

The upstream nvme-cli code today will create the hostnqn/hostid if none is provided. I think this is ideally what we want.

https://github.com/linux-nvme/nvme-cli/blob/a2587cbd2eea1d0506d052356abe94cb27cb9884/fabrics.c#L733

So as long as the runtime environment provides a dmi/product_uuid, you don't need a /etc/nvme/hostnqn\id file.

sudo cat /sys/class/dmi/id/product_uuid 66b848cc-25ac-11b2-a85c-ff377e9ee3de

Ideally we want all supported systems to be complaint with TP-4126.

From TP-4126 NVMe-oF Boot HostNQN and HostID: this is requirement for the host.

The pre-OS boot environment and OS environment should use a fixed platform UUID to create a HostNQN and HostID. The implementation should use the System UUID found in the SMBIOS table. The System Management BIOS (SMBIOS) Reference Specification is described in DSP0134. The SMBIOS table is typically available to pre-OS firmware and Expansion ROM Firmware in the pre-OS boot environment as well as to the OS environment.

I think rd.nvmf.hostnqn/hostid should be a manual override.

johnmeneghini avatar Sep 22 '23 14:09 johnmeneghini

Right. I didn't think about the product UUID automatism. With that, it would actually be possible to clone a system and use rd.hostonly=0 for booting (assuming that the target subsystem is configured to serve namespaces to the new host's NQN, of course).

mwilck avatar Sep 22 '23 14:09 mwilck

Great, I was hoping to have this kind of discussion!

More things to consider:

  • not all distributions are making hostonly initramfs during installation
  • we might still want to include /etc/nvme/host* files in the initramfs when available but at the same time play nice when they're missing (the universal image scenario)
  • kernel 6.5 and related nvme-cli TP-4126 fixes changed how things work (considering corner cases like no DMI UUID records)
  • I still want to have these three scenarios tested: NVMe/FC, NBFT with HostNQN set in the boot attempt and NBFT without HostNQN set. This will take some time.

tbzatek avatar Sep 22 '23 15:09 tbzatek

NBFT without HostNQN set

I'm pretty certain that won't work, at least not with the current firmware.

mwilck avatar Sep 22 '23 15:09 mwilck

On 9/22/23 11:06, Tomáš Bžatek wrote:

I still want to have this tested three scenarios: NVMe/FC, NBFT with HostNQN set in the boot attempt and NBFT without HostNQN set. This will take some time.

I agree that we want to test at least these three scenarios. Not sure if NBFT boot w/out HostNQN is will work with the existing code... but in theory it should be possible with v6.5...

Let's make sure we don't merge this until all of this has been tested.

johnmeneghini avatar Sep 22 '23 17:09 johnmeneghini

This has been thoroughly tested on various system deployments for the past two months. Ready for review.

tbzatek avatar Dec 05 '23 15:12 tbzatek

LGTM.

pvalena avatar Jan 02 '24 22:01 pvalena

Rebased. Anything else missing for review?

tbzatek avatar Feb 27 '24 14:02 tbzatek

This issue is being marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. If this is still an issue in the latest release of Dracut and you would like to keep it open please comment on this issue within the next 7 days. Thank you for your contributions.

stale[bot] avatar Apr 22 '24 06:04 stale[bot]