feat: support additional virtualized NVMe disks
Any node for which nvme_size is specified in settings.yml will automatically have one additional virtual NVMe disk created, which will appear as /dev/nvme0n1. This can be useful for testing additional disks with Longhorn. The way it works is a bit obscure because libvirt only seems to support virtio, scsi and sata disks, so we have to resort to hacking qemu arguments in as described in this blog post:
http://blog.frankenmichl.de/2018/02/13/add-nvme-device-to-vm/
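Concretely, the hack boils down to appending extra arguments to the qemu command line, along these lines (a sketch following that post; the image path, drive id and serial here are illustrative, not necessarily what the Vagrantfile uses):
-drive file=/var/lib/libvirt/images/harvester-node-0-nvme0.qcow2,format=qcow2,if=none,id=nvme0
-device nvme,drive=nvme0,serial=nvme0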
This means we have to create the volume manually first with virsh, then pass its path to qemu. Matters are complicated further by volumes being owned by root by default, while libvirt runs VMs as the qemu user. For normal libvirt volumes, file ownership changes happen automatically, but for these volumes we're hacking in by hand, that doesn't happen, so we have to explicitly specify ownership, and to do that we have to define the volume via XML, hence the nasty virsh vol-create-as ... | sed ... | virsh vol-create invocation.
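Spelled out, that invocation looks roughly like this (volume name and size here are just illustrative):
virsh vol-create-as default harvester-node-0-nvme0.qcow2 --capacity 10G --format qcow2 --print-xml \
  | sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' \
  | virsh vol-create default /dev/stdin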
The other wrinkle is that we need to know the exact path to the disk image. Once a volume has been created, you can run virsh vol-path --pool default VOLUME_NAME to get the path, but we need to know the path when setting libvirt.qemuargs, and we don't want to actually create the volume until the vagrant up trigger for the node in question. If we create the volume outside the trigger, it gets created on every evaluation of the Vagrantfile, even for unrelated VMs... So, we call virsh pool-dumpxml default and get the pool path from that, then stick the volume name on the end.
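In shell terms, that amounts to roughly the following (the Vagrantfile actually does this in Ruby with Nokogiri; xmllint and the volume name are only for illustration):
pool_path=$(virsh pool-dumpxml default | xmllint --xpath '/pool/target/path/text()' -)
nvme_disk="${pool_path}/vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2"   # path then passed to qemu via libvirt.qemuargs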
To test, try something like this in settings.yml:
harvester_network_config:
  cluster:
    - ip: 192.168.0.30
      # [...]
      nvme_size: 10G
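Once the node is up, the extra disk should be visible inside it; a quick check (assuming the device name described above):
lsblk /dev/nvme0n1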
Does the network interface name change after adding an NVMe disk? I have the impression that ens5/ens6 (or something like that) changed after adding the second disk.
Doesn't seem to make any difference. I just gave it a quick test checking ip a in the installer, first without an NVMe:
rancher@rancher:~> ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:71:b5:1c brd ff:ff:ff:ff:ff:ff
altname enp0s5
inet 192.168.121.53/24 brd 192.168.121.255 scope global ens5
valid_lft forever preferred_lft forever
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 02:00:00:0d:62:e2 brd ff:ff:ff:ff:ff:ff
altname enp0s6
...and second with an NVMe:
rancher@rancher:~> ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:5d:85:07 brd ff:ff:ff:ff:ff:ff
altname enp0s5
inet 192.168.121.66/24 brd 192.168.121.255 scope global ens5
valid_lft forever preferred_lft forever
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 02:00:00:0d:62:e2 brd ff:ff:ff:ff:ff:ff
altname enp0s6
@m-ildefons you might be interested in this, as I see you're adding multiple disk support in https://github.com/harvester/ipxe-examples/pull/86 :-)
When running the setup script I got this error:
$ ./setup_harvester.sh
~/git/ipxe-examples/vagrant-pxe-harvester ~/git/ipxe-examples/vagrant-pxe-harvester
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match
'all'
PLAY [Setup Harvester] *************************************************************************************************
TASK [debug] ***********************************************************************************************************
ok: [localhost] =>
msg: Installation Started
TASK [install PXE server] **********************************************************************************************
fatal: [localhost]: FAILED! => changed=true
cmd: |-
vagrant up pxe_server
delta: '0:00:00.550344'
end: '2024-11-11 08:27:47.004519'
msg: non-zero return code
rc: 1
start: '2024-11-11 08:27:46.454175'
stderr: |-
error: failed to get pool 'default'
error: Storage pool not found: no storage pool with matching name 'default'
Vagrant failed to initialize at a very early stage:
There was an error loading a Vagrantfile. The file being loaded
and the error message are shown below. This is usually caused by
a syntax error.
Path: <provider config: libvirt>
Line number: 114
Message: NoMethodError: undefined method `content' for nil:NilClass
pool_path = Nokogiri::XML(%x(virsh pool-dumpxml default)).at_xpath('/pool/target/path').content
^^^^^^^^
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
PLAY RECAP *************************************************************************************************************
localhost : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
I have no default pool on my system:
$ virsh pool-list
Name State Autostart
---------------------------
I would suggest adding the line # nvme_size: 10G to each predefined cluster node in settings.yml, so that everyone can see at first glance that this optional setting exists.
When running the setup script I got this error: ...
Same error as @votdev.
@tserong do you have any additional settings for Vagrant to work with virsh?
By the way, I think we could use a more general name, since the created qcow2 file can also be used for other disk types, not only NVMe.
OK, that's weird. I don't think I did anything special to be able to see the default pool... I am in the libvirt group if that makes a difference...?
tserong@ra:~> virsh pool-list
Name State Autostart
-------------------------------
default active no
images active yes
tserong@ra:~> id
uid=1000(tserong) gid=100(users) groups=100(users),108(libvirt),466(docker)
Oh, wait a minute, my ~/.bashrc includes this:
export LIBVIRT_DEFAULT_URI=qemu:///system
I'm going to have to experiment a bit more here. Out of curiosity @votdev @albinsun, what do you see if you run virsh uri (I'm guessing it's qemu:///session). What about if you run LIBVIRT_DEFAULT_URI=qemu:///system virsh pool-list? Does that get a pool list? And, does that command require any additional authentication?
Actually, if you don't mind :-) please try the updated version of this PR, which explicitly specifies qemu:///system as the virsh connection URI. This should do the trick, I hope. I tested by manually unsetting my LIBVIRT_DEFAULT_URI environment variable; when I ran the setup scripts I saw the same failure you both saw, then I applied this latest change and it ran through OK.
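For reference, "explicitly specifies qemu:///system" boils down to pointing the virsh calls at the system libvirt instance rather than the per-user session, e.g. (a sketch, not the exact diff):
virsh --connect qemu:///system pool-dumpxml default
# or equivalently:
LIBVIRT_DEFAULT_URI=qemu:///system virsh pool-dumpxml default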
$ virsh uri
qemu:///session
$ LIBVIRT_DEFAULT_URI=qemu:///system virsh pool-list
Name State Autostart
-------------------------------
default active yes
I'm sorry, it fails again.
TASK [boot Harvester Node 0] **************************************************************************************************
fatal: [localhost]: FAILED! => changed=true
cmd: |-
vagrant up harvester-node-0
delta: '0:00:00.606143'
end: '2024-11-12 09:07:36.016111'
msg: non-zero return code
rc: 1
start: '2024-11-12 09:07:35.409968'
stderr: |-
==> harvester-node-0: Trigger run failed
==> harvester-node-0: A script exited with an unacceptable exit code 2.
A script exited with an unacceptable exit code 2.
stderr_lines: <omitted>
stdout: |-
Bringing machine 'harvester-node-0' up with 'libvirt' provider...
==> harvester-node-0: Running action triggers before up ...
==> harvester-node-0: Running trigger...
==> harvester-node-0: Creating volume vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2
harvester-node-0: Running local: Inline script
harvester-node-0: sh -c " export LIBVIRT_DEFAULT_URI=qemu:///system virsh vol-info --pool=default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 >/dev/null 2>&1 || virsh vol-create-as default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 --capacity 10G --format qcow2 --print-xml | sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' | virsh vol-create default /dev/stdin 2>&1"
stdout_lines: <omitted>
PLAY RECAP ********************************************************************************************************************
localhost : ok=10 changed=3 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
All good, at least we got slightly further :-) but it's annoying that there's not more detail in the output as to what's failing now :-/ I'm not sure offhand how to further instrument that trigger script... I don't suppose you could try running this nasty thing directly from a shell and see if it prints anything interesting...?
export LIBVIRT_DEFAULT_URI=qemu:///system ; virsh vol-info --pool=default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 >/dev/null 2>&1 || virsh vol-create-as default vagrant-pxe-harvester_harvester-node-0-nvme0.qcow2 --capacity 10G --format qcow2 --print-xml |sed '/<target>/a <permissions><owner>0</owner><group>0</group></permissions>' | virsh vol-create default /dev/stdin 2>&1
This command succeeds when run from the CLI. I'll try how it behaves when there are no nodes deployed.
The latest code successfully creates nvme0.qcow2, but it doesn't seem to be in the hardware list yet.
Does it need to be added manually via Add Hardware -> Storage?
Tried to manually add the disk, but it fails with