netperf: implements dynamic NUMA binding
The test was taking the last NUMA node to bind the VM's memory. On some systems the last NUMA node may have no memory and/or CPUs assigned, so this patch updates the test to take the first valid node.
Signed-off-by: mcasquer [email protected] ID: 3321
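To make the selection rule in the description concrete, here is a minimal sketch of "take the first valid node". It is not the code from this patch: `NodeInfo`, `first_valid_node` and the sample topology are hypothetical names and data, and the sketch assumes that "valid" means the node has both memory and CPUs.

```python
from typing import Dict, List, NamedTuple, Optional


class NodeInfo(NamedTuple):
    """Hypothetical summary of one host NUMA node (not an avocado-vt type)."""
    mem_total_kb: int   # memory attached to the node
    cpus: List[int]     # CPU ids attached to the node


def first_valid_node(nodes: Dict[int, NodeInfo]) -> Optional[int]:
    """Return the lowest-numbered node that has both memory and CPUs.

    Assumption: a node counts as valid only if it has memory AND CPUs.
    Returns None when no node qualifies, so the caller can cancel the test.
    """
    for node_id in sorted(nodes):
        info = nodes[node_id]
        if info.mem_total_kb > 0 and info.cpus:
            return node_id
    return None


# Made-up topology: node 0 has CPUs but no memory, and the last node (7)
# has neither, which is the case the old "last node" rule tripped over.
topology = {
    0: NodeInfo(mem_total_kb=0, cpus=[0, 1]),
    1: NodeInfo(mem_total_kb=16_000_000, cpus=[2, 3]),
    7: NodeInfo(mem_total_kb=0, cpus=[]),
}
selected = first_valid_node(topology)
```

With this topology the old rule would pick node 7, while the sketch picks node 1. Returning `None` rather than raising leaves the decision to cancel or fail to the caller.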
The test cases didn't pass, but the errors are not related to this patch:
(1/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: STARTED
(1/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: ERROR: local variable 'client_pub_ip' referenced before assignment (384.46 s)
(2/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: STARTED
(2/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: ERROR: local variable 'client_pub_ip' referenced before assignment (376.33 s)
(3/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.cygwin.q35: STARTED
(3/3) Host_RHEL.m9.u5.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.cygwin.q35: ERROR: Timeout expired while waiting for shell command to complete: 'C:\\rhcygwin\\Cygwin.bat -i /Cygwin-Terminal.ico -' (output: 'The system cannot find the path specified.\n\nC:\\>') (272.61 s)
RESULTS : PASS 0 | ERROR 3 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
@heywji could you review this PR? Thanks !
LGTM. Thanks to Mario's efforts and help.
Hello, other reviewers.
Let me explain some background here. The netperf config in my netkvm test loop is netperf_stress_test.cfg. But one day I typed the test case name as 'netperf' and some errors were reported. After talking with @mcasquer, we confirmed they were caused by the NUMA node memory issue.
It's an actual improvement for the NUMA issue, even though it is not directly connected with my netkvm test loop.
@zhencliu @PaulYuuu please, could you review this PR and Wenkang's comment? Thanks!
@heywji please, whenever possible, could you test the latest patch changes? I saw some QEMU and avocado processes already running on your host... thanks!
LGTM
(1/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: STARTED
(1/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: PASS (1596.48 s)
(2/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: STARTED
(2/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.43 s)
(3/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: STARTED
(3/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.13 s)
(4/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.cygwin.q35: STARTED
(4/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.with_jumbo.host_guest.cygwin.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.19 s)
(5/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.q35: STARTED
(5/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.22 s)
(6/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.best_registry_setting.q35: STARTED
(6/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.best_registry_setting.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.12 s)
(7/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.cygwin.q35: STARTED
(7/7) Host_RHEL.m9.u5.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.i386.io-github-autotest-qemu.netperf.default.host_guest.cygwin.q35: CANCEL: The node: 7 used for VM pinning is not valid (17.06 s)
RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 6
@zhencliu @PaulYuuu please, could you review this PR again? Thanks!
Hi Mario, it looks like there are still 2 pending inline comments from my side, especially about the preprocess. IMO your test passed because you don't need a second disk and image1 had already been created. But it may be safer to call the preprocess for both VMs and images when not_preprocess = yes, what do you think?
@zhencliu code updated, @heywji please could you give another try?
@mcasquer Yes, I am testing it. I will update the patch's result when it is done.
LGTM
@zhencliu any more comments on this PR?
Test results with Win10 VM
(1/4) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: STARTED
(1/4) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.unattended_install.cdrom.extra_cdrom_ks.default_install.aio_threads.q35: PASS (2526.82 s)
(2/4) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: STARTED
(2/4) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.q35: ERROR: cannot access local variable 'client_pub_ip' where it is not associated with a value (365.73 s)
(3/4) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: STARTED
(3/4) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.best_registry_setting.q35: ERROR: cannot access local variable 'client_pub_ip' where it is not associated with a value (390.45 s)
(4/4) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.cygwin.q35: STARTED
(4/4) Host_RHEL.m10.u0.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.Win10.x86_64.io-github-autotest-qemu.netperf.with_jumbo.host_guest.cygwin.q35: ERROR: Timeout expired while waiting for shell command to complete: 'C:\\rhcygwin\\Cygwin.bat -i /Cygwin-Terminal.ico -' (output: 'The system cannot find the path specified.\n\nC:\\>') (263.49 s)
RESULTS : PASS 1 | ERROR 3 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
As discussed with @zhencliu and @heywji, these failures are not related to this patch, since running the tests without it leads to the same results. The log does show, however, that the VM is no longer forced to boot on the last NUMA node:
[stdlog] 2025-03-12 05:12:50,677 avocado.virttest.qemu_vm qemu_vm L3839 INFO | Running qemu command (reformatted):
[stdlog] MALLOC_PERTURB_=1 numactl \
[stdlog] -m 0 /usr/libexec/qemu-kvm \
Thanks for the information. client_pub_ip is defined inside an elif block, so if the test never enters that elif, client_pub_ip is left undefined. You can push another patch to fix it :-)
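The failure mode described here can be reproduced in isolation. This is a minimal sketch, not the netperf test code: the function names, the `branch` argument and the IP address are hypothetical, and only the variable name `client_pub_ip` comes from the logs above.

```python
def pick_client_ip(branch: str) -> str:
    """Assigns client_pub_ip in only one branch, like the pattern described."""
    if branch == "first":
        pass  # this path never assigns client_pub_ip
    elif branch == "assigning":
        client_pub_ip = "192.0.2.10"  # documentation-range address, made up
    # Raises UnboundLocalError whenever the elif above was not taken.
    return client_pub_ip


def pick_client_ip_fixed(branch: str):
    """One possible fix: give the variable a value on every code path."""
    client_pub_ip = None
    if branch == "first":
        pass
    elif branch == "assigning":
        client_pub_ip = "192.0.2.10"
    return client_pub_ip


try:
    pick_client_ip("first")
except UnboundLocalError as exc:
    error_name = type(exc).__name__
```

Both error strings in the logs above are the same `UnboundLocalError`: "referenced before assignment" is the wording before Python 3.11, and "cannot access local variable ... where it is not associated with a value" is the wording from 3.11 on. Initializing to `None` is only one option; the caller then has to handle the `None` case explicitly.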