
Testing and Integrating new qemu_thread_set_affinity definition

Open roja-zededa opened this issue 1 year ago • 16 comments

Xentools 4.19.0, which uses QEMU 8.0.4, has a new implementation for CPU pinning. A big difference between our implementation and the QEMU vanilla version is that ours can pin vCPU-to-pCPU 1-to-1, while the QEMU implementation only allows a set of CPUs to be assigned to an entire VM (and the VM's vCPU threads can still migrate among the given set of pCPUs). We need to verify whether the new qemu_thread_set_affinity definition is suitable for our use case and integrate it if so.

roja-zededa avatar Sep 04 '24 17:09 roja-zededa

The most critical use case for the new CPU pinning is realtime, as this implementation does not really pin CPUs. I would switch to it if it does not worsen the results of the RT tests. @rouming can provide more information on how he performed the testing.

Also, if using the native qemu implementation, the qemu parameters should be used in the native way as well. You can find it in the PR I shared with you earlier: https://github.com/rucoder/eve/pull/1

OhmSpectator avatar Sep 04 '24 20:09 OhmSpectator

> The most critical use case for the new CPU pinning is realtime, as this implementation does not really pin CPUs. I would switch to it if it does not worsen the results of RT tests. @rouming can provide more information, how he performed the testing.
>
> Also, if using the native qemu implementation, the qemu parameters should be used in native way, as well. You can find it in the PR I shared with you earlier: rucoder#1

@rouming Could you please provide more info on RT CPU Pinning tests?

roja-zededa avatar Sep 16 '24 20:09 roja-zededa

> The most critical use case for the new CPU pinning is realtime, as this implementation does not really pin CPUs. I would switch to it if it does not worsen the results of RT tests. @rouming can provide more information, how he performed the testing. Also, if using the native qemu implementation, the qemu parameters should be used in native way, as well. You can find it in the PR I shared with you earlier: rucoder#1

> @rouming Could you please provide more info on RT CPU Pinning tests?

The RT suite is here: https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git/ ; the de facto standard is cyclictest. You can search for how to use it if you wish, but I can tell you in advance that pinning the whole process with all its threads to a set of CPUs, compared to a 1-to-1 mapping, has no visible impact on any test results, simply because the scheduler is smart enough and won't migrate a task to another CPU if the load on all vCPUs is equal (which is usually the case). Even if a task migrates, that does not happen frequently, so the impact is negligible.
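For reference, such a comparison is typically done with cyclictest from that suite; a sketch of a run, with illustrative (not the original tests') priority and duration values:

```shell
# Inside the guest: one measurement thread per CPU (-t), each pinned
# to its CPU (-a), SCHED_FIFO priority 95 (-p 95), memory locked (-m),
# quiet summary output (-q), for a 10-minute window (-D 10m).
cyclictest -m -t -a -p 95 -D 10m -q

# Repeat the run on the host side: once with the whole QEMU process
# restricted to a CPU set (e.g. taskset -c 2,3 on its PID) and once
# with 1-to-1 vCPU pinning, then compare the reported max/avg latencies.
```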

rouming avatar Sep 16 '24 20:09 rouming

It would be nice to remove the custom patches in this case.

But I don't see how the QEMU vanilla pinning is used here? To run with the native CPU pinning, QEMU should start with -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7 option or similar...
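For reference, a sketch of what such a native invocation could look like, built around the option quoted above (the machine type, memory size, and affinity ranges are placeholders, not values from this PR):

```shell
# tc1 carries the CPU affinity; threads spawned through this context
# (e.g. memory preallocation threads bound via prealloc-context) are
# confined to pCPUs 0-1 and 6-7 as a set, and the scheduler may still
# migrate them within that set.
qemu-system-x86_64 \
  -machine q35,accel=kvm,memory-backend=mem0 \
  -smp 4 -m 4G \
  -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7 \
  -object memory-backend-ram,id=mem0,size=4G,prealloc=on,prealloc-context=tc1
```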

OhmSpectator avatar Sep 16 '24 21:09 OhmSpectator

@roja-zededa I see this PR contains the commit message from another PR "Upgraded Xen-tools from 4.15 to 4.19.0", not sure what is the purpose of duplicating these if this is not a mistake.

rouming avatar Sep 17 '24 11:09 rouming

> @roja-zededa I see this PR contains the commit message from another PR "Upgraded Xen-tools from 4.15 to 4.19.0", not sure what is the purpose of duplicating these if this is not a mistake.

@rouming No, it wasn't a duplicate. Upgrading the Xen-tools version to 4.19.0 introduced the new vanilla qemu pinning feature from QEMU 8.0; we're testing here whether we need to keep the vanilla pinning feature or go with Nikolay's 1-to-1 custom pinning patch, which was already included in the previous Xen-tools versions.

roja-zededa avatar Sep 17 '24 17:09 roja-zededa

> @roja-zededa I see this PR contains the commit message from another PR "Upgraded Xen-tools from 4.15 to 4.19.0", not sure what is the purpose of duplicating these if this is not a mistake.

> @rouming No, it wasn't a duplicate. Upgrading xentool version to 4.19.0 introduced a new vanilla qemu pinning feature from QEMU 8.0, we're testing here if we need to keep the vanilla pinning feature or go with Nikolay's 1-1 custom pinning patch which is already included in the previous xentools versions.

Now I see you pushed a commit which corresponds to the PR description, nice.

Can you please share the results of the testing and how exactly you tested the two different implementations?

rouming avatar Sep 18 '24 06:09 rouming

I would just cherry-pick the original commit and save the commit message.

git remote add ohmspectator git@github.com:OhmSpectator/eve.git
git fetch ohmspectator
git cherry-pick 32ec62b016016792d2c07512428f2e50559e7c34

OhmSpectator avatar Oct 01 '24 20:10 OhmSpectator

Trying to launch a VM but got this error:

{"source":"369dc470-4c17-4278-b863-0af36e746804.1.1","content":"qemu-system-x86_64: -no-hpet: warning: -no-hpet is deprecated, use '-machine hpet=off' instead","msgid":1,"timestamp":{"seconds":1727980005,"nanos":666473281}}

{"source":"369dc470-4c17-4278-b863-0af36e746804.1.1","content":"qemu-system-x86_64: Property 'pc-q35-3.1-machine.cpumask' not found","msgid":2,"timestamp":{"seconds":1727980005,"nanos":666486415}}

roja-zededa avatar Oct 03 '24 18:10 roja-zededa

> Trying to launch a VM but got this error:
>
> {"source":"369dc470-4c17-4278-b863-0af36e746804.1.1","content":"qemu-system-x86_64: -no-hpet: warning: -no-hpet is deprecated, use '-machine hpet=off' instead","msgid":1,"timestamp":{"seconds":1727980005,"nanos":666473281}}
>
> {"source":"369dc470-4c17-4278-b863-0af36e746804.1.1","content":"qemu-system-x86_64: Property 'pc-q35-3.1-machine.cpumask' not found","msgid":2,"timestamp":{"seconds":1727980005,"nanos":666486415}}

Maybe something else has changed between QEMU 9.0.4, where I tested it, and the version introduced with the Xen-tools upgrade. I hope you can figure out the reason =)

Also, I want to mention a new test suite introduced by Vignesh. It automates, among other things, the CPU pinning affinity checks. We should have a demo soon on how to use/adopt it. It's now available from the Zededa Jenkins UI under the title "Ztaf".

OhmSpectator avatar Oct 10 '24 09:10 OhmSpectator

@roja-zededa, any progress on manual testing?

OhmSpectator avatar Oct 16 '24 10:10 OhmSpectator

The definition of qemu_thread_set_affinity() has changed in the newer version; we currently call it as qemu_thread_set_affinity(cpu->thread, NULL, cpu->cpumask);, passing NULL for the CPU list, but that might cause some issues. Patch 15 needs to be changed to accommodate the new definition.

roja-zededa avatar Oct 16 '24 17:10 roja-zededa

> qemu_thread_set_affinity(cpu->thread, NULL, cpu->cpumask); definition has been changed in the newer version. As of now, passing NULL for CPUlist but that might cause some issues. Need to change patch 15 to accommodate the new definition.

I would make it a new patch without changing patch 15.

OhmSpectator avatar Oct 16 '24 17:10 OhmSpectator

@OhmSpectator The board I have right now doesn't have 10 cores. Would it be okay to manually perform the test on a four-core Dell OptiPlex? If that doesn't work, we can directly kick off the new CPU pinning test designed by Vignesh.

roja-zededa avatar Oct 16 '24 18:10 roja-zededa

You can also run EVE in QEMU with a given amount of VCPUs =)
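For instance, a sketch of such a run (the image path and most flags are placeholders for the local setup, not EVE's official run scripts):

```shell
# Boot an EVE image with 10 vCPUs so the pinning logic has enough
# pCPUs to spread several pinned VMs across.
qemu-system-x86_64 \
  -enable-kvm -cpu host \
  -smp 10 -m 8G \
  -drive file=eve-live.qcow2,format=qcow2 \
  -serial mon:stdio -nographic
```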

OhmSpectator avatar Oct 16 '24 19:10 OhmSpectator

@OhmSpectator Tested this PR with a tiny VM (1 vCPU). Here are the results. Added a VM (no pin), then 3 more VMs with pinning enabled.

79e8a31a-712c-4d1a-be9d-1f41a6da1300:~# for pid in $(pgrep qemu) ; do echo "======= QEMU $pid threads: =======" ; for spid in $(ps -T -o spid= -p "$pid" ) ; do taskset -pc "$spid" ; done ; done
======= QEMU 32076 threads: =======
pid 32076's current affinity list: 0-3
pid 32094's current affinity list: 0-3
pid 32098's current affinity list: 0-3
pid 32099's current affinity list: 0-3
pid 32101's current affinity list: 0-3
pid 32142's current affinity list: 0-3
pid 32211's current affinity list: 0-3
pid 32444's current affinity list: 0-3
pid 32445's current affinity list: 0-3
pid 32446's current affinity list: 0-3
pid 32447's current affinity list: 0-3
pid 32448's current affinity list: 0-3
pid 32449's current affinity list: 0-3


79e8a31a-712c-4d1a-be9d-1f41a6da1300:~# for pid in $(pgrep qemu) ; do echo "======= QEMU $pid threads: =======" ; for spid in $(ps -T -o spid= -p "$pid" ) ; do taskset -pc "$spid" ; done ; done
======= QEMU 578 threads: =======
pid 578's current affinity list: 1
pid 591's current affinity list: 1
pid 592's current affinity list: 1
pid 596's current affinity list: 1
pid 597's current affinity list: 1
pid 599's current affinity list: 1
======= QEMU 32076 threads: =======
pid 32076's current affinity list: 0,2,3
pid 32094's current affinity list: 0,2,3
pid 32098's current affinity list: 0,2,3
pid 32099's current affinity list: 0,2,3
pid 344's current affinity list: 0,2,3


79e8a31a-712c-4d1a-be9d-1f41a6da1300:~# for pid in $(pgrep qemu) ; do echo "======= QEMU $pid threads: =======" ; for spid in $(ps -T -o spid= -p "$pid" ) ; do taskset -pc "$spid" ; done ; done
======= QEMU 578 threads: =======
pid 578's current affinity list: 1
pid 591's current affinity list: 1
pid 592's current affinity list: 1
pid 596's current affinity list: 1
pid 597's current affinity list: 1
pid 979's current affinity list: 0-3
pid 1954's current affinity list: 1
======= QEMU 1669 threads: =======
pid 1669's current affinity list: 2
pid 1682's current affinity list: 2
pid 1683's current affinity list: 2
pid 1688's current affinity list: 2
pid 1689's current affinity list: 2
pid 1702's current affinity list: 2
pid 1813's current affinity list: 0-3
pid 1814's current affinity list: 2
pid 1815's current affinity list: 2
pid 1816's current affinity list: 2
pid 1817's current affinity list: 2
======= QEMU 32076 threads: =======
pid 32076's current affinity list: 0,3
pid 32094's current affinity list: 0,3
pid 32098's current affinity list: 0,3
pid 32099's current affinity list: 0,3
pid 1163's current affinity list: 0,3


======= QEMU 578 threads: =======
pid 578's current affinity list: 1
pid 591's current affinity list: 1
pid 592's current affinity list: 1
pid 596's current affinity list: 1
pid 597's current affinity list: 1
pid 979's current affinity list: 0-3
pid 3507's current affinity list: 1
======= QEMU 1669 threads: =======
pid 1669's current affinity list: 2
pid 1682's current affinity list: 2
pid 1683's current affinity list: 2
pid 1688's current affinity list: 2
pid 1689's current affinity list: 2
pid 2258's current affinity list: 0-3
pid 3506's current affinity list: 2
======= QEMU 3163 threads: =======
pid 3163's current affinity list: 3
pid 3178's current affinity list: 3
pid 3185's current affinity list: 3
pid 3193's current affinity list: 3
pid 3194's current affinity list: 3
pid 3201's current affinity list: 3
pid 3251's current affinity list: 0-3
pid 3503's current affinity list: 0-3
pid 3504's current affinity list: 0-3
pid 3505's current affinity list: 0-3
======= QEMU 32076 threads: =======
pid 32076's current affinity list: 0
pid 32094's current affinity list: 0
pid 32098's current affinity list: 0
pid 32099's current affinity list: 0
pid 2881's current affinity list: 0


Removing pinned VMs one by one:


======= QEMU 1669 threads: =======
pid 1669's current affinity list: 2
pid 1682's current affinity list: 2
pid 1683's current affinity list: 2
pid 1688's current affinity list: 2
pid 1689's current affinity list: 2
pid 2258's current affinity list: 0-3
pid 4727's current affinity list: 2
pid 4728's current affinity list: 2
pid 4729's current affinity list: 2
pid 4762's current affinity list: 0-3
======= QEMU 3163 threads: =======
pid 3163's current affinity list: 3
pid 3178's current affinity list: 3
pid 3185's current affinity list: 3
pid 3193's current affinity list: 3
pid 3194's current affinity list: 3
pid 3593's current affinity list: 0-3
======= QEMU 32076 threads: =======
pid 32076's current affinity list: 0,1
pid 32094's current affinity list: 0,1
pid 32098's current affinity list: 0,1
pid 32099's current affinity list: 0,1
pid 2881's current affinity list: 0,1


======= QEMU 1669 threads: =======
pid 1669's current affinity list: 2
pid 1682's current affinity list: 2
pid 1683's current affinity list: 2
pid 1688's current affinity list: 2
pid 1689's current affinity list: 2
pid 2258's current affinity list: 0-3
======= QEMU 32076 threads: =======
pid 32076's current affinity list: 0,1
pid 32094's current affinity list: 0,1
pid 32098's current affinity list: 0,1
pid 32099's current affinity list: 0,1
pid 2881's current affinity list: 0,1


79e8a31a-712c-4d1a-be9d-1f41a6da1300:~# for pid in $(pgrep qemu) ; do echo "======= QEMU $pid threads: =======" ; for spid in $(ps -T -o spid= -p "$pid" ) ; do taskset -pc "$spid" ; done ; done
======= QEMU 32076 threads: =======
pid 32076's current affinity list: 0-3
pid 32094's current affinity list: 0-3
pid 32098's current affinity list: 0-3
pid 32099's current affinity list: 0-3
pid 5918's current affinity list: 0-3
pid 6455's current affinity list: 0-3
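The affinity lists in these dumps mix single CPUs ("1"), comma lists ("0,2,3"), and ranges ("0-3"); when comparing them, a small bash helper (purely illustrative, not part of the PR) can expand a taskset-style list:

```shell
# Expand a taskset-style CPU list such as "0-1,3" into "0 1 3".
expand_cpulist() {
    local part out=""
    IFS=',' read -ra parts <<< "$1"
    for part in "${parts[@]}"; do
        if [[ "$part" == *-* ]]; then
            # Range: expand with seq, space-separated.
            out+="$(seq -s ' ' "${part%-*}" "${part#*-}") "
        else
            out+="$part "
        fi
    done
    echo "${out% }"
}

expand_cpulist "0-1,3"   # prints: 0 1 3
```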

roja-zededa avatar Oct 23 '24 20:10 roja-zededa

@roja-zededa , in the commit title "pillar: Adopt pillar to use native cpupinning", I guess you meant: "pillar: Adapt pillar to use..."

rene avatar Oct 25 '24 08:10 rene

> ======= QEMU 1669 threads: =======
> pid 1669's current affinity list: 2
> pid 1682's current affinity list: 2
> pid 1683's current affinity list: 2
> pid 1688's current affinity list: 2
> pid 1689's current affinity list: 2
> pid 2258's current affinity list: 0-3
> ======= QEMU 32076 threads: =======
> pid 32076's current affinity list: 0,1
> pid 32094's current affinity list: 0,1
> pid 32098's current affinity list: 0,1
> pid 32099's current affinity list: 0,1
> pid 2881's current affinity list: 0,1

This looks strange, as I would expect:

> ======= QEMU 1669 threads: =======
> pid 1669's current affinity list: 2
> pid 1682's current affinity list: 2
> pid 1683's current affinity list: 2
> pid 1688's current affinity list: 2
> pid 1689's current affinity list: 2
> pid 2258's current affinity list: 0-3
> ======= QEMU 32076 threads: =======
> pid 32076's current affinity list: 0,1,3
> pid 32094's current affinity list: 0,1,3
> pid 32098's current affinity list: 0,1,3
> pid 32099's current affinity list: 0,1,3
> pid 2881's current affinity list: 0,1,3

But maybe the CPUs were redistributed a little bit later.

OhmSpectator avatar Oct 30 '24 10:10 OhmSpectator

> ======= QEMU 1669 threads: =======
> pid 1669's current affinity list: 2
> pid 1682's current affinity list: 2
> pid 1683's current affinity list: 2
> pid 1688's current affinity list: 2
> pid 1689's current affinity list: 2
> pid 2258's current affinity list: 0-3
> ======= QEMU 32076 threads: =======
> pid 32076's current affinity list: 0,1
> pid 32094's current affinity list: 0,1
> pid 32098's current affinity list: 0,1
> pid 32099's current affinity list: 0,1
> pid 2881's current affinity list: 0,1

This looks strange as I would expect

> ======= QEMU 1669 threads: =======
> pid 1669's current affinity list: 2
> pid 1682's current affinity list: 2
> pid 1683's current affinity list: 2
> pid 1688's current affinity list: 2
> pid 1689's current affinity list: 2
> pid 2258's current affinity list: 0-3
> ======= QEMU 32076 threads: =======
> pid 32076's current affinity list: 0,1,3
> pid 32094's current affinity list: 0,1,3
> pid 32098's current affinity list: 0,1,3
> pid 32099's current affinity list: 0,1,3
> pid 2881's current affinity list: 0,1,3

But maybe the CPUs were redistributed a little bit later.

The deleted CPUs are being reclaimed. Here is the trace:

- QEMU 22745 - no CPU pinned
- QEMU 23728 - 1 CPU pinned
- QEMU 24930 - 1 CPU pinned

======= QEMU 22745 threads: =======
pid 22745's current affinity list: 0,3
pid 22774's current affinity list: 0,3
pid 22778's current affinity list: 0,3
pid 22779's current affinity list: 0,3
pid 24456's current affinity list: 0,3
pid 25204's current affinity list: 0,3
pid 25216's current affinity list: 0-3
======= QEMU 23728 threads: =======
pid 23728's current affinity list: 1
pid 23742's current affinity list: 1
pid 23743's current affinity list: 1
pid 23747's current affinity list: 1
pid 23748's current affinity list: 1
pid 24056's current affinity list: 0-3
======= QEMU 24930 threads: =======
pid 24930's current affinity list: 2
pid 24944's current affinity list: 2
pid 24945's current affinity list: 2
pid 24949's current affinity list: 2
pid 24950's current affinity list: 2
pid 24952's current affinity list: 2
pid 25024's current affinity list: 0-3
pid 25025's current affinity list: 2
pid 25026's current affinity list: 2
pid 25027's current affinity list: 2

Removal:

Deleted a VM:

======= QEMU 22745 threads: =======
pid 22745's current affinity list: 0,1,3
pid 22774's current affinity list: 0,1,3
pid 22778's current affinity list: 0,1,3
pid 22779's current affinity list: 0,1,3
pid 27625's current affinity list: 0-3
======= QEMU 24930 threads: =======
pid 24930's current affinity list: 2
pid 24944's current affinity list: 2
pid 24945's current affinity list: 2
pid 24949's current affinity list: 2
pid 24950's current affinity list: 2
pid 25420's current affinity list: 0-3
pid 27674's current affinity list: 2

roja-zededa avatar Oct 30 '24 20:10 roja-zededa

@OhmSpectator @rene Covered all the test cases in this document: https://docs.google.com/document/d/1qoCq7SPk6UR2LCb6APVbmLLgo7W7wONJUncTrXCbPeM/edit?tab=t.0#heading=h.p1hte28jbxjx The results look good to me. Please let me know if you have more questions.

roja-zededa avatar Oct 31 '24 18:10 roja-zededa

@roja-zededa, please address the rest of my comments here: https://github.com/lf-edge/eve/pull/4212#pullrequestreview-2404369311

  1. I want to understand which threads run only on a given subset of CPUs and which threads run on all CPUs. To understand it, please check how the thread contexts work. We use tc1, but we should understand exactly which threads are affected by it.

  2. Check not only the CPU affinities but the CPUset in cgroup. I have already provided a snippet for that:

for pid in $(pgrep qemu); do 
    echo "======= QEMU $pid threads: ======="; 
    for spid in $(ps -T -o spid= -p "$pid"); do 
        echo -n "Thread $spid affinity: "; taskset -pc "$spid"; 
        cgpath=$(grep -m1 cpuset /proc/$spid/cgroup | cut -d: -f3)
        echo "Cgroup cpuset: $(cat /sys/fs/cgroup/cpuset${cgpath}/cpuset.cpus)"
    done; 
done

OhmSpectator avatar Nov 01 '24 09:11 OhmSpectator

@roja-zededa, please address the rest of my comments here: #4212 (review)

  1. I want to understand which threads run only on a given subset of CPUs and which threads run on all CPUs. To understand it, please check how the thread contexts work. We use tc1, but we should understand exactly which threads are affected by it.
  2. Check not only the CPU affinities but the CPUset in cgroup. I have already provided a snippet for that:
for pid in $(pgrep qemu); do 
    echo "======= QEMU $pid threads: ======="; 
    for spid in $(ps -T -o spid= -p "$pid"); do 
        echo -n "Thread $spid affinity: "; taskset -pc "$spid"; 
        cgpath=$(grep -m1 cpuset /proc/$spid/cgroup | cut -d: -f3)
        echo "Cgroup cpuset: $(cat /sys/fs/cgroup/cpuset${cgpath}/cpuset.cpus)"
    done; 
done

@OhmSpectator Updated this doc: https://docs.google.com/document/d/1qoCq7SPk6UR2LCb6APVbmLLgo7W7wONJUncTrXCbPeM/edit?tab=t.0#heading=h.p1hte28jbxjx with the CPUset cgroup info and CPU affinity. Please refer to it. I don't see any issues with it so far.

roja-zededa avatar Nov 04 '24 20:11 roja-zededa

@roja-zededa, please address the rest of my comments here: #4212 (review)

  1. I want to understand which threads run only on a given subset of CPUs and which threads run on all CPUs. To understand it, please check how the thread contexts work. We use tc1, but we should understand exactly which threads are affected by it.
  2. Check not only the CPU affinities but the CPUset in cgroup. I have already provided a snippet for that:
for pid in $(pgrep qemu); do 
    echo "======= QEMU $pid threads: ======="; 
    for spid in $(ps -T -o spid= -p "$pid"); do 
        echo -n "Thread $spid affinity: "; taskset -pc "$spid"; 
        cgpath=$(grep -m1 cpuset /proc/$spid/cgroup | cut -d: -f3)
        echo "Cgroup cpuset: $(cat /sys/fs/cgroup/cpuset${cgpath}/cpuset.cpus)"
    done; 
done

@OhmSpectator Updated this doc: https://docs.google.com/document/d/1qoCq7SPk6UR2LCb6APVbmLLgo7W7wONJUncTrXCbPeM/edit?tab=t.0#heading=h.p1hte28jbxjx with the CPUset cgroup info and CPU affinity. Please refer to it. I don't see any issues with it so far.

@OhmSpectator @rene @eriknordmark Please kick off the tests.

roja-zededa avatar Nov 05 '24 19:11 roja-zededa

@roja-zededa, thanks for the testing! Interestingly, we now enforce pinning more by setting the CPUset in a cgroup than by setting affinity in QEMU. But let it be.

OhmSpectator avatar Nov 05 '24 21:11 OhmSpectator