tuned
cpu: add disable_thread_siblings option
This disables the thread siblings (a.k.a. hyper-threads) of the selected CPUs, which prevents any performance impact from resources shared between sibling threads.
The selected CPUs themselves stay online. If the list contains both a CPU and its sibling thread, that sibling also stays online.
Note that this interacts with the scheduler plugin, because it changes the set of available CPUs that may appear in affinity masks. When using this option, [cpu] should appear after [scheduler] in the profile.
Signed-off-by: Adriaan Schmidt [email protected]
There are scenarios where disabling thread siblings (hyper-threads) is recommended, and we use it in some more aggressively-tuned use cases:
- https://01.org/packet-processing/blogs/nsundar/2018/nfv-i-host-configuration-low-latency
- https://rigtorp.se/low-latency-guide/
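A minimal profile sketch of how the proposed option might be used (the isolated variable and its value are placeholders; [cpu] is placed after [scheduler] as the commit message recommends):
[variables]
isolated = <insert here>
[scheduler]
isolated_cores = ${isolated}
[cpu]
disable_thread_siblings = ${isolated}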
@adriaan42 could it wait for the 2.17.0 release? I would like to tag 2.16.0 release today or tmrw (we are quite behind the schedule).
Yes, sure. There's no hurry. I have a long weekend now, will post an update next week.
Thank you for the PR, @adriaan42! This is definitely a useful functionality to add to TuneD. Could you please provide some documentation on how to use this feature in the commit message/description of this PR, along with an example? From my own experience, adding documentation helps you think more about the feature you're adding and often improves the implementation itself too. Here is an example of what we're working on right now for some inspiration. Your docs don't have to be asciidoc, just plain text.
@jmencak thanks! Yes, I will provide some documentation.
While thinking about the feature some more, a question on the implementation occurred to me: in my current proposal, all you pass is the CPUs whose siblings should be taken offline, so you could say disable_thread_siblings=${isolated_cores}, and at the profile level you never need to know which cores will actually be deactivated.
But offlining CPUs affects other code that deals with affinities, and I see some interactions with the scheduler plugin, where the ordering of setting affinities vs. offlining CPUs matters.
An alternative would be to expose the thread sibling logic to the profile, e.g. providing something like a thread_siblings_of(cpu_set) function. This way the profiles could calculate CPU sets explicitly, and remove the deactivated cores from the affinity sets (currently represented by the profile variable non_isolated_cores). After the profile has calculated the set of cores to disable, it would then do something like disable_cpus=${cores_to_offline} (where disable_cpus no longer needs any thread sibling logic). This would mean more code, and some effort in the profiles using the feature, but may be the cleaner approach.
What do you think?
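For illustration, a rough sketch of what this alternative might look like in a profile; thread_siblings_of, cores_to_offline and disable_cpus are the hypothetical names from the paragraph above, while cpulist_invert is an existing profile function:
[variables]
isolated_cores = <insert here>
cores_to_offline = ${f:thread_siblings_of:${isolated_cores}}
isolated_or_offline = ${isolated_cores},${cores_to_offline}
non_isolated_cores = ${f:cpulist_invert:${isolated_or_offline}}
[cpu]
disable_cpus = ${cores_to_offline}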
While thinking about the feature some more, a question on the implementation occurred to me: in my current proposal, all you pass is the CPUs whose siblings should be taken offline, so you could say disable_thread_siblings=${isolated_cores}, and at the profile level you never need to know which cores will actually be deactivated.
Sounds good. Traditionally ${isolated_cores} were used in the sense of non-housekeeping cores, i.e. the cores for the latency-sensitive applications to run on. This PR will enable better granularity by selectively disabling HT for a subset of selected cores only, rather than using the kernel command-line option noht.
But offlining CPUs affects other code that deals with affinities, and I see some interactions with the scheduler plugin, where the ordering of setting affinities vs. offlining CPUs matters.
Aah, possibly.
An alternative would be to expose the thread sibling logic to the profile, e.g. providing something like a thread_siblings_of(cpu_set) function. This way the profiles could calculate CPU sets explicitly, and remove the deactivated cores from the affinity sets (currently represented by the profile variable non_isolated_cores). After the profile has calculated the set of cores to disable, it would then do something like disable_cpus=${cores_to_offline} (where disable_cpus no longer needs any thread sibling logic). This would mean more code, and some effort in the profiles using the feature, but may be the cleaner approach.
Yes, this sounds "cleaner" and probably even more flexible than the existing solution, so what you suggest sounds reasonable to me. Perhaps we need a proof-of-concept, a few examples (docs) and a bit of testing to answer whether this is really the way to go? Other folk (especially TuneD maintainers) might have an opinion on this already too?
This pull request introduces 2 alerts when merging a77422c7fa5513a5fab00f2dc773d887197d77df into 7ba8d5d694267783466d1700ca4c1a09a5610aec - view on LGTM.com
new alerts:
- 2 for Unused import
I've just pushed version 2 of my proposal, which splits functionality into two parts:
- the function cpulist_thread_siblings is used to explicitly calculate the CPUs that should be disabled
- the option disable_cpus of the cpu plugin disables them (could rename this to disable_cores, which may be a more consistent name...)
So, in a profile, we can now:
[variables]
isolated = <insert here>
siblings_of_isolated = ${f:cpulist_thread_siblings:${isolated}}
isolated_or_disabled = ${isolated},${siblings_of_isolated}
non_isolated = ${f:cpulist_invert:${isolated_or_disabled}}
[bootloader]
cmdline = isolcpus=${isolated} rcu_nocbs=${isolated} irqaffinity=${non_isolated}
# or even ${isolated_or_disabled} for isolcpus and rcu_nocbs, not sure which is better
[cpu]
disable_cpus = ${siblings_of_isolated}
[scheduler]
isolated_cores = ${isolated}
Or construct the CPU sets differently if we want some cores isolated with their siblings disabled, and some with siblings enabled, etc.
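For example, a variant of the profile above (CPU numbers and variable names made up for illustration) where cores 2-3 are isolated with their thread siblings disabled, while cores 4-5 are isolated but keep their siblings online:
[variables]
isolated_ht_off = 2-3
isolated_ht_on = 4-5
isolated = ${isolated_ht_off},${isolated_ht_on}
siblings_to_disable = ${f:cpulist_thread_siblings:${isolated_ht_off}}
[cpu]
disable_cpus = ${siblings_to_disable}
[scheduler]
isolated_cores = ${isolated}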
Many thanks for the updated PR! I've tested the current version and it seems to work fine. Swapping the [cpu] and [scheduler] sections also worked fine. Even though disable_cores would be consistent with isolated_cores, I personally wouldn't rename disable_cpus to disable_cores. isolated_cores was an unfortunate past decision which should in fact have been isolated_cpus.
As for the code review, leaving this up to the TuneD maintainers.
@yarda could you please have a look at the updated implementation?
@yarda , I believe it is now a good time to revisit this as 2.20 is out, WDYT? It's been some time... If you want, I can give it some testing from my side.
I think we need to extend the ADOC documentation in the CPULatencyPlugin class with the new option.
I am afraid this PR breaks the current logic a bit. There can be multiple instances of the cpu plugin, thus I think such changes should be made only in the "first instance" of the cpu plugin. Moreover, disablement of the CPUs may affect other cpu plugin instances, thus re-enumeration or hotplug/deplug event initiation will probably be required.
Maybe we could just add the "offline" or similar option to the cpu plugin? E.g.:
[cpus_online]
type=cpu
devices=${f:cpulist2devs:-0-3}
...
[cpus_offline]
type=cpu
devices=${f:cpulist2devs:-4-7}
offline=true
I think this would be more compatible with the current logic and you could also add the function for calculation of the siblings (e.g. cpulist_thread_siblings(0-3) to get siblings of CPUs 0, 1, 2, 3) and disable them all just in the one cpu plugin instance, e.g. called [cpus_offline].
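A sketch of what that combination might look like (assuming the proposed offline option and the cpulist_thread_siblings function from this PR; the CPU list 0-3 is arbitrary):
[variables]
siblings = ${f:cpulist_thread_siblings:0-3}
[cpus_offline]
type=cpu
devices=${f:cpulist2devs:${siblings}}
offline=true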
I think we need to extend the ADOC documentation in the CPULatencyPlugin class with the new option.
I am afraid this PR breaks the current logic a bit. There can be multiple instances of the cpu plugin, thus I think such changes should be made only in the "first instance" of the cpu plugin. Moreover, disablement of the CPUs may affect other cpu plugin instances, thus re-enumeration or hotplug/deplug event initiation will probably be required.
Actually, the current implementation already ensures that the disabling only runs in the first instance (it happens after the if not instance._first_instance: return in _instance_apply_static). To re-enumerate or trigger hotplug/deplug I'd need some help...
Maybe we could just add the "offline" or similar option to the cpu plugin? E.g.:
[cpus_online]
type=cpu
devices=${f:cpulist2devs:-0-3}
...
[cpus_offline]
type=cpu
devices=${f:cpulist2devs:-4-7}
offline=true
I think this would be more compatible with the current logic and you could also add the function for calculation of the siblings (e.g. cpulist_thread_siblings(0-3) to get siblings of CPUs 0, 1, 2, 3) and disable them all just in the one cpu plugin instance, e.g. called [cpus_offline].
Yes, I've just played around with it, and it can be implemented like this. Just one thing that makes this tricky to use: the cpus_online instance needs to come first, because things like latency settings are only applied for the first instance. That means that we'd need to explicitly calculate the online cpuset. We can't just define the CPUs we want to offline in cpus_offline, and let the cpus_online instance "catch" the remaining devices. But that's ok for me.
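For instance, that explicit calculation could look roughly like this (a sketch; variable names are placeholders, and cpulist_invert is used to derive the online set from the siblings to be disabled):
[variables]
isolated = <insert here>
offline = ${f:cpulist_thread_siblings:${isolated}}
online = ${f:cpulist_invert:${offline}}
[cpus_online]
type=cpu
devices=${f:cpulist2devs:${online}}
[cpus_offline]
type=cpu
devices=${f:cpulist2devs:${offline}}
offline=true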
@yarda just let me know which way you'd like the implementation, then I can send an update (with documentation).
@adriaan42 I like the 'offline' option more. It's more the TuneD way.
It's also possible to explicitly define the order of the instances by using the 'priority' option; otherwise they are processed in the order they are defined, e.g.:
[cpu_offline]
type=cpu
devices=....
priority=10
[cpu_online]
type=cpu
devices=...
priority=5
And cpu_online will always be processed first.
There is now also the 'pm_qos_resume_latency_us' option which works per CPU, e.g.:
[cpu_online]
type=cpu
devices=...
pm_qos_resume_latency_us=100
[cpu_offline]
type=cpu
devices=....
pm_qos_resume_latency_us=150
@yarda I've just pushed an update. In the code, I'm doing a whole dance with _storage_get and _storage_set to track which CPUs need to be re-enabled. Is this actually the correct way to do it?
Rebased on current master, and also fixed a bug in _instance_verify_static.
@yarda I'm still seeing a potential issue with the usability due to the "first instance" vs "implicit assignment of devices":
If I have a system with cpus 0-3, and want to take cpu3 offline, I'd want to write:
[cpu_online]
type=cpu
[cpu_offline]
type=cpu
devices=${f:cpulist2devs:3}
offline=true
But [cpu_online] will now claim all devices:
2023-09-28 09:24:56,875 DEBUG tuned.units.manager: creating 'cpu_online' (cpu)
2023-09-28 09:24:56,875 DEBUG tuned.units.manager: creating 'cpu_offline' (cpu)
2023-09-28 09:24:56,876 DEBUG tuned.plugins.base: assigning devices to instance cpu_online
2023-09-28 09:24:56,877 INFO tuned.plugins.base: instance cpu: assigning devices cpu2, cpu0, cpu3, cpu1
2023-09-28 09:24:56,877 DEBUG tuned.plugins.base: initializing instance cpu_online (cpu)
[...]
2023-09-28 09:24:56,881 DEBUG tuned.plugins.base: assigning devices to instance cpu_offline
2023-09-28 09:24:56,881 WARNING tuned.plugins.base: instance cpu_offline: no matching devices available
2023-09-28 09:24:56,881 DEBUG tuned.plugins.base: initializing instance cpu_offline (cpu)
2023-09-28 09:24:56,881 INFO tuned.plugins.plugin_cpu: Latency settings from non-first CPU plugin instance 'cpu_offline' will be ignored.
Changing the order (either in the file or with priorities) results in the correct device assignments, but the wrong "first instance":
2023-09-28 09:27:51,475 DEBUG tuned.units.manager: creating 'cpu_offline' (cpu)
2023-09-28 09:27:51,475 DEBUG tuned.units.manager: creating 'cpu_online' (cpu)
2023-09-28 09:27:51,476 DEBUG tuned.plugins.base: assigning devices to instance cpu_offline
2023-09-28 09:27:51,477 INFO tuned.plugins.base: instance cpu_offline (cpu): assigning devices cpu3
2023-09-28 09:27:51,477 DEBUG tuned.plugins.base: initializing instance cpu_offline (cpu)
[...]
2023-09-28 09:27:51,480 DEBUG tuned.plugins.base: assigning devices to instance cpu_online
2023-09-28 09:27:51,480 INFO tuned.plugins.base: instance cpu_online (cpu): assigning devices cpu1, cpu0, cpu2
2023-09-28 09:27:51,481 DEBUG tuned.plugins.base: initializing instance cpu_online (cpu)
2023-09-28 09:27:51,481 INFO tuned.plugins.plugin_cpu: Latency settings from non-first CPU plugin instance 'cpu_online' will be ignored.
To make it work, a user needs to also explicitly specify the online cpus:
[cpu_online]
type=cpu
devices=${f:cpulist2devs:0-2}
[cpu_offline]
type=cpu
devices=${f:cpulist2devs:3}
offline=true
Side-question: is this behavior in the device logic (an instance without an explicit devices= setting will claim all available devices, even those devices that are explicitly requested by another/later instance) intended? I would somehow expect any explicit devices= specification to have priority over the catch-all when no devices are specified, regardless of instance priority.