large core count dev machines spawn too many threads
⚠️ Issues not using this template will be systematically closed.
Describe the bug Development machines with large core count spawn too many threads.
std::thread::hardware_concurrency() may return the hardware thread count of available hardware threads. In general this is not handled in the most ideal way to prevent excessive resource consumption. This will impact debugging performance on a host machine
Sample source tree grep for std::thread::hardware_concurrency():
filament/src/details/Engine.cpp: int threadCount = (int)std::thread::hardware_concurrency() - 2;
filament/src/details/Engine.cpp: uint32_t const id = std::thread::hardware_concurrency() - 1;
third_party/tinyexr/tinyexr.h: int num_threads = std::max(1, int(std::thread::hardware_concurrency()));
third_party/tinyexr/tinyexr.h: int num_threads = std::max(1, int(std::thread::hardware_concurrency()));
third_party/basisu/encoder/basisu_comp.cpp: num_threads = basisu::maximum<uint32_t>(1, std::thread::hardware_concurrency());
third_party/basisu/encoder/basisu_frontend.cpp: max_threads = m_params.m_multithreaded ? minimum<int>(std::thread::hardware_concurrency(), cMaxCodebookCreationThreads) : 0;
third_party/basisu/encoder/basisu_frontend.cpp: max_threads = m_params.m_multithreaded ? minimum<int>(std::thread::hardware_concurrency(), cMaxCodebookCreationThreads) : 0;
third_party/basisu/basisu_tool.cpp: // We use std::thread::hardware_concurrency() as a hint to determine the default # of threads to put into a pool.
third_party/basisu/basisu_tool.cpp: num_threads = std::thread::hardware_concurrency();
third_party/meshoptimizer/gltf/basisenc.cpp: uint32_t num_threads = settings.texture_jobs == 0 ? std::thread::hardware_concurrency() : settings.texture_jobs;
On my dev box it has 32 hardware threads. That means the engine is spawning 30 threads.
To Reproduce Steps to reproduce the behavior:
- Run any filament example of a high thread count system
Expected behavior Not to impact debugging performance or consume excessive resources.
Logs If applicable, copy full logs from your console here. Please do not use screenshots of logs, copy them as text, use gist or attach an uncompressed file.
Desktop (please complete the following information):
- OS: Linux
- GPU: AMD Radeon
- Backend: Vulkan
Smartphone (please complete the following information):
- Device: [e.g. Pixel 2]
- OS: [e.g. Android Pie 9.0]
Additional context A proposed solution might be:
static inline size_t get_platform_thread_count(size_t min_required_thread_count, size_t max_useful_thread_count) {
size_t count = std::thread::hardware_concurrency()
if (count <= min_required_thread_count) {
return min_required_thread_count;
}
else if(count <= max_useful_thread_count) {
return count;
}
else {
return max_useful_thread_count;
}
}
30 threads is not that high. What problem is this causing? These threads should only be busy in small bursts.
Add in the Flutter Engine and I have 87 threads under my process. On a mobile device it will be limited to the max core count; which might be eight... Debugging is slower than it should be.
And what's the CPU usage of those threads?
Once it gets running the majority are blocked waiting for a message.
I patched my tree in Filament and the Flutter engine to go from 87 to a more reasonable 32 threads. Flutter Engine has the same issue.
/squanderer-in-chief
@jwinarske can you please explain the logic in your sample code, I don't understand what's going on:
- if the value returned is even, then we always return
req. Why? e.g. if req is 12 andstd::thread::hardware_concurrencyreturns 4, then we would return 12??? - you say "if divisible by 1", well... everything is always divisible by 1, so I don't understand the comment.
- you check for (max_thread % 1) and then for that again inside the
if, why?
In the comment you talk about a "scaled value", what does that mean?
why should we treat odd/even differently from hardware_concurrency?
@pixelflinger Hi there. Sorry that was half baked; related to something else. I just updated my proposed fix