fCWT
Performance drops with high thread count (32 threads)
Hi, thanks for the great work on fCWT!
I noticed that on my machine (R9 7945HX, 32 threads), setting nthreads=8 gives the best performance. Using more threads (e.g. 32) makes it slower.
Is this expected? Could performance with higher thread counts be improved?
Thanks in advance!
I think I need a bit more information. For example:
- Did you use optimization plans?
- What is the input length?
- How many scales do you use?
Also, with 32 hardware threads across 16 cores, the optimal thread count is most likely 16, since hyperthreading generally performs worse due to memory overhead. With 16 threads, each thread has its own core and its own L1 cache, and sometimes its own L2 cache.
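To pin down where the slowdown starts, a small timing sweep over thread counts may help. The sketch below is only an illustration and not part of fCWT: `sweep_thread_counts`, `fastest`, and `dummy_workload` are hypothetical names, and the workload is a placeholder that you would replace with your own transform call, passing the thread count through to `nthreads`.

```python
import time


def sweep_thread_counts(run, thread_counts, repeats=3):
    """Time run(nthreads) for each candidate and keep the best of `repeats` runs.

    `run` is any callable taking the thread count as its only argument.
    Returns a dict mapping thread count -> best wall-clock time in seconds.
    """
    results = {}
    for n in thread_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(n)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results


def fastest(results):
    """Return the thread count with the lowest measured time."""
    return min(results, key=results.get)


def dummy_workload(nthreads):
    # Hypothetical stand-in: it ignores nthreads and just burns some CPU.
    # Replace the body with your actual transform call, using the same
    # input length and number of scales as in your real runs.
    sum(i * i for i in range(20_000))


if __name__ == "__main__":
    timings = sweep_thread_counts(dummy_workload, [1, 2, 4, 8, 16, 32])
    for n, seconds in sorted(timings.items()):
        print(f"nthreads={n:>2}: {seconds * 1e3:.3f} ms")
    print("fastest:", fastest(timings))
```

Taking the best of several repeats reduces noise from other processes. Posting the resulting table together with the input length and number of scales would make the scaling behaviour much easier to reason about.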