
Performance drops with high thread count (32 threads)

Open · cxm7899 opened this issue 6 months ago · 1 comment

Hi, thanks for the great work on fCWT!

I noticed that on my machine (R9 7945HX, 32 threads), setting nthreads=8 gives the best performance. Using more threads (e.g. 32) makes it slower.

Is this expected? Could performance with higher thread counts be improved?

Thanks in advance!

cxm7899 · May 27 '25 10:05

I think I need a bit more information. For example:

  • Did you use optimization plans?
  • What is the input length?
  • How many scales do you use?

Also, with 32 hardware threads across 16 cores, the optimal thread count is most likely 16. Hyperthreading generally performs worse here because the two threads on a core share its caches and compete for memory bandwidth. With 16 threads, each thread has its own core and its own L1 cache, and sometimes its own L2 cache.
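One way to check where the sweet spot is on a given machine is to time the same transform at several thread counts. The following is a minimal sketch, not part of fCWT: `benchmark_thread_counts` and `fastest` are hypothetical helper names, and `run` is assumed to be your own function that performs one transform with the given number of threads.

```python
import time
from typing import Callable, Dict, Iterable


def benchmark_thread_counts(
    run: Callable[[int], object],
    thread_counts: Iterable[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the best wall-clock time (seconds) observed per thread count.

    `run` is a user-supplied callable (hypothetical here) that executes one
    transform using the thread count it is given.
    """
    results: Dict[int, float] = {}
    for n in thread_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(n)
            elapsed = time.perf_counter() - start
            # Keep the minimum to reduce noise from other processes.
            best = min(best, elapsed)
        results[n] = best
    return results


def fastest(results: Dict[int, float]) -> int:
    """Return the thread count with the lowest measured time."""
    return min(results, key=results.get)


# Example usage, assuming `my_transform(nthreads)` wraps your fCWT call:
# results = benchmark_thread_counts(my_transform, [1, 2, 4, 8, 16, 32])
# print(results, "fastest:", fastest(results))
```

Running this with the real input length and number of scales, with and without optimization plans, would show whether 8, 16, or 32 threads is actually fastest for that workload.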

fastlib · May 30 '25 16:05