gilectomy
Benchmarking is currently invalid. Use division of work, not work duplication.
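The benchmark computes a Fibonacci sequence, and every thread repeats the same computation. Duplicating this work across many cores, even at 100% efficiency, will never yield any speedup, because each thread still does the full amount of work.
Work division is needed to see a speedup. I propose a simple, valid benchmark: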
- Multiplication of many elements of a list: divide the list into chunks and give a chunk to each thread.
- Alternatively, use a loop to multiply simple numbers many times (>10^9), and divide the loop iterations among threads.
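
A minimal sketch of the first proposal, assuming Python's standard `threading` module. The names `chunked`, `product_mod`, and `parallel_product` are made up for illustration and are not part of the existing benchmark. The modulus is also my assumption: it keeps the integers small so the timing reflects the multiplication loop rather than big-integer arithmetic.

```python
import math
import threading


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, math.ceil(len(items) / n_chunks))
    return [items[i:i + size] for i in range(0, len(items), size)]


def product_mod(chunk, modulus, results, index):
    """Multiply the elements of one chunk and store the result in its own slot."""
    acc = 1
    for x in chunk:
        acc = (acc * x) % modulus
    results[index] = acc  # each thread writes a distinct index, so no lock is needed


def parallel_product(items, n_threads, modulus=1_000_000_007):
    """Divide the list among threads; total work stays the same for any n_threads."""
    chunks = chunked(items, n_threads)
    results = [1] * len(chunks)
    threads = [
        threading.Thread(target=product_mod, args=(chunk, modulus, results, i))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Combine the per-chunk partial products.
    total = 1
    for r in results:
        total = (total * r) % modulus
    return total
```

Timing `parallel_product(items, 1)` against `parallel_product(items, n)` for the same `items` compares equal amounts of total work, which is what a speedup measurement needs. On an interpreter with the GIL I would expect no gain from this; the point is that the measurement becomes meaningful once threads can actually run in parallel.
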
The good news is that if the duplicated-work benchmark currently runs at a similar speed to single-threaded code, a divided-work benchmark will already be faster.