gtsam
TimeTBB slower with more threads
Running the TimeTBB example, I get the following results: it runs faster with 1 thread than with 4 or 8. I'm currently looking into why, but any insight in the meantime would be greatly appreciated.
numberOfProblems = 1000000
problemSize = 4

With 1 threads:
Without memory allocation, grain size = 1, time = 0.195355
Without memory allocation, grain size = 10, time = 0.194243
Without memory allocation, grain size = 100, time = 0.194409
Without memory allocation, grain size = 1000, time = 0.196936
With memory allocation, grain size = 1, time = 0.289234
With memory allocation, grain size = 10, time = 0.295508
With memory allocation, grain size = 100, time = 0.294618
With memory allocation, grain size = 1000, time = 0.29145

With 4 threads:
Without memory allocation, grain size = 1, time = 5.02581
Without memory allocation, grain size = 10, time = 4.9835
Without memory allocation, grain size = 100, time = 4.74276
Without memory allocation, grain size = 1000, time = 5.06713
With memory allocation, grain size = 1, time = 4.6808
With memory allocation, grain size = 10, time = 4.73614
With memory allocation, grain size = 100, time = 4.77174
With memory allocation, grain size = 1000, time = 4.75051

With 8 threads:
Without memory allocation, grain size = 1, time = 4.00496
Without memory allocation, grain size = 10, time = 4.06559
Without memory allocation, grain size = 100, time = 4.06971
Without memory allocation, grain size = 1000, time = 4.06233
With memory allocation, grain size = 1, time = 4.65368
With memory allocation, grain size = 10, time = 4.6617
With memory allocation, grain size = 100, time = 4.65855
With memory allocation, grain size = 1000, time = 4.65979
Summary of results:
4 threads, without allocation, grain size = 1, speedup = 0.0388704
4 threads, without allocation, grain size = 10, speedup = 0.0389772
4 threads, without allocation, grain size = 100, speedup = 0.0409907
4 threads, without allocation, grain size = 1000, speedup = 0.0388654
4 threads, with allocation, grain size = 1, speedup = 0.0617917
4 threads, with allocation, grain size = 10, speedup = 0.0623943
4 threads, with allocation, grain size = 100, speedup = 0.0617423
4 threads, with allocation, grain size = 1000, speedup = 0.0613514
8 threads, without allocation, grain size = 1, speedup = 0.0487782
8 threads, without allocation, grain size = 10, speedup = 0.0477773
8 threads, without allocation, grain size = 100, speedup = 0.0477697
8 threads, without allocation, grain size = 1000, speedup = 0.0484786
8 threads, with allocation, grain size = 1, speedup = 0.0621517
8 threads, with allocation, grain size = 10, speedup = 0.0633907
8 threads, with allocation, grain size = 100, speedup = 0.0632425
8 threads, with allocation, grain size = 1000, speedup = 0.0625457