Tracer: Multithreading
Opening early to have somewhere to put markdown tables. Not ready for review/merge/whatever.
Timings for NORNE_ATW2013 [5 runs]
| Revision | tracerAdvance | tracerAssemble | tracerPost | tracerSolve | tracerUpdate | Pre/post step |
|---|---|---|---|---|---|---|
| master | 33.19 +- 0.32 | 26.80 +- 0.30 | 0.83 +- 0.00 | 5.45 +- 0.00 | 0.99 +- 0.00 | 55.19 +- 0.54 |
| tracer_mt_solve | 31.44 +- 0.04 | 26.91 +- 0.04 | 0.79 +- 0.00 | 3.63 +- 0.00 | 0.99 +- 0.00 | 54.07 +- 0.03 |
| tracer_mt_assemble | 27.85 +- 1.25 | 19.17 +- 1.32 | 2.07 +- 0.01 | 6.48 +- 0.00 | 1.00 +- 0.00 | 50.24 +- 1.24 |
| tracer_mt_update | 32.70 +- 0.12 | 26.39 +- 0.13 | 0.82 +- 0.00 | 5.39 +- 0.00 | 1.13 +- 0.00 | 54.96 +- 0.13 |
| tracer_mt | 24.62 +- 0.40 | 18.80 +- 0.33 | 1.15 +- 0.00 | 4.55 +- 0.00 | 1.10 +- 0.00 | 46.68 +- 0.48 |
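For reading the table, the relative gain of a branch over master can be worked out from the mean timings. A minimal sketch, with `speedup` and `percentSaved` as purely illustrative helper names that are not part of the code base; the thread does not state the timing units, and they cancel out in both ratios:

```cpp
#include <cassert>

// Relative speedup of a branch over the baseline: baseline time / branch time.
// Values above 1 mean the branch is faster than the baseline.
double speedup(double baselineTime, double branchTime)
{
    assert(branchTime > 0.0);
    return baselineTime / branchTime;
}

// Share of the baseline time saved by the branch, in percent.
// Negative values mean the branch is slower than the baseline.
double percentSaved(double baselineTime, double branchTime)
{
    assert(baselineTime > 0.0);
    return 100.0 * (baselineTime - branchTime) / baselineTime;
}
```

With the means above, `speedup(33.19, 24.62)` gives roughly 1.35 for tracerAdvance on tracer_mt, and `percentSaved(55.19, 46.68)` gives roughly 15% for the pre/post step column.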
> Not ready for review/merge/whatever.
Understood. The early performance improvements are nevertheless encouraging.
At least for tracerAssemble. I wonder why the update part (although small) degraded, and why multithreading the assemble step seems to have a negative effect on solve. Were all of these run with at most 2 OpenMP threads?
Don't read too much into the noise...
It's likely because things are bandwidth-bound rather than compute-bound, so whatever else I'm doing on my PC affects the timings even though the computations have their own cores. In particular, update is basically just shuffling data around; there are no computations going on there.
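To illustrate the bandwidth-bound versus compute-bound distinction, here is a minimal sketch of the two kinds of loop. Both functions are hypothetical stand-ins, not the actual tracer code: `copyUpdate` only moves data, as the update step is described above, while `computeHeavy` does arithmetic on each element before storing it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "update"-style loop: one load and one store per element and no
// arithmetic. Throughput is limited by memory bandwidth, so adding threads
// helps little once the memory bus is saturated.
void copyUpdate(const std::vector<double>& solution, std::vector<double>& state)
{
    state.resize(solution.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(solution.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        state[i] = solution[i];
    }
}

// Hypothetical compute-heavy loop: many floating-point operations per element
// loaded, so the cores rather than the memory bus are the limiting factor.
void computeHeavy(const std::vector<double>& in, std::vector<double>& out, int iterations)
{
    out.resize(in.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(in.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        double x = in[i];
        for (int k = 0; k < iterations; ++k) {
            x = 0.5 * x + 1.0;  // arbitrary arithmetic, stands in for real work
        }
        out[i] = x;
    }
}
```

With GCC or Clang the pragmas take effect when compiling with `-fopenmp`; without that flag they are ignored and both loops run serially. The thread count can be capped through the standard `OMP_NUM_THREADS` environment variable.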
The alternative implementation in #6206 was merged into the master branch moments ago. Is there more work coming to leverage that PR?
I have some other stuff, but it needs rebasing anyway, so I'll reopen with a more on-context PR if I ever get around to finishing it.