
Tests/Vectorization Scaling

Open GumpXiaoli opened this issue 6 years ago • 3 comments

When I run the case in Tests/Vectorization, the computation time doubles or more as the number of MPI ranks increases from 1 to 10. This is a little confusing, since there is no data communication in this case and each core does the same amount of work.

GumpXiaoli avatar Oct 08 '19 03:10 GumpXiaoli

This test isn't set up to do any domain decomposition or parallelization - it's based on FArrayBox, not MultiFab. If you run it on more MPI tasks, every task duplicates all the work and contends for the same resources. I'd expect a slowdown in that case.

atmyers avatar Oct 08 '19 13:10 atmyers
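To make the "contending for the same resources" point concrete, here is a rough back-of-the-envelope sketch, not anything taken from the AMReX test itself. It assumes every rank runs the same kernel and that all ranks share one memory bus; the function name `estimated_seconds` and all of its parameters are hypothetical.

```cpp
// Rough model (an assumption, not a measurement): each rank needs
// `compute_seconds` of pure arithmetic and moves `bytes_per_rank` bytes
// through a memory system that all ranks share, with a total bandwidth of
// `shared_bandwidth` bytes per second.
//
// The run time is whichever is larger: the arithmetic time (which does not
// grow with the rank count, since the ranks compute in parallel) or the time
// needed to stream all ranks' data through the shared bus (which does).
constexpr double estimated_seconds(int ranks,
                                   double compute_seconds,
                                   double bytes_per_rank,
                                   double shared_bandwidth)
{
    return (ranks * bytes_per_rank / shared_bandwidth > compute_seconds)
               ? ranks * bytes_per_rank / shared_bandwidth
               : compute_seconds;
}
```

With made-up numbers (1 s of arithmetic, 8e9 bytes moved per rank, 16e9 bytes/s of shared bandwidth), this model stays at 1 s for one rank but grows to 2 s for 4 ranks and 5 s for 10 ranks, even though no rank communicates with any other.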

Since all the MPI tasks are doing the same work, I would expect the same time, or slightly more because of a lower cache hit rate. However, I find that the computation time at least doubles. This is what confuses me.

GumpXiaoli avatar Oct 09 '19 01:10 GumpXiaoli

The cache effects won't necessarily be small, though, if these kernels are spending a significant amount of time streaming data from main memory. In fact, if I reduce the size of the problem to 15^3, then (on my particular processor) I get the same runtime for (1, 2, 4) MPI tasks, as you would expect based on the compute work alone.

atmyers avatar Oct 09 '19 03:10 atmyers
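One way to see why shrinking the problem to 15^3 removes the slowdown is to estimate the working set per rank. The sketch below is illustrative only: the helper names are hypothetical, the number of components and the cache size are assumptions to be filled in for a particular run and processor, and it assumes 8-byte doubles.

```cpp
#include <cstddef>

// Bytes of double-precision data in one n x n x n box with `ncomp` components.
constexpr std::size_t working_set_bytes(std::size_t n, std::size_t ncomp)
{
    return n * n * n * ncomp * sizeof(double);
}

// True if `ranks` independent copies of that box fit together in a cache of
// `cache_bytes` bytes (for example, a last-level cache shared by the ranks).
constexpr bool fits_in_cache(std::size_t n, std::size_t ncomp,
                             std::size_t ranks, std::size_t cache_bytes)
{
    return ranks * working_set_bytes(n, ncomp) <= cache_bytes;
}
```

For a single component, a 15^3 box is 27,000 bytes, so even ten copies are far smaller than a typical multi-megabyte shared cache, and the ranks rarely have to go to main memory. A 128^3 box (a size chosen here only for comparison) is 16 MiB per component per rank, which is the regime where the ranks compete for memory bandwidth.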