Cabana icon indicating copy to clipboard operation
Cabana copied to clipboard

LinkedCellList::build performance enhacements

Open j8asic opened this issue 3 years ago • 2 comments

1. Extra deep copy

In LinkedCellList::build, after computing offsets, the cell counters are reset Kokkos::deep_copy( counts, 0 ) But this is an overhead for the sake later usage of Kokkos::atomic_fetch_add atomic_fetch_sub

Instead of the above, counts should not be reset, but instead it should be reused with the help of Kokkos::atomic_fetch_sub:

            int c = Kokkos::atomic_fetch_sub( &counts( cell_id ), 1 ) - 1;
            permute( offsets( cell_id ) + c ) = p;

EDIT: ignore this as I see below you need counts for BinningData

2. Always allocating memory

In unsteady simulations often construction of Views has significant impact. I advise counts, offsets, and permute to be member variables that are only resized when needed.

3. Atomic exchange

Linked lists can be built using atomic exchange [1], and maybe instead current scheme (atomic per particle + scan + atomic per particle), one can get (atomic + scan per linked list + permute per cell). But maybe this is not a great idea since the memory access would be more random.

Btw great project! I'll start making pull requests instead of issues asap when I get some time to contribute. Cheers, Josip

j8asic avatar Jan 21 '21 08:01 j8asic

@j8asic Thanks for the input! We really appreciate you checking out the code and providing feedback. I think you have some good ideas on performance here. We have a performance test that uses the linked cell list in the context of neighbor lists here: https://github.com/ECP-copa/Cabana/blob/master/core/performance_test/benchmark/md_neighbor_perf_test.cpp

We can use this to test your ideas. I think your memory allocation idea would be the easiest. Are you rebuilding the cell list every timestep or some subset of timesteps? Is this for molecular dynamics, PIC, or some other algorithm?

Thanks!

sslattery avatar Jan 28 '21 19:01 sslattery

Thanks @sslattery, I still need to read the code (which is neatly written!) to get the full idea, and I see we're doing things similarly (Kokkos as well). The main difference is I deprecated building neighbour lists - I just reiterate linked-list neighbours on interaction and sort the data once in a while (SoA). Simulating incompressible motion, neighbourhoods change each time step but not intensively, so I tried to keep simplicity and cache hits. Does Cabana offer controlling of reordering data? Also distributor is similar but with 1D slicing. I would love to see benchmarks on how the AoSoA reordering and neighbour lists play role in a larger-scale simulation. Am willing to transfer my code and contribute. Do you have any communication channels not to contaminate GH issues with discussions?

j8asic avatar Jan 29 '21 08:01 j8asic

Primary improvement already made - other ideas worth pursing at some point

streeve avatar Aug 25 '23 14:08 streeve