Alexander Sinn

23 comments by Alexander Sinn

> @maikel showed us a very cool trick: https://cuda.godbolt.org/z/edxEMY7YG Here is an N-dimensional version of that; it even compiles with GCC 7.5. I am still unsure about that NVCC/NVHPC redefinition problem,...

https://github.com/ECP-WarpX/WarpX/pull/3399#issuecomment-1247354064

Currently, it should break after the first match because of Boolean short-circuit evaluation.
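The short-circuit behaviour can be illustrated with a small C++17 sketch. This is not the code from the linked pull request: `find_match` and `MatchResult` are hypothetical names, and the `evaluated` counter exists only to make the early exit visible.

```cpp
// Hypothetical result type: whether a candidate matched, and how many
// comparisons were actually evaluated before the fold stopped.
struct MatchResult {
    bool matched;
    int evaluated;
};

// Compares `runtime_value` against each compile-time candidate in turn.
template <int... Candidates>
MatchResult find_match(int runtime_value) {
    int evaluated = 0;
    // Unary right fold over ||. Operands are evaluated left to right, and
    // evaluation stops at the first operand that yields true.
    bool matched = ((++evaluated, runtime_value == Candidates) || ...);
    return {matched, evaluated};
}
```

With candidates `1, 2, 3` and a runtime value of `2`, only the first two comparisons run; the third is skipped because `||` short-circuits.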

As much as I like fold expressions, I made a second implementation using tuples and recursion. Notably this solves the capture problem as now expensive functionals can be used as...

That should be OK as long as there are `make_tuple()` and `tuple_cat()` equivalents.
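As a rough illustration of why those two operations are sufficient for a recursive tuple-based approach, here is a minimal sketch using the standard library's `std::make_tuple` and `std::tuple_cat`. The function `reversed` is a hypothetical example, not taken from the discussion; a GPU-compatible tuple type would need equivalents of both calls for the same pattern to work.

```cpp
#include <tuple>

// Base case of the recursion: no arguments left, so return an empty tuple.
inline std::tuple<> reversed() { return std::make_tuple(); }

// Recursive case: reverse the tail first, then append the head at the end.
// Only make_tuple (wrap one value) and tuple_cat (concatenate) are needed.
template <typename Head, typename... Tail>
auto reversed(Head head, Tail... tail) {
    return std::tuple_cat(reversed(tail...), std::make_tuple(head));
}
```

Each level of the recursion peels one argument off the pack, so no fold expression is required and the sketch compiles as C++14.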

It does indeed work, but it is a bit tedious to generate the variants. NVCC: https://cuda.godbolt.org/z/h14zEz4zx GCC: https://cuda.godbolt.org/z/a64aKvdh8

Looking at an Nsight profile I made a while ago for HiPACE++, I noticed two things: - Properly synchronized function/profiler names are available under Processes/[…] hipace/CUDA HW …/99.7% Context 1/82.4%...

In my case there isn’t an allocation of the size of the box, I need to count how many particles need to be initialized. If the additional number of registers...

I looked at a few things in Compiler Explorer, and it seems the main reason that using a 64-bit icell would be slower and use more registers is the 64...

This can already be done by setting the templated allocator to `amrex::PolymorphicArenaAllocator`. The question is whether this should be the default, or maybe even the only allocator. Currently with `amrex::PolymorphicArenaAllocator`...