PaulGannay
### Is this a duplicate? - [x] I confirmed there appear to be no [duplicate issues](https://github.com/NVIDIA/cccl/issues) for this bug and that I agree to the [Code of Conduct](CODE_OF_CONDUCT.md) ### Type...
See [here](https://github.com/kokkos/kokkos/issues/8080) for a description of what we need. This PR is opened to allow discussion on this subject: - 1 - Should we offer new callbacks (`begin_single` and `end_single`) for...
Currently, the only way to request that code execute on the device is to use one of the parallel constructs (`parallel_for`, `parallel_reduce`, or `parallel_scan`), but it can happen that one needs...
Micro-benchmarks to test: - PerfTest_PtrAccess.cpp: the overhead of accessing a `Kokkos::View` through the parenthesis operator compared to the pointer returned by `.data()`. - MicroBench_ParallelForOverheads.cpp: the overheads of the various objects creation...
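The comparison described for PerfTest_PtrAccess.cpp can be sketched in plain C++ as below. This is only an illustration under stated assumptions: `MiniView`, `sum_via_operator`, `sum_via_pointer`, and `time_seconds` are hypothetical names that do not come from the PR, and `MiniView` is a stand-in that does not reproduce `Kokkos::View` layouts, memory spaces, or bounds checking.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 1-D view. This is NOT Kokkos::View; it only
// mimics the two access paths being compared (operator() and data()).
struct MiniView {
    std::vector<double> storage;
    explicit MiniView(std::size_t n) : storage(n, 1.0) {}
    double& operator()(std::size_t i) { return storage[i]; }
    double* data() { return storage.data(); }
    std::size_t size() const { return storage.size(); }
};

// Access every element through the parenthesis operator.
double sum_via_operator(MiniView& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v(i);
    return s;
}

// Access every element through the raw pointer returned by data().
double sum_via_pointer(MiniView& v) {
    double* p = v.data();
    const std::size_t n = v.size();
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Average wall-clock seconds per call of f over `repeats` calls.
// The volatile sink keeps the compiler from discarding the results.
template <class F>
double time_seconds(F&& f, int repeats) {
    volatile double sink = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) sink = sink + f();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}
```

Calling `time_seconds([&] { return sum_via_operator(v); }, 100)` and the same with `sum_via_pointer` gives two averages to compare. A hand-rolled timer like this is noisy, so a dedicated benchmarking harness is the better choice for real measurements.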
PR to merge the work done by @blegouix in https://github.com/CExA-project/ddc/pull/708. I can't update the original PR since I don't have write access to https://github.com/blegouix/ddc/ or https://github.com/CExA-project/ddc.
The main goal of these benchmarks is to check that there is no regression in the performance of launching a `parallel_for`. Results with g++-13.3.0 ``` --------------------------------------------------------------------------------------- Benchmark Time CPU Iterations ---------------------------------------------------------------------------------------...