Matt Norman
@oksanaguba @xyuan @brhillman @mt5555 @rljacob @whannah1 I'm creating a GitHub issue for this failure on `next`. It manifests as a large allocation request to YAKL of a 60 GB variable,...
Add Kokkos backend similar to CUDA, HIP, and SYCL using Kokkos C-style allocators and other functionality for parallel_for.
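As a rough illustration of what a backend built on C-style allocators plus a `parallel_for` might look like, here is a minimal host-only sketch. Everything in the `sketch` namespace (`AllocHooks`, `host_hooks`, and this serial `parallel_for`) is hypothetical and is not YAKL's or Kokkos's actual API. It assumes "C-style allocators" means a raw allocate/free pair, which in a real Kokkos backend would presumably wrap `Kokkos::kokkos_malloc` and `Kokkos::kokkos_free`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch {

// Hypothetical: the pair of C-style hooks a backend would supply.
struct AllocHooks {
  void* (*alloc)(std::size_t bytes);
  void  (*dealloc)(void* ptr);
};

// Host stand-ins; a device backend would substitute its own allocate/free pair.
inline void* host_alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void  host_free(void* ptr)          { std::free(ptr); }

inline AllocHooks host_hooks() { return AllocHooks{host_alloc, host_free}; }

// Hypothetical serial fallback: calls f(i) for every i in [0, n).
template <class F>
void parallel_for(std::size_t n, F&& f) {
  for (std::size_t i = 0; i < n; ++i) f(i);
}

}  // namespace sketch
```

Swapping backends then amounts to returning a different `AllocHooks` pair and a different `parallel_for` implementation behind the same interface.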
On current main branch, hash: d29e739f446cb9bcf3a12899cbabe754f471f58b

```bash
qsub -I -t 30 -n 1 -q florentia_debug
```

```bash
source jlse_gpu_O3.sh
make -j
make test
```

```bash
[ac.normanmr@florentia02:~/YAKL/unit/build/machines/jlse] >:O ./Streams/Streams
Running on...
```
Using current main, hash: d706e2f44fb8f5651cb2f6f3748e6fd3261a70ae

```bash
qsub -I -t 120 -n 1 -q arcticus
```

```bash
source jlse_gpu_O3.sh
make -j
make test
```

```bash
[ac.normanmr@arcticus09:~/YAKL/unit/build/machines/jlse] >:O ./CArray/CArray
terminate called...
```
Create inner-level parallel reductions using CUB and hipCUB inside the `YAKL_reductions.h` header, and create simple `intrinsics` functions for `sum`, `maxval`, `minval`, and `product`. Whenever inner-level loop size exceeds the...
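For reference, the four reductions named above can be sketched serially as follows. This is only a host-side illustration of what each one computes. The `sketch` namespace and the `std::vector` signatures are assumptions, not YAKL's actual `intrinsics` interface, and the CUB/hipCUB paths are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

namespace sketch {

// Sum of all elements (additive identity 0 for an empty input).
template <class T>
T sum(const std::vector<T>& v) {
  return std::accumulate(v.begin(), v.end(), T(0));
}

// Product of all elements (multiplicative identity 1 for an empty input).
template <class T>
T product(const std::vector<T>& v) {
  return std::accumulate(v.begin(), v.end(), T(1), std::multiplies<T>());
}

// Largest element; the input must be non-empty.
template <class T>
T maxval(const std::vector<T>& v) {
  assert(!v.empty());
  return *std::max_element(v.begin(), v.end());
}

// Smallest element; the input must be non-empty.
template <class T>
T minval(const std::vector<T>& v) {
  assert(!v.empty());
  return *std::min_element(v.begin(), v.end());
}

}  // namespace sketch
```

A parallel version would need to reproduce these results, up to floating-point reordering in `sum` and `product`.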