alpaka
Abstraction Library for Parallel Kernel Acceleration :llama:
While working on #1977 (which adds device-side debug flags) I discovered that CUDA...
Adds an overload set `zero_memcpy` that only copies a buffer if the destination device requires the buffer in a different memory space in order to access it. Otherwise, no copy...
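The copy-only-when-needed idea described above can be sketched in plain C++. This is a minimal illustration, not the proposed `zero_memcpy` overload set: `MemSpace`, `Buffer`, and `copyIfNeeded` are hypothetical names invented here, and "the destination can access the buffer" is modelled simply as "same memory space".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not alpaka types.
enum class MemSpace { Host, Device };

struct Buffer {
    MemSpace space;                          // memory space the data lives in
    std::shared_ptr<std::vector<int>> data;  // shared ownership of the storage
};

// Return a buffer usable from `dst`. If the source already lives in a space
// that `dst` can access (modelled here as "same space"), share the storage
// and skip the copy; otherwise allocate new storage and copy the contents.
inline Buffer copyIfNeeded(MemSpace dst, Buffer const& src) {
    if (src.space == dst) {
        return src;  // no copy: both handles refer to the same storage
    }
    return Buffer{dst, std::make_shared<std::vector<int>>(*src.data)};
}
```

In this sketch the caller cannot tell from the return type whether a copy happened; it only sees a buffer that is valid for the destination, which is the property the description asks for.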
I am trying to move my code to a newer version of Alpaka (from 0.7 to 0.9 or above), but I'm running into a compilation issue. I have code which...
I was just curious if there are any plans for half precision / fp16 support in the future. Thank you.
While reviewing the implementation of the atomic operations in SYCL, I started comparing what operations are available on CUDA, HIP and SYCL: What operations should be supported by Alpaka...
Currently only `pow` is tested with mixed `float` and `double` arguments, in [test/unit/math/src/powMixedTypes.cpp](https://github.com/alpaka-group/alpaka/blob/develop/test/unit/math/src/powMixedTypes.cpp). All binary math functions should be tested with mixed `float`, `double` and integer arguments.
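A mixed-argument check of the kind requested above might look like the following sketch. It uses `std::pow` and `std::atan2` as stand-ins for the alpaka math functions and does not use the alpaka test harness; `Pow`, `Atan2`, and `matchesDoubleReference` are hypothetical helper names introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical functors wrapping standard binary math functions so that the
// same check can be reused for every function and argument-type combination.
struct Pow {
    template <typename A, typename B>
    auto operator()(A a, B b) const -> decltype(std::pow(a, b)) {
        return std::pow(a, b);
    }
};

struct Atan2 {
    template <typename A, typename B>
    auto operator()(A a, B b) const -> decltype(std::atan2(a, b)) {
        return std::atan2(a, b);
    }
};

// Call `f` with the mixed argument types and compare against the result of
// calling it with both arguments converted to double.
template <typename TFunc, typename T1, typename T2>
bool matchesDoubleReference(TFunc f, T1 a, T2 b) {
    double const mixed = static_cast<double>(f(a, b));
    double const reference =
        static_cast<double>(f(static_cast<double>(a), static_cast<double>(b)));
    double const scale = std::max(1.0, std::abs(reference));
    return std::abs(mixed - reference) <= 1e-6 * scale;
}
```

A test would then call, for example, `matchesDoubleReference(Pow{}, 2.0f, 3.0)` and `matchesDoubleReference(Atan2{}, 1.0f, 1)` for each combination of `float`, `double` and integer arguments.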
During the accessor development (#1249), I needed to implement a few metafunctions on the side. Since alpaka is TMP-heavy, we are going to need such metaprogramming facilities regularly...
In CUDA/ROCm timing is disabled by default when [creating events](https://github.com/alpaka-group/alpaka/blob/a9f5b59da076e0371a54cb7c4158b50f116e13f5/include/alpaka/event/EventUniformCudaHipRt.hpp#L58), while in SYCL the profiling on the queue is [enabled](https://github.com/alpaka-group/alpaka/blob/a9f5b59da076e0371a54cb7c4158b50f116e13f5/include/alpaka/queue/sycl/QueueGenericSyclBase.hpp#L46). It would be nice to align all the backends to...
In CMS we developed various functionality that is general enough and might be interesting to share. Some are caching solutions we developed to improve the efficiency of the original CUDA...
Various Alpaka functions that mutate a non-`const` argument take it by non-`const` reference:
- `ALPAKA_FN_HOST static auto createTaskMemset(TView& view, std::uint8_t const& byte, TExtent const& extent)` takes `view` by non-`const` reference...