Hugh Delaney
This PR allows C++ standard library functions to be used with the NVPTX backend. See https://github.com/intel/llvm/discussions/6379 for background; the corresponding llvm-test-suite test is https://github.com/intel/llvm-test-suite/pull/1112. It also adds the compiler flag `-fbundle-no-offload-arch`, which allows device code bundles to...
This should have been defined with always inline to avoid duplicate symbol definitions when compiling multiple objects.
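As a general C++ point, and assuming a GCC- or Clang-compatible compiler: for a function defined in a header, it is the `inline` specifier that lets the same definition appear in several object files without a duplicate-symbol link error. The `always_inline` attribute additionally asks the compiler to inline every call, but on its own it does not change the function's linkage. The function name below is hypothetical.

```cpp
#include <cassert>

// Imagine this definition lives in a header included by several .cpp files.
// `inline` makes the repeated definitions legal under the one-definition rule;
// `always_inline` (a GCC/Clang extension) additionally forces inlining of calls.
inline __attribute__((always_inline)) int hypothetical_double(int x) {
  return 2 * x;
}
```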
Instead of throwing an error, it would be convenient if bfloat16 conversions could be performed on the host as well as on the device. cc @JackAKirk
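A host-side conversion does not need special hardware: a bfloat16 value is the upper 16 bits of an IEEE-754 binary32 float, so conversion can be done with integer bit manipulation. The sketch below uses hypothetical function names, rounds to nearest even and keeps NaNs quiet; it shows a generic technique, not the actual implementation of the SYCL `bfloat16` class.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: float -> bfloat16 bit pattern, round-to-nearest-even.
std::uint16_t float_to_bf16_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);          // reinterpret the bits without UB
  if ((u & 0x7fffffffu) > 0x7f800000u) {  // NaN: truncate and force the quiet bit
    return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
  }
  // Adding 0x7fff plus the lowest kept bit makes exact ties round to even.
  std::uint32_t bias = 0x7fffu + ((u >> 16) & 1u);
  return static_cast<std::uint16_t>((u + bias) >> 16);
}

// Hypothetical helper: bfloat16 bit pattern -> float (always exact).
float bf16_bits_to_float(std::uint16_t b) {
  std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}
```

For example, `1.00390625f` lies exactly halfway between two bfloat16 values and rounds to `0x3F80` (1.0), the neighbor with an even last bit.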
The current structure of syclacademy presents the buffer/accessor model before unified shared memory (USM). This chapter assumes that the programmer knows the difference between device and host memory, as well as...
This adds a new exercise on matrix transpose, a simple introduction to coalesced global memory accesses and to local memory. Let me know if you think it should go somewhere else.
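The idea behind the exercise can be illustrated without a GPU. This host-only C++ sketch emulates a tiled transpose: each tile is read from the input row by row, so neighboring work-items would touch neighboring addresses (coalesced loads); it is staged in a small buffer standing in for local memory; and it is written to the output row by row as well (coalesced stores). The tile size and function name are hypothetical, and a real SYCL kernel would use a `local_accessor` with a work-group barrier between the two phases.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile size, standing in for the work-group dimensions.
constexpr std::size_t kTile = 4;

// Transposes a rows x cols row-major matrix `in` into `out` (cols x rows).
void tiled_transpose(const std::vector<float>& in, std::vector<float>& out,
                     std::size_t rows, std::size_t cols) {
  assert(in.size() == rows * cols && out.size() == rows * cols);
  for (std::size_t tr = 0; tr < rows; tr += kTile) {
    for (std::size_t tc = 0; tc < cols; tc += kTile) {
      // Stand-in for local memory; the +1 column mirrors the usual GPU
      // padding trick against bank conflicts (it has no effect on a CPU).
      float tile[kTile][kTile + 1];
      // Phase 1: inner index c walks consecutive input addresses.
      for (std::size_t r = 0; r < kTile && tr + r < rows; ++r)
        for (std::size_t c = 0; c < kTile && tc + c < cols; ++c)
          tile[r][c] = in[(tr + r) * cols + (tc + c)];
      // Phase 2: inner index r walks consecutive output addresses;
      // the strided access happens in the cheap local tile instead.
      for (std::size_t c = 0; c < kTile && tc + c < cols; ++c)
        for (std::size_t r = 0; r < kTile && tr + r < rows; ++r)
          out[(tc + c) * rows + (tr + r)] = tile[r][c];
    }
  }
}
```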
Since https://github.com/oneapi-src/unified-runtime/pull/999, it is no longer valid to get the native context from a SYCL context on a multi-GPU system. The `get_native` function for contexts has been deprecated...
If `CUDA_ERROR_FUNC`, `CUSOLVER_ERROR_FUNC`, etc. are called and the result is `!= CUDA_SUCCESS`, a `cuda_error` is thrown and any pointers allocated beforehand are never deallocated, causing a memory leak. We should...
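One common C++ way to make such code exception-safe is RAII: each allocation is owned by a handle whose destructor frees it, so the memory is released during stack unwinding when the error is thrown. This is a generic sketch, not necessarily the fix proposed here. It uses hypothetical stand-ins (`fake_device_alloc`, `fake_device_free`, `fake_cuda_error`, `check`) instead of real CUDA calls so that it runs without a GPU; in real code the deleter would call something like `cudaFree`.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <stdexcept>

// Hypothetical allocator stand-ins that count live allocations.
int g_live_allocations = 0;
void* fake_device_alloc(std::size_t bytes) {
  void* p = std::malloc(bytes);
  if (p) ++g_live_allocations;
  return p;
}
void fake_device_free(void* p) {
  if (p) { --g_live_allocations; std::free(p); }
}

// Hypothetical error type and checker, loosely modelled on an
// error-checking macro that throws when a call does not succeed.
struct fake_cuda_error : std::runtime_error {
  using std::runtime_error::runtime_error;
};
void check(int status) {
  if (status != 0) throw fake_cuda_error("library call failed");
}

// Owning handle: the deleter runs on normal return and on exceptions.
struct device_deleter {
  void operator()(void* p) const { fake_device_free(p); }
};
using device_ptr = std::unique_ptr<void, device_deleter>;

void solver_step(int status_of_library_call) {
  device_ptr workspace(fake_device_alloc(256));
  device_ptr info(fake_device_alloc(sizeof(int)));
  check(status_of_library_call);  // may throw; both buffers are still freed
}
```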
https://github.com/oneapi-src/unified-runtime/pull/1326
This adds a new range-rounding preference, `force`: if the compile flag is used, only the range-rounded `parallel_for` kernel is generated. This can make binaries smaller...
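As a rough illustration of what a range-rounded kernel does: the global range is padded up to a multiple of some factor, and the user's kernel is wrapped in a bounds check so that the extra work-items do nothing. The host-only sketch below uses hypothetical names and an arbitrary rounding factor; it is not the DPC++ implementation.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of factor.
std::size_t round_up(std::size_t n, std::size_t factor) {
  assert(factor > 0);
  return ((n + factor - 1) / factor) * factor;
}

// Hypothetical host emulation of a range-rounded parallel_for: iterate over
// the padded range, invoking the user kernel only for in-range ids.
// Returns the padded range size.
template <typename Kernel>
std::size_t rounded_parallel_for(std::size_t n, std::size_t factor, Kernel k) {
  const std::size_t padded = round_up(n, factor);
  for (std::size_t id = 0; id < padded; ++id) {
    if (id < n) k(id);  // guard inserted by the rounding wrapper
  }
  return padded;
}
```

With `force`, as described above, only this guarded form would exist in the binary, rather than both a rounded and an unrounded variant of each kernel.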