Lilo Huang
This is a follow-up question to https://github.com/NVIDIA/thrust/issues/1587 @allisonvacanti @senior-zero Unlike thrust::reduce(), the thrust::reduce_by_key() results are also non-deterministic for floats, am I right? From my limited testing somehow I...
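For context on the question above: floating-point addition is not associative, so any reduction whose grouping of operands can vary between runs may return different bits for the same input. The snippet below is a minimal CPU-side sketch of that effect in plain Python. It does not model how Thrust actually schedules its work, and the values `a`, `b`, `c` are chosen purely for illustration.

```python
# Floating-point addition is not associative: the same three values
# summed with different grouping give different results.
a, b, c = 1e17, -1e17, 1.0

# Grouping 1: the large values cancel first, so the 1.0 survives.
left_to_right = (a + b) + c

# Grouping 2: 1.0 is absorbed when added to -1e17 (it is far below the
# spacing between adjacent doubles at that magnitude), so it is lost.
regrouped = a + (b + c)

print(left_to_right, regrouped)
```

A parallel reduction that combines partial sums in a run-dependent order is effectively choosing between groupings like these, which is why bitwise-identical float results usually require a fixed reduction order.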
This issue is a follow-up to https://github.com/NVIDIA/cub/issues/369 The documentation of cub::GridBarrier is unclear about the grid size limitation, which could be throttled by the SM count, block...
Hi, the simpleIPC.cu sample code describes the limit on the number of simultaneous peers for PCI-E cards. However, I couldn't find any detailed information in the CUDA programming guide. Is there any official documentation...
Hi oneDPL experts, oneapi::dpl::reduce_by_key does not produce the expected output when the key elements are all zero. However, changing the key elements to one makes the bug go away....
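To make "expected output" concrete, here is a small reference model of reduce-by-key semantics in plain Python. It assumes the usual convention that each run of consecutive equal keys is collapsed into one key and one reduced value. The function name `reduce_by_key_reference` is hypothetical and this is not the oneDPL implementation, only a sketch to compare results against.

```python
from functools import reduce
from itertools import groupby
from operator import add


def reduce_by_key_reference(keys, values, op=add):
    """Reference model (hypothetical helper, not a oneDPL API):
    collapse each run of consecutive equal keys into one output key
    and the reduction of the corresponding values."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    out_keys, out_values = [], []
    # groupby only merges *adjacent* equal keys, matching the assumed semantics.
    for key, group in groupby(zip(keys, values), key=lambda kv: kv[0]):
        out_keys.append(key)
        out_values.append(reduce(op, (v for _, v in group)))
    return out_keys, out_values


# All-zero keys form a single run, so one key and one sum are expected.
all_zero = reduce_by_key_reference([0, 0, 0, 0], [1, 2, 3, 4])
# All-one keys should behave the same way, only the key value differs.
all_one = reduce_by_key_reference([1, 1, 1, 1], [1, 2, 3, 4])
print(all_zero, all_one)
```

Under this model the key value itself should not matter: all-zero and all-one keys both reduce to a single output element.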
As we all know, https://developer.codeplay.com/products/computecpp/ce/guides/sycl-guide/debugging demonstrates how to construct a sycl::stream for printing to standard output from device code. However, I have no idea how to obtain the handler to...
## Description CuPy offers the `cupy.cuda.stream.ExternalStream` for utilizing external CUDA streams. Moreover, `cupy.cuda.get_current_stream()` will return an instance of `cupy.cuda.stream.ExternalStream` instead of `cupy.cuda.stream.Stream`, particularly when the current CuPy stream has been...
### Description Hi @leofang and all, I would like to know if CuPy provides any guarantees of deterministic (i.e., bitwise-reproducible) results, similar to cuFFT and other NVIDIA GPU-accelerated libraries. As...