Lilo Huang

Results 7 issues of Lilo Huang

This is a follow-up question to https://github.com/NVIDIA/thrust/issues/1587. @allisonvacanti @senior-zero Unlike thrust::reduce(), the thrust::reduce_by_key() results are also non-deterministic for floats, am I right? From my limited testing somehow I...

type: bug: functional
only: docs
P1: should have

This issue is a follow-up to https://github.com/NVIDIA/cub/issues/369. The documentation of cub::GridBarrier is unclear about the grid size limitation, which could be throttled by the SM count, block...

type: enhancement
P2: nice to have
area: docs

Hi, the simpleIPC.cu sample code describes the maximum simultaneous peers limitation for PCI-E cards. However, I couldn't find any detailed information in the CUDA programming guide. Is there any official documentation...

Hi oneDPL experts, oneapi::dpl::reduce_by_key does not produce the expected output when the key elements are all zero. However, changing the key elements to one makes the bug go away....

bug

As we all know, https://developer.codeplay.com/products/computecpp/ce/guides/sycl-guide/debugging demonstrates how to construct a sycl::stream for printing to standard output from device code. However, I have no idea how to obtain the handler to...

question

## Description CuPy offers `cupy.cuda.stream.ExternalStream` for utilizing external CUDA streams. Moreover, `cupy.cuda.get_current_stream()` will return an instance of `cupy.cuda.stream.ExternalStream` instead of `cupy.cuda.stream.Stream`, particularly when the current CuPy stream has been...

Python
non-breaking
improvement

### Description Hi @leofang and all, I would like to know if CuPy provides any deterministic result guarantees (i.e., bitwise reproducibility) similar to cuFFT and other NVIDIA GPU-accelerated libraries. As...

cat:enhancement
prio:medium