Elias Stehle comments

Results 34 comments of Elias Stehle

Dispatch mechanism may break when any two libraries that use CUB and/or Thrust have been compiled for different sets of GPU architectures

> TL;DR: There is something extremely odd going on here that I don't understand and just making the kernel `static` does _not_ fix the issue. Thanks for the reproducer and...

Adds DeviceBatchMemcpy algorithm and tests

Thanks for the feedback and the preliminary evaluation, @senior-zero 👍 Fundamentally, our ideas are quite similar. You do a three-way partition on all the problems. I proposed to have a...

Adds DeviceBatchMemcpy algorithm and tests

I'm currently gathering results of a few more benchmarks that hopefully will help us make an informed decision about which of the scheduling mechanisms to pursue (preliminary three-way partition vs....

Adds DeviceBatchMemcpy algorithm and tests

So I ran the first batch of benchmarks. I'll add more throughout the week.
## Methodology
- Benchmarks ran on V100
- We allocate two large buffers on device memory:...

Adds DeviceBatchMemcpy algorithm and tests

Sorry for the wait. I did another cleanup pass over the code of this PR. > I've long wanted a [`cuda::memcpy`](https://github.com/NVIDIA/cccl/issues/944) that would handle runtime determined alignment as well...

Adds DeviceBatchMemcpy algorithm and tests

> We don't expose CG in the CUB APIs, this would require some more discussion before we added anything like that. That may be better suited to the senders/receivers based...

[FEA] Multi-buffer copy algorithm

This feature request has been addressed by PR https://github.com/NVIDIA/cub/pull/359 that is now merged.

RuntimeError: radix_sort: failed on 2nd step: cudaErrorInvalidValue: invalid argument

We strongly assume that the root cause of this issue is related to issue https://github.com/NVIDIA/cub/issues/545. That issue has been fixed in PR https://github.com/NVIDIA/cub/pull/547. Has anyone had a chance to see...

Fixes cudaErrorInvalidValue when running on nvbench-created cuda stream

> This LGTM, thanks for catching it! Some of the tests don't build after the changes, you can run `ci/local/build.bash` from the `nvbench` root to build and test if you...

Fixes cudaErrorInvalidValue when running on nvbench-created cuda stream

> @elstehle I'm still seeing a test regression when running `ci/local/build.bash` on this branch: > > ``` > 4/39 Test #32: nvbench.test.state_generator ..................***Failed 2.39 sec > /cccl/nvbench/nvbench/detail/device_scope.cuh:37: Cuda API call...