Elias Stehle

Results: 34 comments by Elias Stehle

> TL;DR: There is something extremely odd going on here that I don't understand and just making the kernel `static` does _not_ fix the issue. Thanks for the reproducer and...

Thanks for the feedback and the preliminary evaluation, @senior-zero 👍 Fundamentally, our ideas are quite similar. You do a three-way partition on all the problems. I proposed to have a...

I'm currently gathering results of a few more benchmarks that hopefully will help us make an informed decision about which of the scheduling mechanisms to pursue (preliminary three-way partition vs....

So I ran the first batch of benchmarks. I'll add more throughout the week.

## Methodology

- Benchmarks ran on V100
- We allocate two large buffers on device memory:...

Sorry for the wait. I did another cleanup pass over the code of this PR.

> I've long wanted a [`cuda::memcpy`](https://github.com/NVIDIA/cccl/issues/944) that would handle runtime determined alignment as well...

> We don't expose CG in the CUB APIs, this would require some more discussion before we added anything like that. That may be better suited to the senders/receivers based...

This feature request has been addressed by PR https://github.com/NVIDIA/cub/pull/359 that is now merged.

We strongly assume that the root cause of this issue is related to issue https://github.com/NVIDIA/cub/issues/545. That issue has been fixed in PR https://github.com/NVIDIA/cub/pull/547. Has anyone had a chance to see...

> This LGTM, thanks for catching it! Some of the tests don't build after the changes, you can run `ci/local/build.bash` from the `nvbench` root to build and test if you...

> @elstehle I'm still seeing a test regression when running `ci/local/build.bash` on this branch:
>
> ```
> 4/39 Test #32: nvbench.test.state_generator ..................***Failed 2.39 sec
> /cccl/nvbench/nvbench/detail/device_scope.cuh:37: Cuda API call...
> ```