Georgii Evtushenko

Results 54 issues of Georgii Evtushenko

An issue was reported regarding the internal accumulator type in `cub::DeviceScan::InclusiveSum`: the input data type is used as the accumulator type. Here's the reproducer: ```cuda #include #include int...
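The reproducer above is truncated, so the following is only a host-side C++ sketch of why the choice of accumulator type can matter. It assumes narrow 8-bit inputs and is not the original reproducer. The helpers `inclusive_sum_with` and `sum_of_ones` are hypothetical names introduced for illustration; they are not part of CUB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of CUB): sequential inclusive sum whose
// accumulator type is chosen explicitly instead of being taken from the input.
template <typename AccumT, typename InputT>
std::vector<AccumT> inclusive_sum_with(const std::vector<InputT>& in)
{
  std::vector<AccumT> out(in.size());
  AccumT acc{};
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    // The addition is done in a promoted type; the cast back to AccumT
    // wraps around when AccumT is a narrow unsigned type.
    acc = static_cast<AccumT>(acc + in[i]);
    out[i] = acc;
  }
  return out;
}

// Hypothetical helper: final prefix sum of `count` ones stored as 8-bit values.
template <typename AccumT>
AccumT sum_of_ones(std::size_t count)
{
  assert(count > 0);
  const std::vector<std::uint8_t> in(count, 1);
  return inclusive_sum_with<AccumT>(in).back();
}
```

With 300 input elements, accumulating in `std::uint8_t` (the input type) wraps around modulo 256, while accumulating in `int` yields the expected total.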

type: bug: functional
P1: should have

1. There is no difference in performance or compilation time for reduce with simple operators. With complex operators (256 sqrt calls), the compilation time is up to 2.4 times...

I've tried to request proper addresses from MSI tech support, to no avail. ``` EC dump 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D...

This PR adds visualization to the comparison script. Here's an example for: ```cpp // ... .add_int64_power_of_two_axis("elements", nvbench::range(12, 28, 4)) .add_int64_axis("ratio", {0, 10}); ``` When the script is executed, one can specify...

Recent results show that noise can increase up to 50% when an X server is running on the device. To warn users about the noisy environment, we could check if GPU...

P2: nice to have

This issue is related to [this one](https://github.com/NVIDIA/cub/issues/545), which was recently addressed in [CUB](https://github.com/NVIDIA/cub/pull/547). Thrust has to adopt the same namespace solution for the CUDA backend.

P0: must have

Before it was ported to CUB, the Thrust implementation of merge sort did not have a `*copy` version. When introducing the `Copy` overload, I followed the generic CUB scheme of selecting output iterator value...

type: bug: functional

This PR addresses the following [issue](https://github.com/NVIDIA/cccl/issues/902) by replacing `__launch_bounds__` usages with `CUB_DETAIL_LAUNCH_BOUNDS`. `CUB_DETAIL_LAUNCH_BOUNDS` results in `__launch_bounds__` being used only when RDC is **not** specified. Builds without RDC are not affected by...

testing: gpuCI in progress
type: bug: compiler