Georgii Evtushenko issues

Results: 54 issues of Georgii Evtushenko

Refactor DeviceScan implementation to allow InclusiveScan/Sum to take an initial value

An issue was reported regarding the internal accumulator type in `cub::DeviceScan::InclusiveSum`: the input data type is used as the accumulator type. Here's the reproducer: ```cuda #include #include int...

type: bug: functional

P1: should have
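The reproducer above is truncated in this listing, so what follows is not the original code. It is a minimal host-only sketch of the general problem, assuming an 8-bit input type. `inclusive_sum` is a hypothetical sequential helper written for illustration, not a CUB function, and it does not reflect how `cub::DeviceScan` is implemented. It only shows how the choice of accumulator type changes the result of an inclusive sum.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of CUB): a sequential inclusive sum
// whose accumulator type is chosen explicitly by the caller.
template <typename AccumT, typename InputT>
std::vector<AccumT> inclusive_sum(const std::vector<InputT>& in)
{
  std::vector<AccumT> out;
  out.reserve(in.size());

  AccumT acc{};  // value-initialized to zero
  for (const InputT& x : in)
  {
    // The addition is done after the usual integer promotions; the cast
    // then narrows the partial sum back to the accumulator type.
    acc = static_cast<AccumT>(acc + x);
    out.push_back(acc);
  }
  return out;
}
```

For the input `{200, 100}` stored as `std::uint8_t`, `inclusive_sum<std::uint8_t>` yields `200, 44` because the second partial sum wraps modulo 256, while `inclusive_sum<int>` yields `200, 300`. Using the input type as the accumulator type silently loses the wider result.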

Remove pragma unroll from device radix sort, thread reduce, histogram, radix rank, select if and block exchange

1. There is no difference in performance or compilation time for the reduction with simple operators. With complex operators (256 sqrt calls), the compilation time is up to 2.4 times...

GE76 12UGS(MS-17K4)

I've tried to request proper addresses from MSI tech support, to no avail. ``` EC dump 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D...

Plot comparison results

This PR adds visualization to the comparison script. Here's an example for: ```cpp // ... .add_int64_power_of_two_axis("elements", nvbench::range(12, 28, 4)) .add_int64_axis("ratio", {0, 10}); ``` When the script is executed, one can specify...

Detect if GPU is being used for graphical purposes

Recent results show that noise could increase by up to 50% when an X server is running on the device. To warn users about the noisy environment, we could check if GPU...

P2: nice to have

Fix thrust linkage

The issue is related to the following [one](https://github.com/NVIDIA/cub/issues/545). It was recently addressed in [CUB](https://github.com/NVIDIA/cub/pull/547). Thrust has to inherit the namespace solution for the CUDA backend.

P0: must have

Testing NVIDIA/cub#570

Testing NVIDIA/cub#547

Merge sort key type selection

Before being ported to CUB, the Thrust implementation of merge sort did not have a `*copy` version. When introducing the `Copy` overload, I followed the CUB generic scheme of selecting output iterator value...

type: bug: functional

Wrap launch bounds

This PR addresses the following [issue](https://github.com/NVIDIA/cccl/issues/902) by replacing `__launch_bounds__` usages with `CUB_DETAIL_LAUNCH_BOUNDS`. `CUB_DETAIL_LAUNCH_BOUNDS` leads to `__launch_bounds__` usage only when RDC is **not** specified. Builds without RDC are not affected by...

testing: gpuCI in progress

type: bug: compiler