Giannis Gonidelis

Results 38 comments of Giannis Gonidelis

@brycelelbach you might want to consider [adding](https://github.com/settings/ssh/new) a **Signing Key** so runners fire automatically on your push.

``` # Benchmark Results ## cub::FindIf ### [0] NVIDIA H200 | T | Elements | RelativeMismatchPosition | Samples | CPU Time | Noise | GPU Time | Noise | |-----|------------------|--------------------------|---------|------------|--------|------------|-------|...

@bernhardmgruber `cub::FindIf` is expected to perform better than `thrust::count_if` because of its early exit, at least in the middle cases such as `RelativeMismatchPosition = 0.5` (thanks for the name suggestion). On the...
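To illustrate the early-exit argument, here is a minimal host-side C++ sketch. It is an analogy, not the CUB or Thrust implementation: `make_input`, `evaluations_early_exit`, and `evaluations_full_scan` are hypothetical helpers, and the sketch assumes `RelativeMismatchPosition` means the fraction of the input that precedes the first match. It only counts how many elements each strategy has to inspect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: n zeros with a single 1 placed at index
// relative_position * n (clamped to the last element).
std::vector<std::int8_t> make_input(std::size_t n, double relative_position)
{
  assert(n > 0 && relative_position >= 0.0 && relative_position <= 1.0);
  std::vector<std::int8_t> data(n, 0);
  const std::size_t pos =
    std::min(n - 1, static_cast<std::size_t>(relative_position * static_cast<double>(n)));
  data[pos] = 1;
  return data;
}

// find_if-style search: stops at the first element equal to 1.
// Returns the number of elements inspected.
std::size_t evaluations_early_exit(const std::vector<std::int8_t>& data)
{
  std::size_t inspected = 0;
  for (const std::int8_t x : data)
  {
    ++inspected;
    if (x == 1)
    {
      break; // early exit: the rest of the input is never read
    }
  }
  return inspected;
}

// count_if-style search: always scans the whole input.
// Returns the number of elements inspected.
std::size_t evaluations_full_scan(const std::vector<std::int8_t>& data)
{
  std::size_t inspected = 0;
  std::size_t matches   = 0;
  for (const std::int8_t x : data)
  {
    ++inspected;
    if (x == 1)
    {
      ++matches;
    }
  }
  static_cast<void>(matches); // the count itself is not needed here
  return inspected;
}
```

With the match placed halfway through the input, the early-exit loop inspects roughly half as many elements as the full scan. On a GPU the saving depends on how quickly the other thread blocks observe that a match was found, so this sketch describes the best case rather than a measured result.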

**Performance Results of thrust::count_if vs cub::DeviceFind::FindIf** (_runs with I8 input type to test the performance of the newly introduced vectorized loads_) ``` ## [0] NVIDIA H200 | T | Elements...

Some long-awaited performance results on A6000 and H200 (extending @gevtushenko's [work](https://github.com/NVIDIA/cccl/pull/1870#discussion_r1642007649) in #1870): **Search Operation** `cub::Device::FindIf`, `thrust::find_if` and `thrust::count_if` are used as backends to search for an `int32`...

Many thanks to @elstehle for helping figure out this index!!!! https://github.com/NVIDIA/cccl/blob/0364cf344c757b19366ba9f5a09448c8f0905867/cub/cub/agent/agent_find.cuh#L213-L217

**Update: After refactoring the code by introducing Dispatch and Agent layers, the benchmark results look the same on my local A6000 machine.** _docs to be added over the weekend_

fresh out of the oven results