Elias Stehle

Results 23 issues of Elias Stehle

In PR https://github.com/NVIDIA/cccl/pull/1435, we consolidated `thrust::partition` to use `cub::DevicePartition`. Along with that change, we added support for a large number of items. That is, Thrust will dispatch using offset types of `i64`...

Currently blocked by https://github.com/NVIDIA/cccl/issues/50 ```[tasklist] ### Tasks - [ ] https://github.com/NVIDIA/cccl/issues/1614 - [x] make the type of `num_items` a template parameter of `DevicePartition` algorithms - [x] determine approach to be...

## Description Experimental branch exploring a streaming `DeviceSelect` that processes partitions of up to `INT_MAX` in size, making sure iterators are only advanced/offset on the device, ##...

## Description Closes https://github.com/NVIDIA/cccl/issues/2062 Currently opened as a draft PR while waiting for resources to run performance assessments. ### Remaining tasks: - [x] Benchmark changes _Note: `DeviceScan::*ByKey` is not yet...

```[tasklist] ### Tasks - [x] Add tests for large number of items in `DeviceScan` (completed by https://github.com/NVIDIA/cccl/pull/1830) - [x] Make the type of `num_items` a template parameter of `DeviceScan` algorithms...

https://github.com/NVIDIA/cccl/issues/1787 aims to benchmark different offset types. However, it is unclear whether instantiating the `Dispatch` class template with these offset types will actually work correctly and whether...

In https://github.com/NVIDIA/cccl/issues/2055, we experimented with bit-packed tile states for algorithms that need to carry the offset type in their decoupled look-back. While the overall the...
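
To illustrate what "bit-packed tile state" can mean, here is a minimal host-side sketch that stores a tile status and an offset in one 64-bit word, so that both can be read or written together. The layout (2 status bits, 62 offset bits) and all names (`TileStatus`, `pack_tile_state`, `unpack_status`, `unpack_offset`) are assumptions made for illustration; they are not taken from the linked issue or from CUB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status values for a tile in a decoupled look-back.
enum class TileStatus : std::uint64_t
{
  Invalid   = 0, // nothing published yet
  Partial   = 1, // only this tile's own aggregate is available
  Inclusive = 2  // the inclusive prefix up to this tile is available
};

// Assumed layout: the top 2 bits hold the status, the low 62 bits hold the offset.
constexpr int status_bits          = 2;
constexpr int value_bits           = 64 - status_bits;
constexpr std::uint64_t value_mask = (std::uint64_t{1} << value_bits) - 1;

// Packs status and offset into a single 64-bit word.
inline std::uint64_t pack_tile_state(TileStatus status, std::uint64_t offset)
{
  assert(offset <= value_mask); // the offset must fit into 62 bits
  return (static_cast<std::uint64_t>(status) << value_bits) | offset;
}

// Extracts the status from a packed word.
inline TileStatus unpack_status(std::uint64_t packed)
{
  return static_cast<TileStatus>(packed >> value_bits);
}

// Extracts the offset from a packed word.
inline std::uint64_t unpack_offset(std::uint64_t packed)
{
  return packed & value_mask;
}
```

The trade-off in this sketch is that the usable offset range shrinks by the number of status bits.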

## Description Closes https://github.com/NVIDIA/cccl/issues/2442 Note: this is a chained PR building on https://github.com/NVIDIA/cccl/pull/2400 Performance summary: | | Diff i32 vs i32.main any num items | Diff i32 vs i32.main 2^28...

## Description This PR implements streaming `DeviceSelect` and `DevicePartition` that, for inputs exceeding `INT_MAX` items, split the input into partitions of at most `INT_MAX` number...
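
The splitting step described above can be sketched on the host as plain index arithmetic. This is a minimal sketch, not the PR's implementation: `Chunk` and `split_into_chunks` are hypothetical names, and the sketch only computes partition boundaries; it does not launch kernels or advance iterators on the device.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one partition of the input, described by a
// 64-bit start offset and a size that always fits into a signed 32-bit int.
struct Chunk
{
  std::int64_t begin;
  int size;
};

// Splits [0, num_items) into consecutive partitions of at most max_chunk items.
// max_chunk defaults to INT_MAX to mirror the partition size named above.
inline std::vector<Chunk> split_into_chunks(std::int64_t num_items, int max_chunk = INT_MAX)
{
  assert(num_items >= 0 && max_chunk > 0);
  std::vector<Chunk> chunks;
  for (std::int64_t begin = 0; begin < num_items; begin += max_chunk)
  {
    // The last partition may be shorter than max_chunk.
    const std::int64_t remaining = num_items - begin;
    const std::int64_t size      = std::min<std::int64_t>(remaining, max_chunk);
    chunks.push_back({begin, static_cast<int>(size)});
  }
  return chunks;
}
```

Each partition could then be processed with 32-bit offsets internally, while the 64-bit `begin` records where it sits in the full input.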