cccl
Extract environment boilerplate code from within the device interfaces to a separate header
fixes #5606
The boilerplate code for extracting type information (stream, mr, tuning_t, etc.) is too large and too repetitive across the new environment-based device interfaces we introduced. This PR extracts that code into a separate function and reuses it in the existing environment-based device APIs (DeviceScan and DeviceReduce).
Some considerations about the design for the reviewers:
- Each device primitive has its own quirks regarding which `determinism_t` is supported. For example, `DeviceReduce::Reduce` can support both `gpu_to_gpu` and `run_to_run` determinism, while `DeviceReduce::ArgMax/Min` and `DeviceScan` only support `run_to_run` at the moment. That means the determinism heuristics cannot be incorporated into the boilerplate code. Future environment-based APIs must evaluate each algorithm individually to determine and support the appropriate determinism types.
- The existing boilerplate code uses a lambda callable to pass in the specific deterministic algorithm implementation, with its arguments forwarded as a pack.
      auto reduce_callable = [&](auto tuning, void* storage, size_t& bytes, auto... args) {
        using tuning_t = decltype(tuning);
        return reduce_impl<tuning_t>(storage, bytes, args...);
      };
      // Dispatch with environment - handles all boilerplate
      return detail::dispatch_with_env(
        env, determinism_t{}, reduce_callable, d_in, d_out, num_items, reduction_op, ::cuda::std::identity{}, init);
    }
I would like feedback on whether the interface of `dispatch_with_env()` looks sane.
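To make the shape of that interface easier to discuss, here is a minimal host-only sketch of the pattern, not the code in this PR. Everything in it is a hypothetical stand-in: `env_t`, `default_tuning_t`, the tag types, `dispatch_with_env_sketch`, `sum_impl` and `sum` are invented for illustration, a `std::vector<char>` plays the role of the memory resource, and an `int` status replaces `cudaError_t`. It assumes the helper owns the usual two-phase temporary-storage protocol (size query, then execution) while each algorithm keeps its own determinism check.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical determinism tags and tuning type (stand-ins, not the CCCL types).
struct run_to_run_t {};
struct gpu_to_gpu_t {};
struct default_tuning_t {};

// Hypothetical environment: a fake stream id plus a buffer acting as memory resource.
struct env_t
{
  int stream_id           = 0;
  std::vector<char>* pool = nullptr;
};

// Hypothetical shared helper: extracts what it needs from the environment and
// drives the callable twice (size query with null storage, then real execution).
template <class Determinism, class Callable, class... Args>
int dispatch_with_env_sketch(env_t& env, Determinism, Callable&& callable, Args... args)
{
  static_assert(std::is_empty_v<Determinism>, "determinism is a tag type in this sketch");
  assert(env.pool != nullptr);

  default_tuning_t tuning{};
  std::size_t bytes = 0;

  // Phase 1: null storage, the callable only reports the bytes it needs.
  int status = callable(tuning, static_cast<void*>(nullptr), bytes, args...);
  if (status != 0)
  {
    return status;
  }

  // Phase 2: allocate from the environment's resource and run the algorithm.
  env.pool->resize(bytes > 0 ? bytes : 1); // keep the pointer non-null
  return callable(tuning, static_cast<void*>(env.pool->data()), bytes, args...);
}

// Algorithm-specific implementation, following the two-phase convention.
template <class Tuning>
int sum_impl(void* storage, std::size_t& bytes, const int* in, int* out, int n, int init)
{
  if (storage == nullptr)
  {
    bytes = sizeof(int); // pretend one int of scratch space is required
    return 0;
  }
  int acc = init;
  for (int i = 0; i < n; ++i)
  {
    acc += in[i];
  }
  *out = acc;
  return 0;
}

// Public entry point: the determinism check stays here, per algorithm.
template <class Determinism = run_to_run_t>
int sum(env_t& env, const int* in, int* out, int n, int init)
{
  static_assert(std::is_same_v<Determinism, run_to_run_t> || std::is_same_v<Determinism, gpu_to_gpu_t>,
                "unsupported determinism for this algorithm");
  auto callable = [](auto tuning, void* storage, std::size_t& bytes, auto... a) {
    using tuning_t = decltype(tuning);
    return sum_impl<tuning_t>(storage, bytes, a...);
  };
  return dispatch_with_env_sketch(env, Determinism{}, callable, in, out, n, init);
}

// Small driver: sums {1, 2, 3, 4} starting from an initial value of 10.
int example_sum()
{
  std::vector<char> pool;
  env_t env{0, &pool};
  const int in[] = {1, 2, 3, 4};
  int out        = 0;
  const int status = sum(env, in, &out, 4, 10);
  return status == 0 ? out : -1;
}
```

In this sketch the helper never inspects the determinism tag beyond requiring a tag type, which mirrors the point above that the per-algorithm determinism decision cannot live in the shared boilerplate.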
😬 CI Workflow Results
🟥 Finished in 1h 00m: Pass: 25%/81 | Total: 1d 05h | Max: 59m 55s | Hits: 75%/14741
😬 CI Workflow Results
🟥 Finished in 3h 00m: Pass: 28%/81 | Total: 2d 04h | Max: 2h 59m | Hits: 81%/22346