cuda-kat
Use iterators for at-grid-stride and at-block-stride traversals
Currently, we offer the at_grid_stride(), at_block_stride() and at_warp_stride() functions, which take an invokable and ensure the appropriate traversal pattern is used.
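For readers unfamiliar with the pattern, here is a minimal host-side sketch of what such an invokable-taking traversal amounts to. It is not the library's implementation: `at_stride_sketch` and `positions_visited` are hypothetical names, and the starting position and stride are passed in explicitly, whereas in a kernel they would come from the thread's index and the number of participating threads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a stride traversal helper: calls f on
// first, first + stride, first + 2 * stride, ... while below length.
// In a kernel, `first` would be this thread's index and `stride` the
// number of threads taking part (assumptions for this sketch).
template <typename Size, typename F>
void at_stride_sketch(Size length, Size first, Size stride, F f)
{
    for (Size pos = first; pos < length; pos += stride) {
        f(pos);
    }
}

// Host-side helper for trying the pattern out: collects the positions
// that one simulated thread would visit.
inline std::vector<std::size_t> positions_visited(
    std::size_t length, std::size_t first, std::size_t stride)
{
    std::vector<std::size_t> visited;
    at_stride_sketch(length, first, stride,
        [&](std::size_t pos) { visited.push_back(pos); });
    return visited;
}
```

With a length of 10 and 4 simulated threads, the thread that starts at position 1 goes on to visit positions 5 and 9.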
Would it not be a good idea to offer, instead or in addition, iterators corresponding to these patterns, as in range.hpp in the CUDA C++11 sample program?
The range in question is Mark Harris' adaptation of cpp11range. I find the original a bit cluttered with things I don't need (e.g. an infinite-loop range), and it "conflicts" with C++20 ranges. Still, it might be adaptable in a more pleasing fashion, so that instead of, say:
    auto f = [&] (Size pos) {
        foo(pos);
    };
    kat::linear_grid::collaborative::grid::at_grid_stride(length, f);
we could write:
    for (Size pos : kat::ranges::at_grid_stride(length)) {
        foo(pos);
    }
as the latter is both shorter (even ignoring namespaces) and simpler, in not requiring a higher-order function.
OK, I won't be going with Mark Harris' code. It's a bit too clunky IMHO, it's not C++17-friendly, and it is saddled with irrelevant baggage from the repository he had modified.
Instead, I'll implement my own integer and strided-integer ranges - which will be constexpr, and host+device; then on top of that I'll add named constructor idioms for warp, block and grid stride iteration.
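To make the plan concrete, here is a rough sketch of what a constexpr, host+device strided integer range with a grid-stride named constructor could look like. Everything in it is an assumption for illustration, not the eventual cuda-kat code: the `sketch` namespace, the `SKETCH_HD` macro, `strided_range`, `sum_of`, and the 1-D grid indexing in `at_grid_stride`. It needs C++14 or later.

```cpp
// Expands to CUDA's host+device qualifiers under nvcc, to nothing otherwise
// (hypothetical macro, so the same code also builds as plain C++).
#ifdef __CUDACC__
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

namespace sketch {

// Half-open integer range [first, last), visited in steps of `stride`.
// Assumes stride > 0 and that pos + stride does not overflow I.
template <typename I>
class strided_range {
public:
    class iterator {
    public:
        constexpr SKETCH_HD iterator(I pos, I stride) : pos_(pos), stride_(stride) {}
        constexpr SKETCH_HD I operator*() const { return pos_; }
        constexpr SKETCH_HD iterator& operator++() { pos_ += stride_; return *this; }
        // Only meant for comparison against end(): a stride can step past
        // `last` without landing on it, so "!=" means "still below the end".
        constexpr SKETCH_HD bool operator!=(const iterator& end) const
        {
            return pos_ < end.pos_;
        }
    private:
        I pos_;
        I stride_;
    };

    constexpr SKETCH_HD strided_range(I first, I last, I stride)
        : first_(first), last_(last), stride_(stride) {}
    constexpr SKETCH_HD iterator begin() const { return iterator(first_, stride_); }
    constexpr SKETCH_HD iterator end() const { return iterator(last_, stride_); }

private:
    I first_, last_, stride_;
};

#ifdef __CUDACC__
// Named constructor: grid-stride traversal of [0, length) on a linear (1-D) grid.
template <typename I>
__device__ strided_range<I> at_grid_stride(I length)
{
    return strided_range<I>(
        static_cast<I>(blockIdx.x) * static_cast<I>(blockDim.x) + static_cast<I>(threadIdx.x),
        length,
        static_cast<I>(gridDim.x) * static_cast<I>(blockDim.x));
}
#endif

// Sums the visited positions; constexpr, so the range can be exercised
// at compile time as well as at run time.
constexpr SKETCH_HD long sum_of(strided_range<long> r)
{
    long sum = 0;
    for (long pos : r) { sum += pos; }
    return sum;
}

} // namespace sketch
```

Block-stride and warp-stride named constructors would follow the same shape, differing only in the starting position and stride they compute. The unusual `operator!=` is the main design compromise: it keeps the range-based for loop working without requiring `length` to be a multiple of the stride.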