oneDPL
Add transform_output_iterator
oneDPL has a variety of useful fancy iterators and views (transform, counting, iota, zip, permutation ...), but is missing an important type of iterator that can be used to fuse consecutive kernel calls and avoid intermediate storage:
transform_iterator operates like memory -> transformation -> kernel -> memory, whereas
transform_output_iterator would operate like memory -> kernel -> transformation -> memory.
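To make the "transformation on write" direction concrete, here is a minimal host-only sketch in standard C++. The names transform_output_iterator_sketch, times_ten and fused_copy_demo are hypothetical and are not part of oneDPL or Thrust; the sketch only illustrates the semantics being requested, not how a library would implement it for device code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical type for illustration; NOT an existing oneDPL iterator.
// Wraps an output iterator and applies op to every value before it is stored.
template <typename OutputIt, typename UnaryOp>
class transform_output_iterator_sketch {
    OutputIt out_;
    UnaryOp op_;

    // Proxy returned by operator*: assigning to it transforms, then writes.
    struct proxy {
        OutputIt out;
        UnaryOp op;
        template <typename T>
        proxy& operator=(T&& value) {
            *out = op(std::forward<T>(value));
            return *this;
        }
    };

public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    transform_output_iterator_sketch(OutputIt out, UnaryOp op) : out_(out), op_(op) {}

    proxy operator*() const { return proxy{out_, op_}; }
    transform_output_iterator_sketch& operator++() { ++out_; return *this; }
    transform_output_iterator_sketch operator++(int) { auto tmp = *this; ++out_; return tmp; }
};

// Hypothetical transformation applied on the way to memory.
struct times_ten {
    int operator()(int x) const { return x * 10; }
};

// memory -> algorithm (std::copy) -> transformation -> memory, no intermediate buffer.
std::vector<int> fused_copy_demo() {
    const std::vector<int> input{1, 2, 3};
    std::vector<int> output(input.size());
    using out_it = transform_output_iterator_sketch<std::vector<int>::iterator, times_ten>;
    std::copy(input.begin(), input.end(), out_it(output.begin(), times_ten{}));
    return output;
}
```

Calling fused_copy_demo() yields {10, 20, 30}: the algorithm itself only copies, and the multiplication happens inside the assignment through the iterator. A named function object is used instead of a lambda so that the iterator stays copy-assignable under C++17.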
Comparable iterator types are Thrust's transform_output_iterator and transform_input_output_iterator wrappers.
Thank you for the suggestion, @upsj. I'll look into it.
Hello, @upsj!
For "kernel fusion" we would recommend using the range API; in this case, views::transform as a range pipeline argument for an output range. For details, see https://oneapi-src.github.io/oneDPL/parallel_api/range_based_api.html
Could you share your use case as an example or a code snippet? We may be able to offer some advice.
views::transform is an input range that applies the transformation after reading from the wrapped range. I need an output range that applies the transformation before writing to the wrapped range. Those are kind of opposite.
For an example from the Thrust context, see the following copy_if statement: The copy predicate needs the index of the current element, so I embellish the input iterator with an enumerating iterator, which I need to drop before writing to the output. https://github.com/ginkgo-project/ginkgo/blob/00226d03360d0917d7d898e721273a4ab92ff4e8/common/cuda_hip/matrix/hybrid_kernels.hpp.inc#L195
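The pattern described above (attach an index to each input element, filter on it, drop it again on write) can be sketched sequentially in standard C++. This is not the linked Ginkgo kernel: the predicate, the data, and the names indexed_value, drop_index_output and copy_if_indexed_demo are hypothetical, and the enumerated input is materialized in a vector only to keep the example short, whereas a real implementation would use a zip or counting iterator to avoid that storage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// (index, value) pair; hypothetical alias for illustration.
using indexed_value = std::pair<std::size_t, double>;

// Hypothetical output iterator: accepts (index, value) pairs, stores only the value.
class drop_index_output {
    double* out_;

    struct proxy {
        double* out;
        proxy& operator=(const indexed_value& iv) {
            *out = iv.second;  // transformation on write: drop the index
            return *this;
        }
    };

public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit drop_index_output(double* out) : out_(out) {}

    proxy operator*() const { return proxy{out_}; }
    drop_index_output& operator++() { ++out_; return *this; }
    drop_index_output operator++(int) { auto tmp = *this; ++out_; return tmp; }
    double* base() const { return out_; }  // current write position
};

// Keeps nonzero values located at even indices (a made-up predicate).
std::vector<double> copy_if_indexed_demo() {
    const std::vector<double> values{5.0, 0.0, 7.0, 8.0, 0.0, 9.0};

    // Enumerate the input; materialized here only for brevity.
    std::vector<indexed_value> enumerated;
    for (std::size_t i = 0; i < values.size(); ++i) {
        enumerated.emplace_back(i, values[i]);
    }

    std::vector<double> kept(values.size());
    auto end = std::copy_if(
        enumerated.begin(), enumerated.end(), drop_index_output(kept.data()),
        [](const indexed_value& iv) { return iv.first % 2 == 0 && iv.second != 0.0; });
    kept.resize(static_cast<std::size_t>(end.base() - kept.data()));
    return kept;
}
```

With the sample data, only indices 0 and 2 pass the predicate, so the result is {5.0, 7.0}. The predicate sees the index, but the output buffer only ever holds plain values.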
@upsj, while we are still evaluating the idea for transform_output_iterator, for the particular example of dropping an element out of a tuple you can use a combination of zip_iterator and discard_iterator.
Porting existing CUDA (Thrust) code to oneAPI (oneDPL) indeed requires this feature. Looking forward to seeing it.
@upsj @lilohuang your feedback on #592 would be welcome.
@timmiesmith @akukanov
Could you kindly provide me with an update regarding the availability of the transform_output_iterator in the latest version of oneDPL? I would greatly appreciate it if you could share any plans or roadmap for its implementation. Thank you very much for your assistance.
@lilohuang, a mapping for transform_output_iterator was provided in SYCLomatic in https://github.com/oneapi-src/SYCLomatic/pull/361 . The plan for oneDPL is to investigate whether a more general implementation of transform_iterator could handle both input and output modes.