Add transform_output_iterator

Open · upsj opened this issue 2 years ago · 6 comments

oneDPL has a variety of useful fancy iterators and views (transform, counting, iota, zip, permutation, ...), but it lacks an important kind of iterator that can be used to fuse consecutive kernel calls and avoid intermediate storage:

transform_iterator works as memory -> transformation -> kernel -> memory, whereas transform_output_iterator would work as memory -> kernel -> transformation -> memory.

Comparable iterator types are Thrust's transform_output_iterator and transform_input_output_iterator wrappers.

upsj, Nov 03 '21
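The output direction described in the issue can be sketched in plain standard C++ (host code, not SYCL, and not an existing oneDPL API). The class name `transform_output_iterator` and the helper `square_on_write` are hypothetical. Dereferencing the iterator returns a proxy whose assignment applies the function and then stores into the wrapped iterator, so any algorithm that writes through it (here `std::copy`, standing in for a kernel) gets the transformation fused into its store.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical transform_output_iterator (a sketch, not a oneDPL API).
// Values written through it pass through `fn` before they reach the wrapped
// iterator: memory -> kernel -> transformation -> memory.
template <typename OutIt, typename Fn>
class transform_output_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    transform_output_iterator(OutIt out, Fn fn) : out_(out), fn_(std::move(fn)) {}

    // Proxy returned by operator*: assigning to it applies fn, then stores.
    struct proxy {
        OutIt out;
        Fn* fn;
        template <typename T>
        proxy& operator=(T&& value) {
            *out = (*fn)(std::forward<T>(value));
            return *this;
        }
    };

    proxy operator*() { return proxy{out_, &fn_}; }
    transform_output_iterator& operator++() { ++out_; return *this; }
    transform_output_iterator operator++(int) {
        transform_output_iterator tmp = *this;
        ++out_;
        return tmp;
    }

private:
    OutIt out_;
    Fn fn_;
};

// std::copy stands in for the "kernel"; squaring is fused into the store.
std::vector<int> square_on_write(const std::vector<int>& input) {
    std::vector<int> output(input.size());
    auto square = [](int x) { return x * x; };
    transform_output_iterator out(output.begin(), square);
    std::copy(input.begin(), input.end(), out);
    return output;
}
```

For example, `square_on_write({1, 2, 3, 4})` returns `{1, 4, 9, 16}`. Returning a proxy from `operator*` is what allows the written type to differ from the stored type: the target element is only ever assigned, never read.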

Thank you for the suggestion, @upsj. I'll look into it.

timmiesmith, Nov 04 '21

Hello, @upsj! For "kernel fusion" we would recommend using the range-based API; in this case, pass views::transform as a range pipeline argument for the output range. For details, see https://oneapi-src.github.io/oneDPL/parallel_api/range_based_api.html. Could you share your use case as an example or a code snippet? We may be able to give you some advice.

MikeDvorskiy, Nov 29 '21

views::transform is an input range that applies the transformation after reading from the wrapped range. I need an output range that applies the transformation before writing to the wrapped range. The two are essentially opposites.

For an example in a Thrust context, see the following copy_if call: the copy predicate needs the index of the current element, so I augment the input iterator with an enumerating iterator, and that index must be dropped again before writing to the output: https://github.com/ginkgo-project/ginkgo/blob/00226d03360d0917d7d898e721273a4ab92ff4e8/common/cuda_hip/matrix/hybrid_kernels.hpp.inc#L195

upsj, Nov 29 '21
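A host-side standard C++ sketch of the copy_if pattern described above, not the Ginkgo, Thrust, or oneDPL code. The explicit vector of (index, value) pairs stands in for an enumerating iterator; on a device it would be exactly the intermediate storage the issue wants to avoid, and it is materialized here only to keep the sketch short. `drop_index_iterator` and `keep_even_indices` are hypothetical names: the predicate sees the index, and the output iterator discards it on write.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical output iterator: assigning an (index, value) pair stores only
// the value, i.e. the enumeration index is dropped on write.
template <typename OutIt>
class drop_index_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit drop_index_iterator(OutIt out) : out_(out) {}

    struct proxy {
        OutIt out;
        template <typename Index, typename Value>
        proxy& operator=(const std::pair<Index, Value>& p) {
            *out = p.second;  // the index p.first is discarded
            return *this;
        }
    };

    proxy operator*() { return proxy{out_}; }
    drop_index_iterator& operator++() { ++out_; return *this; }
    drop_index_iterator operator++(int) {
        drop_index_iterator tmp = *this;
        ++out_;
        return tmp;
    }
    OutIt base() const { return out_; }

private:
    OutIt out_;
};

// Keep the elements at even indices. The predicate needs the index, so the
// input is enumerated; the output must contain values only.
std::vector<double> keep_even_indices(const std::vector<double>& values) {
    // Materialized for brevity; an enumerating iterator would avoid this.
    std::vector<std::pair<std::size_t, double>> enumerated;
    for (std::size_t i = 0; i < values.size(); ++i) {
        enumerated.emplace_back(i, values[i]);
    }

    std::vector<double> out(values.size());
    auto last = std::copy_if(enumerated.begin(), enumerated.end(),
                             drop_index_iterator<std::vector<double>::iterator>(out.begin()),
                             [](const std::pair<std::size_t, double>& p) { return p.first % 2 == 0; });
    out.erase(last.base(), out.end());
    return out;
}
```

For example, `keep_even_indices({1.0, 2.0, 3.0, 4.0, 5.0})` returns `{1.0, 3.0, 5.0}`.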

@upsj, while we are still evaluating the idea of a transform_output_iterator, for your particular example of dropping an element from a tuple you can combine zip_iterator and discard_iterator.

akukanov, Feb 14 '22
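The zip_iterator + discard_iterator workaround suggested above can be approximated in standard C++ as follows. `discard_sink` and `zip_output` are hypothetical, minimal stand-ins for oneDPL's discard_iterator and zip_iterator: they support writing only and accept a std::pair rather than a tuple. Writing an (index, value) pair sends the index to the discard sink and the value to the real output, which has the same effect as dropping that tuple element. As in the original use case, the enumerated input is materialized here only for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Minimal stand-in for a discard iterator: every write is ignored.
struct discard_sink {
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    struct proxy {
        template <typename T>
        proxy& operator=(T&&) { return *this; }  // drop the value
    };
    proxy operator*() const { return {}; }
    discard_sink& operator++() { return *this; }
    discard_sink operator++(int) { return *this; }
};

// Minimal write-only stand-in for a zip iterator over two outputs:
// writing a pair (a, b) writes a through `first` and b through `second`.
template <typename It1, typename It2>
struct zip_output {
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    It1 first;
    It2 second;

    struct proxy {
        It1 a;
        It2 b;
        template <typename A, typename B>
        proxy& operator=(const std::pair<A, B>& p) {
            *a = p.first;
            *b = p.second;
            return *this;
        }
    };

    proxy operator*() { return proxy{first, second}; }
    zip_output& operator++() { ++first; ++second; return *this; }
    zip_output operator++(int) { zip_output tmp = *this; ++*this; return tmp; }
};

// Keep the elements at even indices, routing each index to the discard sink.
std::vector<double> keep_even_indices(const std::vector<double>& values) {
    std::vector<std::pair<std::size_t, double>> enumerated;
    for (std::size_t i = 0; i < values.size(); ++i) {
        enumerated.emplace_back(i, values[i]);
    }

    std::vector<double> out(values.size());
    zip_output<discard_sink, std::vector<double>::iterator> dest{discard_sink{}, out.begin()};
    auto last = std::copy_if(enumerated.begin(), enumerated.end(), dest,
                             [](const auto& p) { return p.first % 2 == 0; });
    out.erase(last.second, out.end());
    return out;
}
```

This covers the "drop a field" case without any user-supplied transformation; the dropped field is simply routed to a sink. A general transformation applied on write would still need something like transform_output_iterator.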

Porting existing CUDA (Thrust) code to oneAPI (DPL) does require this feature. I'm looking forward to seeing it.

lilohuang, Feb 21 '22

@upsj @lilohuang your feedback on #592 would be welcome.

timmiesmith, Aug 11 '22

@timmiesmith @akukanov

Could you kindly provide me with an update regarding the availability of the transform_output_iterator in the latest version of oneDPL? I would greatly appreciate it if you could share any plans or roadmap for its implementation. Thank you very much for your assistance.

lilohuang, Apr 13 '23

@lilohuang, functionality for mapping transform_output_iterator was added to SYCLomatic in https://github.com/oneapi-src/SYCLomatic/pull/361. The plan for oneDPL is to investigate providing a more general implementation of transform_iterator that can handle both input and output modes.

timmiesmith, Sep 07 '23
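A rough sketch of an iterator with both input and output modes, in the spirit of Thrust's transform_input_output_iterator: reading through it applies one function, and writing through it applies another. This is standard C++ and not the oneDPL design, which the last comment describes as still under investigation; `transform_io_iterator`, `double_biased`, and the biased storage encoding (stored = logical + 100) are illustrative assumptions. The in-place `std::transform` plays the role of a kernel that only ever sees logical values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical iterator with input and output modes (a sketch, not a
// oneDPL API): reads apply `in_fn` to the stored element, writes apply
// `out_fn` before storing.
template <typename It, typename InFn, typename OutFn>
class transform_io_iterator {
public:
    using read_type = std::invoke_result_t<InFn&, decltype(*std::declval<It&>())>;

    struct proxy {
        It it;
        InFn* in_fn;
        OutFn* out_fn;
        // Read mode: convert to the transformed value.
        operator read_type() const { return (*in_fn)(*it); }
        // Write mode: transform the value, then store it.
        template <typename T>
        proxy& operator=(T&& value) {
            *it = (*out_fn)(std::forward<T>(value));
            return *this;
        }
    };

    using iterator_category = std::input_iterator_tag;
    using value_type = read_type;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = proxy;

    transform_io_iterator(It it, InFn in_fn, OutFn out_fn)
        : it_(it), in_fn_(std::move(in_fn)), out_fn_(std::move(out_fn)) {}

    proxy operator*() { return proxy{it_, &in_fn_, &out_fn_}; }
    transform_io_iterator& operator++() { ++it_; return *this; }
    transform_io_iterator operator++(int) {
        transform_io_iterator tmp = *this;
        ++it_;
        return tmp;
    }
    bool operator==(const transform_io_iterator& o) const { return it_ == o.it_; }
    bool operator!=(const transform_io_iterator& o) const { return it_ != o.it_; }

private:
    It it_;
    InFn in_fn_;
    OutFn out_fn_;
};

// Storage uses a biased encoding (stored = logical + 100). The in-place
// std::transform "kernel" doubles logical values; decoding and encoding
// happen inside the iterator, with no separate passes.
std::vector<int> double_biased(std::vector<int> storage) {
    auto decode = [](int stored) { return stored - 100; };
    auto encode = [](int logical) { return logical + 100; };
    transform_io_iterator first(storage.begin(), decode, encode);
    transform_io_iterator last(storage.end(), decode, encode);
    std::transform(first, last, first, [](int x) { return 2 * x; });
    return storage;
}
```

For example, `double_biased({101, 102, 103})` decodes to 1, 2, 3, doubles them to 2, 4, 6, and stores `{102, 104, 106}`.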