Dan Hoeflinger
@akukanov > Why do we expect it to have "disappointing" performance? Basically, with an output host iterator we need to copy to the device prior to the kernel and...
> Probably, it makes sense to use dpl::for_each + zip(data1, data2, res) + functor with predicate? In that case we have just one access mode for all sequences - read_write....
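To make the "for_each + zip + functor with predicate" idea concrete, here is a minimal host-only sketch in standard C++. It is not oneDPL code: `masked_transform` is a hypothetical helper name, and an index range stands in for the zip iterator, so the functor sees all three sequences at once and writes to `res` only where the predicate holds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a oneDPL API): applies op to (data1[i], data2[i])
// and stores the result in res[i] only where pred(data1[i], data2[i]) is true.
// Elements of res where the predicate is false are left untouched.
template <typename T, typename BinaryOp, typename BinaryPred>
void masked_transform(const std::vector<T>& data1, const std::vector<T>& data2,
                      std::vector<T>& res, BinaryOp op, BinaryPred pred)
{
    // Assumption for this sketch: all three sequences have the same length.
    assert(data1.size() == res.size() && data2.size() == res.size());

    // An index range plays the role of zip(data1, data2, res).
    std::vector<std::size_t> idx(res.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    std::for_each(idx.begin(), idx.end(), [&](std::size_t i) {
        if (pred(data1[i], data2[i]))
            res[i] = op(data1[i], data2[i]);
    });
}
```

Because `res` is only partially overwritten, its previous contents matter, which is why the quoted suggestion treats every sequence as read_write.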
> BTW, my general thoughts. Looking at a scenario like this more broadly - actually we need a way (approach) to "transfer data by mask" between device and host. If we consider...
> > For host_iterator inputs, wouldn't that add an unnecessary copy-back for data1 and data2 which were previously read (copy-in) only? > > right.... In that case - "transform" +...
I know this is now a draft, but I just want to preserve the state of my review: I've checked that the removed macros are properly replaced with their new...
Alternatively, perhaps we can instead "process" an empty but defined `_PSTL_UDR_PRESENT` into some local version that we then use so that we can have this workaround only in one place.
We do not include `pstl_config.h` anywhere in our own code. I think it is here to make syncing with upstream easier, but someone with more knowledge of the sync process can...
> I changed this PR to create a new `_ONEDPL_DEFINED_AND_NOT_ZERO` macro and use that instead. I think it made it much simpler. I like it a lot better now after...
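For readers unfamiliar with the problem, a "defined and not zero" check can be sketched as below. This is an illustration under assumptions, not the macro from the PR: the `EXAMPLE_` names are hypothetical, and treating an empty definition as "off" is a choice made for this sketch. The point is that a plain `#if FLAG` fails to compile when `FLAG` is defined with no value, while `(FLAG + 0)` evaluates cleanly in every case.

```cpp
// Hypothetical helper; the name and definition are assumptions for illustration.
// (x + 0) becomes (+ 0) for an empty macro, (0 + 0) for an undefined one
// (the preprocessor replaces unknown identifiers with 0), and (N + 0) otherwise.
#define EXAMPLE_DEFINED_AND_NOT_ZERO(x) (x + 0)

#define EXAMPLE_FLAG_EMPTY        // defined, but with no value
#define EXAMPLE_FLAG_ZERO 0
#define EXAMPLE_FLAG_ONE 1
// EXAMPLE_FLAG_UNDEFINED is intentionally never defined.

#if EXAMPLE_DEFINED_AND_NOT_ZERO(EXAMPLE_FLAG_EMPTY)
constexpr bool empty_flag_on = true;
#else
constexpr bool empty_flag_on = false;
#endif

#if EXAMPLE_DEFINED_AND_NOT_ZERO(EXAMPLE_FLAG_ZERO)
constexpr bool zero_flag_on = true;
#else
constexpr bool zero_flag_on = false;
#endif

#if EXAMPLE_DEFINED_AND_NOT_ZERO(EXAMPLE_FLAG_ONE)
constexpr bool one_flag_on = true;
#else
constexpr bool one_flag_on = false;
#endif

#if EXAMPLE_DEFINED_AND_NOT_ZERO(EXAMPLE_FLAG_UNDEFINED)
constexpr bool undefined_flag_on = true;
#else
constexpr bool undefined_flag_on = false;
#endif
```

Keeping the check in one macro means the workaround for empty definitions lives in a single place instead of being repeated at every `#if`.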
Note: This does not do anything to address time spent in JIT. It's quite possible that a lot of the time of the test is spent in JIT compilation, and...
> Should we think about the ability to save previous behavior when the macro `TEST_LONG_RUN` is defined? Yes, perhaps with a we can do a min / max with the scale...