Adding std::vector<T, usm_allocator>::iterator to is_passed_directly types
It seems that std::vector with usm_allocator with sycl::usm::alloc::shared data were being processed as if they were host_iterators when being passed to oneDPL with device policy and dpcpp backend.
This means that they were being used to initialize a sycl::buffer which includes an extra copy to intermediate host memory. This is unnecessary because iterators to such data should be passed directly, and directly usable within a kernel.
This PR adds a specialization for the is_passed_directly trait, which enables such iterators to be properly passed directly. This avoids the extra copy.
This was uncovered in investigation #1429, and will be tested by those tests, after a suggestion from @mmichel11 .
Our documentation only makes mention of the case where we use a sycl::usm_allocator with sycl::usm::alloc::shared, but should we also provide a specialization for sycl::usm_allocator with sycl::usm::alloc::host to be passed directly since SYCL host memory is device accessible?
Our documentation only makes mention of the case where we use a
sycl::usm_allocatorwithsycl::usm::alloc::shared, but should we also provide a specialization forsycl::usm_allocatorwithsycl::usm::alloc::hostto be passed directly since SYCL host memory is device accessible?
Its a good question, I will need to look into it some. @MikeDvorskiy @rarutyun @akukanov do you have an opinion on how we should handle std::vector input data allocated with a sycl::usm_allocator with sycl::usm::alloc::host memory?
Our documentation only makes mention of the case where we use a
sycl::usm_allocatorwithsycl::usm::alloc::shared, but should we also provide a specialization forsycl::usm_allocatorwithsycl::usm::alloc::hostto be passed directly since SYCL host memory is device accessible?Its a good question, I will need to look into it some. @MikeDvorskiy @rarutyun @akukanov do you have an opinion on how we should handle
std::vectorinput data allocated with asycl::usm_allocatorwithsycl::usm::alloc::hostmemory?
After some thought, I believe we should support sycl::usm::alloc::host allocators as well as passed directly (and I have added it). The alternative is them being treated as host iterators, and being used to initialize a sycl::buffer. Of course, it is the user's responsibility to use host usm memory responsibly.
I am interested in hearing other opinions on this as well.
When run through the input data sweep tests from #1429, it seems that both shared and host usm_allocator iterators encounter issues when used as the source iterator for a permutation_iterator. I don't yet understand why this causes issues, but should be understood before merging this PR.
When run through the input data sweep tests from #1429, it seems that both
sharedandhostusm_allocatoriterators encounter issues when used as the source iterator for apermutation_iterator. I don't yet understand why this causes issues, but should be understood before merging this PR.
There was a bug of miss-placed > which was causing these usm_allocator iterators to not be treated as "passed_directly". It is now fixed, and the tests from #1429 now pass, and I've confirmed that they are handled via the correct path through the data input processing code.
@MikeDvorskiy @mmichel11 Actually there is a fundamental problem with this, and it will not work. This breaks host_iterators because std::vector::iterator with a default allocator are indistinguishable from std::vector_iterator with a usm_allocator.
This is problematic though, because as it stands (without this PR), we have an extra copy to a host memory location, and also these iterators are unable to be used as the source iterator for permutation iterators (the same as host_iterators). Sadly, I see no apparent solution to this issue.
Update:
Actually, for icpx, the std::vector::iterator does distinguish between host_iterator and usm_allocator, except for the case of std::vector<bool>::iterator (which is always std::_Bit_iterator, regardless of the allocator it seems).
I've provided a solution which should hopefully allow support when it is possible to, without breaking host_iterators.
It first checks if we can distinguish between host_iterator and iterator from a vector with usm_allocator using a new struct __vector_impl_distinguishes_usm_allocator_from_default. If we can distinguish between these types for the value type in question, we do and mark passed directly if we can detect usm_allocators. If we cannot distinguish between the two, we do not change the passed directly status.
This should not "spoil" any host_iterators, and it should allow better support for usm_allocators where it is possible.
@MikeDvorskiy Please take a look and see if you can find problems with this approach.
Following up on this strategy of first checking if usm allocator is distinguishable vs the default allocator within the iterator type...
It is possible (though I believe improbable) for an implementation to have the same type of iterator for vector with usm allocator as it has for vector with some custom host allocator, while at the same time having different iterator types for usm allocator vs the default allocator.
In my view, it is quite likely that if the iterator types are distinguishable, they are by including the allocator in the definition of the iterator type (as is done for icpx outside of vector<bool>).
However, if we want to be most conservative, this is not a 100% conclusive check that the iterator points to data from a usm allocator .
@MikeDvorskiy After our offline discussion, I've updated the PR in 2 ways.
- Made it more readable by abstracting out definitions of std::vector::iterator of different types
- Not only do we check if
std::vector<T>::iteratoris distinguishable fromstd::vector<T, usm_shared_allocator>::iterator, but we now check that all three are different: a)std::vector<T>::iteratorb)std::vector<T, usm_shared_allocator>::iteratorc)std::vector<T, usm_host_allocator>::iterator
With (2) I believe it is pretty definitive proof that the iterator definition includes the allocator itself, and it should not be possible for a std::vector<T,custom_host_allocator>::iterator to match the type of either usm allocator iterator.
Ran this new version through the tests in #1429, and confirmed that both host iterators and usm allocator iterators are headed to the right processing.