Improve handling of permutation iterators whose base iterator is composed of sycl iterators and "passed directly" types
As described in the TODO linked at the end of this description, improve the handling of permutation_iterator when its base iterator is neither explicitly "passed directly" nor a standalone sycl_iterator, but is still a "supported" base iterator type for permutation_iterator.
Specifically, define a trait __is_fancy_direct_sycl_v (or some similar name) which is true for types composed entirely of sycl_iterator, "passed directly" types, and fancy iterators (possibly with multiple levels of nested fancy iterators).
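A minimal sketch of how such a recursive trait could look. This is not oneDPL code: every type ending in `_stub` is a hypothetical stand-in for the corresponding real type, the trait is named `is_fancy_direct_sycl_v` without the leading underscores, and treating raw pointers as "passed directly" is an assumption made for illustration.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the real iterator types (illustration only).
struct sycl_iterator_stub {};      // stands in for a sycl_iterator
struct host_only_iterator_stub {}; // an unsupported, host-only iterator
struct usm_iterator_stub           // a type that opts in to "passed directly"
{
    using is_passed_directly = std::true_type;
};
template <typename Base, typename Map>
struct permutation_iterator_stub {};
template <typename Base, typename Fn>
struct transform_iterator_stub {};
template <typename... Its>
struct zip_iterator_stub {};

// Assumed detection of "passed directly": a nested typedef, or a raw pointer.
template <typename T, typename = void>
struct is_passed_directly_stub : std::false_type {};
template <typename T>
struct is_passed_directly_stub<T, std::void_t<typename T::is_passed_directly>>
    : T::is_passed_directly {};
template <typename T>
struct is_passed_directly_stub<T*, void> : std::true_type {};

// Leaf case: a sycl iterator or a "passed directly" type.
template <typename T>
struct is_fancy_direct_sycl
    : std::bool_constant<std::is_same_v<T, sycl_iterator_stub> ||
                         is_passed_directly_stub<T>::value> {};

// Recursive cases: a fancy iterator qualifies only if all of its
// iterator components qualify.
template <typename Base, typename Map>
struct is_fancy_direct_sycl<permutation_iterator_stub<Base, Map>>
    : std::conjunction<is_fancy_direct_sycl<Base>, is_fancy_direct_sycl<Map>> {};
template <typename Base, typename Fn>
struct is_fancy_direct_sycl<transform_iterator_stub<Base, Fn>>
    : is_fancy_direct_sycl<Base> {};
template <typename... Its>
struct is_fancy_direct_sycl<zip_iterator_stub<Its...>>
    : std::conjunction<is_fancy_direct_sycl<Its>...> {};

template <typename T>
inline constexpr bool is_fancy_direct_sycl_v = is_fancy_direct_sycl<T>::value;

// Two levels of nesting: transform over zip over (sycl iterator, pointer).
static_assert(is_fancy_direct_sycl_v<
    transform_iterator_stub<zip_iterator_stub<sycl_iterator_stub, int*>, int>>);
// A single unsupported component makes the whole composition fail.
static_assert(!is_fancy_direct_sycl_v<
    zip_iterator_stub<sycl_iterator_stub, host_only_iterator_stub>>);
```

In this sketch the map of the permutation iterator is assumed to be an iterator as well; a functor map would need its own handling.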
When __is_fancy_direct_sycl_v is true, handle each sycl_iterator component by creating an accessor to the entire sycl buffer (using its query-able size) to pass into the kernel, and provide a wrapped accessor which carries the iterator's offset. This lets the kernel access the data without relying upon UB, which can currently occur when we overrun an accessor of size 1.
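The idea of a full-buffer accessor wrapped with an offset can be sketched in plain C++ as follows. It deliberately avoids the SYCL API: `full_buffer_accessor_stub`, `offset_accessor_view`, `demo_buffer`, and `read_through_view` are hypothetical names standing in for a real accessor and its wrapper.

```cpp
#include <cstddef>

// Hypothetical stand-in for a device accessor covering an entire buffer.
template <typename T>
struct full_buffer_accessor_stub
{
    const T* data;
    std::size_t count; // the buffer's query-able size

    constexpr const T& operator[](std::size_t i) const { return data[i]; }
    constexpr std::size_t size() const { return count; }
};

// Wrapper that applies the sycl_iterator's offset on every access, so that
// index 0 refers to the element the iterator points at while the rest of
// the buffer stays legally reachable.
template <typename Accessor>
struct offset_accessor_view
{
    Accessor acc;
    std::size_t offset;

    constexpr decltype(auto) operator[](std::size_t i) const { return acc[offset + i]; }
    constexpr std::size_t size() const { return acc.size() - offset; }
};

constexpr int demo_buffer[] = {10, 20, 30, 40, 50};

// Models a permutation map asking for `index` relative to an iterator
// positioned at `offset` within the buffer.
constexpr int read_through_view(std::size_t offset, std::size_t index)
{
    full_buffer_accessor_stub<int> acc{demo_buffer, 5};
    offset_accessor_view<full_buffer_accessor_stub<int>> view{acc, offset};
    return view[index];
}

// Iterator at position 1, map requests element 3: reads demo_buffer[4].
static_assert(read_through_view(1, 3) == 50);
```

With an accessor of size 1 placed at position 1, the same request would read past the end of the accessor. Here the access stays within the full buffer.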
https://github.com/oneapi-src/oneDPL/blob/7cc0e4a36431d6a2c1b687dd6905bb0abbad87c0/include/oneapi/dpl/pstl/hetero/dpcpp/utils_ranges_sycl.h#L564-L568