thrust icon indicating copy to clipboard operation
thrust copied to clipboard

partial fix for iterator traits and tags

Open ericniebler opened this issue 2 years ago • 11 comments

The symptoms of this problem are described in NVIDIA/cccl#705. Really, there are two problems with the way Thrust's iterator tags are currently defined:

  1. The *_device_* iterator tags are erroneously convertible the *_host_* tags. That's because the *_host_* tags are simply type aliases for the standard iterator tags, which the *_device_* tags inherit from.

  2. Because of the way the device iterator tags are defined (with iterator_category_with_system_and_traversal) there is no relationship between the different device tags. We would expect (and much of the code does) that thrust::forward_device_iterator_tag is convertible to thrust::input_device_iterator_tag, but that is sadly not true.

The two traits, iterator_category_to_system and iterator_category_to_traversal, are first checking whether the category is convertible to a host iterator tag. That is trivially true even for the device tags because of (1); as a result, device tags appear to have the "host" system, which is incorrect.

This PR offers the simplest, least intrusive, and sadly partial fix to the problem reported in NVIDIA/cccl#705. It changes the two traits so that they first check for convertibility to the device tag. This gets the answer correctly in more cases than currently.

I think I know how to fix this properly, but not without potentially breaking some code, so I'm submitting this PR now as a stop-gap.

ericniebler avatar May 10 '22 23:05 ericniebler

run tests

ericniebler avatar May 10 '22 23:05 ericniebler

@allisonvacanti I see you tagged this with "2.0" milestone, but this is a change that is safe to make in the 1.X series, IMO.

ericniebler avatar May 11 '22 18:05 ericniebler

this is a change that is safe to make in the 1.X series, IMO.

It would be, but the RC is already out. We only restart the RC process to add fixes for regressions.

alliepiper avatar May 12 '22 14:05 alliepiper

run tests

ericniebler avatar May 13 '22 21:05 ericniebler

I suppose if an iterator had any_system_tag as its system tag, then this diff would change its reported system from host to device. :thinking:

@allisonvacanti, do any iterators use any_system_tag?

ericniebler avatar May 14 '22 00:05 ericniebler

do any iterators use any_system_tag?

Yes -- this tag is used by iterators that aren't backed by actual memory (e.g thrust::counting_iterator, thrust::constant_iterator, etc, as well as several in CUB).

I suppose if an iterator had any_system_tag as its system tag, then this diff would change its reported system from host to device. 🤔

This should be fine. The device system is typically faster than the host system, so this is the preferred behavior. If an algorithm takes an any_system iterator and also a host_system iterator (e.g. thrust::copy(any_system_it, any_system_it_end, host_system_it_begin)) and requires the algorithm to run on host, it should still work since we check the systems of all iterators while dispatching.

alliepiper avatar May 16 '22 19:05 alliepiper

@ericniebler @allisonvacanti

Is this ready to merge?

miscco avatar Feb 21 '23 09:02 miscco

Is this ready to merge?

I ... think so? I can rebase, retest and commit if all looks good.

ericniebler avatar Feb 21 '23 21:02 ericniebler

run tests

ericniebler avatar Feb 21 '23 21:02 ericniebler

Hm that one is strange:

/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/numeric_traits.h(70): error: qualified name is not allowed

/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/numeric_traits.h(72): error: expected a "("

/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/numeric_traits.h(72): error: expected a type specifier

/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/numeric_traits.h(72): error: expected a ")"

/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/numeric_traits.h(76): error: expected a "("

/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/numeric_traits.h(76): error: expected a type specifier

/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/numeric_traits.h(77): error: expected a ";"

/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/numeric_traits.h(78): error: expected a "("

/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/numeric_traits.h(78): error: expected a ";"

/usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/numeric_traits.h(88): error: class template "__gnu_cxx::__numeric_traits_integer<_Value>" has no member "__is_signed"

There is nothing in this PR that could have triggered this

miscco avatar Feb 22 '23 07:02 miscco

@miscco we have wait till https://github.com/NVIDIA/thrust/pull/1866 is merged.

gevtushenko avatar Feb 22 '23 08:02 gevtushenko