[BUG]: thrust iterators are not compatible with absl container algorithms.
Is this a duplicate?
- [X] I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Compile-time Error
Component
Not sure
Describe the bug
Container algorithms from absl fail to compile when used with thrust iterators. Note that this is purely host-side C++ code, not CUDA.
It's not clear what's at fault here: absl, thrust, libcu++, or my expectation that it should all work.
For what it's worth, the code used to work with an ancient version of thrust (https://github.com/NVIDIA/thrust/tree/f5ea60fd3aa3828c0eb8991a54acdfbed6707bd7).
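For context, here is a minimal sketch of the shape of code that hits this (an assumed example based on the discussion below about `std::vector<cuda::std::pair<int, int>>`; the actual reproducer is in the godbolt link under "How to Reproduce"):

```cpp
// Not the exact reproducer, just a sketch of the failing shape, assuming
// absl::c_for_each and an element type that lives in cuda::std.
#include <vector>

#include <absl/algorithm/container.h>
#include <cuda/std/utility>

int main() {
  // cuda::std becomes an associated namespace of the vector via its element
  // type, so the unqualified begin()/end() calls inside absl's container
  // adaptors see both std::begin and cuda::std::begin and fail to compile.
  std::vector<cuda::std::pair<int, int>> v{{1, 2}, {3, 4}};
  absl::c_for_each(v, [](const auto&) {});
}
```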
How to Reproduce
https://godbolt.org/z/a4v4hecsP
Expected behavior
absl algorithms should work with thrust iterators.
Reproduction link
https://godbolt.org/z/a4v4hecsP
Operating System
n/a
nvidia-smi output
n/a
NVCC version
It's a host-side compilation issue. The failure happens with both clang and gcc.
Further reduced reproducer (huge thanks to @zygoloid): https://godbolt.org/z/fKh9cWjnz.
It indicates that the root cause is an ambiguous function overload caused by both libc++ and libcudacxx providing their own begin().
This issue is similar to https://github.com/NVIDIA/cccl/issues/1678, just harder to diagnose because the actual failure is hidden by SFINAE.
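A rough sketch of that ambiguity with absl taken out of the picture (an assumed shape; the actual reduced reproducer is in the godbolt link above):

```cpp
#include <iterator>
#include <vector>

#include <cuda/std/utility>

// The usual "using std::begin; begin(c);" two-step. ADL also considers
// cuda::std::begin because cuda::std is an associated namespace of the
// element type, so the call becomes ambiguous.
template <class C>
auto adl_begin(C& c) {
  using std::begin;
  return begin(c);  // error: ambiguous between std::begin and cuda::std::begin
}

int main() {
  std::vector<cuda::std::pair<int, int>> v;
  adl_begin(v);
}
```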
Thanks @Artem-B! That sounded like a nasty issue to track down.
@miscco will look into this ASAP.
@Artem-B I believe this relates to us making thrust::tuple an alias of cuda::std::tuple.
Can you check whether it works with https://github.com/NVIDIA/cccl/pull/1643?
Unfortunately https://github.com/NVIDIA/cccl/pull/1643 does not seem to help with either of the failures in #1679 and #1678.
Compiler diagnostics didn't change from the reproducers on godbolt.
Tested with #1643 at be5ddf730691564dc025f633413f002c50407685, and absl at 0941ce7fd95104ba5f4678c41802a3229db1d5eb.
For what it's worth, a hacky workaround is to replace the libcudacxx implementations of __swallow, begin and end with using ::std::... declarations of their libc++ variants. It does depend on the implementation details of libc++, and is probably not a good fix in general, but it may tide me over until a better fix is available.
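Roughly, the workaround amounts to something like the following (a sketch only; the real libcudacxx headers use internal versioning namespaces and macros, and __swallow is omitted because it is a libcudacxx-internal helper):

```cpp
// Sketch of the hacky workaround (assumed form, not the actual libcudacxx
// source): have cuda::std re-export the libc++ begin/end instead of defining
// its own function templates, so overload resolution only ever sees one set.
#include <iterator>

namespace cuda {
namespace std {
using ::std::begin;
using ::std::end;
}  // namespace std
}  // namespace cuda
```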
@zygoloid suggests that using std::... may be OK:
I don't think this would be depending on implementation details. These function templates are required to be present by the standard. As far as I can see, the intent is for begin(X) to work when cuda::std is an associated namespace of X, in the same way it works when std is an associated namespace of X. I think it's reasonable to implement that by making cuda::std::begin an alias to std::begin -- otherwise, you'll get an ambiguity any time both std and cuda::std are associated namespaces (which is what happened here).
That said, I have no idea how/whether that reasoning would apply to the GPU side of the compilation, where the standard does not promise us anything. I.e. we may have a problem if the functions must be __host__ __device__.
Oh, I also need to look at __swallow :(
Regarding @zygoloid's suggestion above that using std::... may be OK:
I believe we cannot go with a simple alias, as that does not work on device. What I have done is SFINAE those functions away when they are ambiguous with namespace std.
That should be safe, because if somebody pulls in namespace std via ADL, then that type will never be host device and we can just use the function from namespace std.
There is the caveat that someone might be using cuda::std::begin, but that is easily solved by just switching to an unqualified begin.
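For illustration, something in the spirit of the following hypothetical sketch (not the actual libcudacxx change):

```cpp
// Hypothetical sketch of the "SFINAE it away" idea: only provide a
// cuda::std-style begin for types that std::begin does not already handle,
// so overload resolution never sees both candidates at once.
#include <iterator>
#include <type_traits>
#include <utility>

namespace sketch {

template <class C, class = void>
struct std_begin_works : std::false_type {};

template <class C>
struct std_begin_works<C, std::void_t<decltype(std::begin(std::declval<C&>()))>>
    : std::true_type {};

// Disabled whenever std::begin is viable for C, avoiding the ambiguity.
template <class C, std::enable_if_t<!std_begin_works<C>::value, int> = 0>
constexpr auto begin(C& c) -> decltype(c.begin()) {
  return c.begin();
}

}  // namespace sketch
```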
Would this solve the problem? https://godbolt.org/z/j3djoY1Ys
It would solve the problem for device_iterator<std::something>.
It would not solve it for std::vector<cuda::std::pair<int, int>>.
EDIT: it would solve the problem, but we would need to add those to every sequence.
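It is not spelled out here exactly what "those" refers to; as a purely hypothetical illustration, per-type begin()/end() overloads (for example hidden friends) would resolve the ambiguity for each sequence type they are added to:

```cpp
// Hypothetical illustration only (the actual godbolt experiment may differ):
// a non-template begin()/end() declared as hidden friends of a sequence type
// is a better match than either std::begin or cuda::std::begin, so calls on
// that particular type are no longer ambiguous.
#include <cstddef>

template <class T>
class some_sequence {  // stand-in for a thrust-side sequence type
 public:
  friend T* begin(some_sequence& s) { return s.data_; }
  friend T* end(some_sequence& s) { return s.data_ + s.size_; }

 private:
  T* data_ = nullptr;
  std::size_t size_ = 0;
};
```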