[BUG]: Either add a public facing header for `<algorithm>` or move `<algorithm>` implementations into a detail or experimental namespace
Today:
- Many `<algorithm>` implementations exist in `cuda::std::`, which is a public facing namespace.
- They can be included with `<cuda/std/__algorithm_>` or `<cuda/std/__algorithm/${ALGO}.h>` (`ALGO=fill`, `ALGO=equal`), which are not public facing headers.
- `<cuda/std/algorithm>`, a public facing header, does not exist.
- `<cuda/algorithm>` is a public facing header and exposes `cuda::fill_bytes`, `cuda::copy_bytes`, etc., but does not include any of the `<algorithm>` implementations in `cuda::std::`.
- Some `cuda::std::` `<algorithm>` facilities are used in other public facing headers. This is weird and bad. For example, if you include `<cuda/std/array>`, you get `cuda::std::equal`.
We cannot have it both ways. Pick one of the following two options:
- If the `cuda::std::` `<algorithm>` implementations that exist today are not ready for public exposure, then put them in a detail or experimental namespace.
- If the `cuda::std::` `<algorithm>` implementations are ready for public exposure, then provide public headers for them, e.g. `<cuda/std/algorithm>`. Remember, we have had partial header implementations in `<cuda/std/*>` in the past.
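To make the two options concrete, here is a minimal single-file sketch of the underlying C++ mechanism. All names (`mylib::std`, `mylib::std::detail`, `equal`) are hypothetical stand-ins and this is not how libcu++ is actually organized. The implementation lives in a detail namespace, where internal headers such as an array-like container could use it without users being meant to name it (option 1); a public header then re-exports it under its public name with a using-declaration once it is considered ready (option 2).

```cpp
// Hypothetical library layout standing in for a cuda::std-like namespace.
namespace mylib::std {

// Option 1: implementation kept in a detail namespace. Other internal
// headers may call detail::equal, but it is not part of the public surface.
namespace detail {
template <class It1, class It2>
constexpr bool equal(It1 first1, It1 last1, It2 first2) {
  // Compare element-wise until the first range is exhausted.
  for (; first1 != last1; ++first1, ++first2) {
    if (!(*first1 == *first2)) {
      return false;
    }
  }
  return true;
}
}  // namespace detail

// Option 2: a public header (the analogue of a <cuda/std/algorithm>)
// re-exports the same implementation under its public name.
using detail::equal;

}  // namespace mylib::std
```

With this split, a header that only needs the helper internally can depend on `detail` without implicitly publishing `mylib::std::equal`; only the header that contains the using-declaration makes it public.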
FYI, we have the last piece here: #3741. The holdup is the recursive implementation of the sorting algorithms, which leads to terrible code on a GPU. We have ideas on how to fix it. Thrust has a serial radix sort implementation. We could also just do a simple insertion sort. We just never found the time to push it over the finish line. @miscco is working on it.
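As a rough illustration of the "simple insertion sort" idea, here is a minimal non-recursive sketch in plain C++. The name `insertion_sort` is hypothetical and this is not the implementation in #3741. It uses no recursion and no extra memory, which sidesteps the recursion problem described above, at the cost of O(n²) comparisons in the worst case, so it is only reasonable for small ranges (for example, a few elements per thread).

```cpp
#include <utility>

// Hypothetical helper: sorts [first, last) in place according to comp.
// Iterative (no recursion), no extra allocation, O(n^2) worst case.
// Stable, because an element is only shifted past neighbors that
// compare strictly greater than it.
template <class RandomIt, class Compare>
void insertion_sort(RandomIt first, RandomIt last, Compare comp) {
  if (first == last) {
    return;
  }
  for (RandomIt i = first + 1; i != last; ++i) {
    // Take the next element out and shift larger elements one slot right.
    auto value = std::move(*i);
    RandomIt j = i;
    while (j != first && comp(value, *(j - 1))) {
      *j = std::move(*(j - 1));
      --j;
    }
    // Drop the element into the gap left by the shifting.
    *j = std::move(value);
  }
}

// Convenience overload that sorts in ascending order using operator<.
template <class RandomIt>
void insertion_sort(RandomIt first, RandomIt last) {
  insertion_sort(first, last,
                 [](const auto& a, const auto& b) { return a < b; });
}
```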
I would disagree that we must publicly provide a partially implemented feature just because we use parts of it somewhere.
I agree that we should have `<cuda/std/algorithm>`; as always, it is just a question of time constraints.
The sorting algorithms the C++ standard libraries provide and that we inherit from libc++ are relatively hostile to GPU execution.
That means we would have to rewrite one of the most complex and performance sensitive algorithms out there. This is neither trivial nor worth it right now.
I am strongly considering leaving all sort-related algorithms unimplemented and keeping that as an open issue.
I also really want to write something about our stable-but-not-stable header architecture.
It is incredibly annoying to include 200k LoC just for `cuda::std::max`; we should be able to allow our users to include `<cuda/std/__algorithm/meow.h>`. Cursed be the extensionless headers.
I have updated #3741 to not expose the sorting algorithms but to publicize everything else.