cccl [libcu++] Implement `cuda::ffs`

Implement type-safe cuda::std::ffs function as replacement for __ffs intrinsic.

Returns 1-based index of first set bit (0 for no bits set)
Works on all platforms and integer types
Handles x == 0 correctly (returns 0)

Fixes #6108

Oct 13 '25 03:10 Aminsed

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Oct 13 '25 03:10 copy-pr-bot[bot]

Addressed all feedback:

Moved to cuda/__bit/ffs.h
Changed to cuda:: namespace
Updated header guards
Using _CCCL_DEVICE_API
Fixed signed cast with make_signed_t
Simplified implementation
Builtin definitions moved to ffs.h
Added comprehensive tests

Oct 13 '25 14:10 Aminsed

Builtins use _CCCL_HAS_BUILTIN inside prologue
Simplified to direct int/long long casts
Flattened ffs() implementation
Added 128-bit integer support with hi/lo split
Tests use assert in constexpr context
Added large value tests with _u128 literals

Oct 13 '25 15:10 Aminsed

@fbusato

Removed unnecessary casts
Fixed unsigned shift (avoid UB)
Added static_assert and _CCCL_ASSUME
Simplified MSVC ternary

Skipped:

Using declarations (bit_reverse.h doesn't use them)
Helper extraction (conflicts with @davebayer's flatten request)

Oct 13 '25 19:10 Aminsed

@fbusato @miscco @davebayer Is there any changes I could make to make this PR merge ready?

Oct 19 '25 22:10 Aminsed

@Aminsed sorry about the delay, this is just a matter of prioritization

Oct 20 '25 08:10 miscco

/ok to test 73a7a6a

Oct 20 '25 16:10 fbusato

@Aminsed please ping us when you make progress. It is very easy to miss updates on the PR from our side. Please also mark the comments as resolved after you address them. Thanks again for the contribution.

Oct 20 '25 16:10 fbusato

😬 CI Workflow Results

🟥 Finished in 1h 38m: Pass: 40%/84 | Total: 18h 56m | Max: 1h 29m | Hits: 51%/31415

See results here.

Oct 20 '25 17:10 github-actions[bot]

@fbusato TIA for your feedback

Oct 21 '25 13:10 Aminsed

/ok to test 9eb8017

Oct 21 '25 15:10 fbusato

😬 CI Workflow Results

🟥 Finished in 1h 42m: Pass: 34%/89 | Total: 20h 19m | Max: 1h 42m | Hits: 28%/22879

See results here.

Oct 21 '25 17:10 github-actions[bot]

@fbusato Thank you for the feedback! ran into a bit of an issue with clang-format lint and had to remove some blank lines. Happy to to revert but lint check might fail.

Oct 25 '25 15:10 Aminsed

/ok to test 8693abc

Oct 27 '25 16:10 fbusato

😬 CI Workflow Results

🟥 Finished in 1h 35m: Pass: 46%/89 | Total: 14h 21m | Max: 1h 16m | Hits: 78%/30973

See results here.

Oct 27 '25 17:10 github-actions[bot]

@fbusato had to make these changes to be able to build successfully locally. Thank you for your help

Nov 02 '25 01:11 Aminsed

/ok to test 8924985

Nov 03 '25 20:11 fbusato

😬 CI Workflow Results

🟥 Finished in 1h 05m: Pass: 44%/90 | Total: 11h 06m | Max: 35m 30s | Hits: 99%/30344

See results here.

Nov 03 '25 21:11 github-actions[bot]

@fbusato

For verification, I pulled nvidia/cuda:13.0.2-devel-ubuntu22.04 (NVCC 13.0.88) and built/ran libcudacxx/test/libcudacxx/cuda/bit/ffs.pass.cpp; it passes, so the new cuda::ffs path behaves as expected with the latest toolchain.

What I've learned from the CI run: If we re-run the full libcudacxx presets, CMake still aborts with “cuda_std_20 is not known to CUDA compiler” (NVCC 12.5 and even 13.0 don’t expose that compile feature yet).

Nov 08 '25 01:11 Aminsed

pre-commit.ci autofix

Nov 08 '25 12:11 jrhemstad

/ok to test fd5edab

Nov 08 '25 12:11 jrhemstad

😬 CI Workflow Results

🟥 Finished in 54m 35s: Pass: 44%/90 | Total: 10h 47m | Max: 30m 14s | Hits: 99%/22803

See results here.

Nov 08 '25 13:11 github-actions[bot]

Quick update: I re-ran the full libcudacxx, Thrust, CUB, and cudax presets in the rapidsai/devcontainers:25.12-cpp-llvm20-cuda13.0 image and they’re all green now. I also kicked off the c-parallel preset, but it still fails on existing issues (Thrust warnings treated as errors for unused parameters, plus lambda/index_sequence bugs in c/parallel/src/jit_templates). Those failures happen on current main as well, so I didn’t try to patch them here. Could you please rerun CI when you get a chance? @fbusato @jrhemstad Thanks

Nov 16 '25 02:11 Aminsed

/ok to test 9b5ddd2

Nov 17 '25 21:11 fbusato

😬 CI Workflow Results

🟥 Finished in 2h 37m: Pass: 87%/90 | Total: 1d 17h | Max: 1h 56m | Hits: 97%/195190

See results here.

Nov 18 '25 00:11 github-actions[bot]