[libcu++] Implement `cuda::ffs`
Implement type-safe cuda::std::ffs function as replacement for __ffs intrinsic.
- Returns 1-based index of first set bit (0 for no bits set)
- Works on all platforms and integer types
- Handles
x == 0correctly (returns 0)
Fixes #6108
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
Pull request vetters can view their responsibilities here.
Contributors can view more details about this message here.
Addressed all feedback:
- Moved to cuda/__bit/ffs.h
- Changed to cuda:: namespace
- Updated header guards
- Using _CCCL_DEVICE_API
- Fixed signed cast with make_signed_t
- Simplified implementation
- Builtin definitions moved to ffs.h
- Added comprehensive tests
- Builtins use _CCCL_HAS_BUILTIN inside prologue
- Simplified to direct int/long long casts
- Flattened ffs() implementation
- Added 128-bit integer support with hi/lo split
- Tests use assert in constexpr context
- Added large value tests with _u128 literals
@fbusato
- Removed unnecessary casts
- Fixed unsigned shift (avoid UB)
- Added static_assert and _CCCL_ASSUME
- Simplified MSVC ternary
Skipped:
- Using declarations (bit_reverse.h doesn't use them)
- Helper extraction (conflicts with @davebayer's flatten request)
@fbusato @miscco @davebayer Is there any changes I could make to make this PR merge ready?
@Aminsed sorry about the delay, this is just a matter of prioritization
/ok to test 73a7a6a
@Aminsed please ping us when you make progress. It is very easy to miss updates on the PR from our side. Please also mark the comments as resolved after you address them. Thanks again for the contribution.
😬 CI Workflow Results
🟥 Finished in 1h 38m: Pass: 40%/84 | Total: 18h 56m | Max: 1h 29m | Hits: 51%/31415
See results here.
@fbusato TIA for your feedback
/ok to test 9eb8017
😬 CI Workflow Results
🟥 Finished in 1h 42m: Pass: 34%/89 | Total: 20h 19m | Max: 1h 42m | Hits: 28%/22879
See results here.
@fbusato Thank you for the feedback! ran into a bit of an issue with clang-format lint and had to remove some blank lines. Happy to to revert but lint check might fail.
/ok to test 8693abc
😬 CI Workflow Results
🟥 Finished in 1h 35m: Pass: 46%/89 | Total: 14h 21m | Max: 1h 16m | Hits: 78%/30973
See results here.
@fbusato had to make these changes to be able to build successfully locally. Thank you for your help
/ok to test 8924985
😬 CI Workflow Results
🟥 Finished in 1h 05m: Pass: 44%/90 | Total: 11h 06m | Max: 35m 30s | Hits: 99%/30344
See results here.
@fbusato
For verification, I pulled nvidia/cuda:13.0.2-devel-ubuntu22.04 (NVCC 13.0.88) and built/ran libcudacxx/test/libcudacxx/cuda/bit/ffs.pass.cpp; it passes, so the new cuda::ffs path behaves as expected with the latest toolchain.
What I've learned from the CI run: If we re-run the full libcudacxx presets, CMake still aborts with “cuda_std_20 is not known to CUDA compiler” (NVCC 12.5 and even 13.0 don’t expose that compile feature yet).
pre-commit.ci autofix
/ok to test fd5edab
😬 CI Workflow Results
🟥 Finished in 54m 35s: Pass: 44%/90 | Total: 10h 47m | Max: 30m 14s | Hits: 99%/22803
See results here.
Quick update: I re-ran the full libcudacxx, Thrust, CUB, and cudax presets in the rapidsai/devcontainers:25.12-cpp-llvm20-cuda13.0 image and they’re all green now. I also kicked off the c-parallel preset, but it still fails on existing issues (Thrust warnings treated as errors for unused parameters, plus lambda/index_sequence bugs in c/parallel/src/jit_templates). Those failures happen on current main as well, so I didn’t try to patch them here. Could you please rerun CI when you get a chance? @fbusato @jrhemstad Thanks
/ok to test 9b5ddd2
😬 CI Workflow Results
🟥 Finished in 2h 37m: Pass: 87%/90 | Total: 1d 17h | Max: 1h 56m | Hits: 97%/195190
See results here.