cccl icon indicating copy to clipboard operation
cccl copied to clipboard

[libcu++] Implement `cuda::ffs`

Open Aminsed opened this issue 3 months ago • 22 comments

Implement type-safe cuda::std::ffs function as replacement for __ffs intrinsic.

  • Returns 1-based index of first set bit (0 for no bits set)
  • Works on all platforms and integer types
  • Handles x == 0 correctly (returns 0)

Fixes #6108

Aminsed avatar Oct 13 '25 03:10 Aminsed

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Oct 13 '25 03:10 copy-pr-bot[bot]

Addressed all feedback:

  • Moved to cuda/__bit/ffs.h
  • Changed to cuda:: namespace
  • Updated header guards
  • Using _CCCL_DEVICE_API
  • Fixed signed cast with make_signed_t
  • Simplified implementation
  • Builtin definitions moved to ffs.h
  • Added comprehensive tests

Aminsed avatar Oct 13 '25 14:10 Aminsed

  • Builtins use _CCCL_HAS_BUILTIN inside prologue
  • Simplified to direct int/long long casts
  • Flattened ffs() implementation
  • Added 128-bit integer support with hi/lo split
  • Tests use assert in constexpr context
  • Added large value tests with _u128 literals

Aminsed avatar Oct 13 '25 15:10 Aminsed

@fbusato

  • Removed unnecessary casts
  • Fixed unsigned shift (avoid UB)
  • Added static_assert and _CCCL_ASSUME
  • Simplified MSVC ternary

Skipped:

  • Using declarations (bit_reverse.h doesn't use them)
  • Helper extraction (conflicts with @davebayer's flatten request)

Aminsed avatar Oct 13 '25 19:10 Aminsed

@fbusato @miscco @davebayer Is there any changes I could make to make this PR merge ready?

Aminsed avatar Oct 19 '25 22:10 Aminsed

@Aminsed sorry about the delay, this is just a matter of prioritization

miscco avatar Oct 20 '25 08:10 miscco

/ok to test 73a7a6a

fbusato avatar Oct 20 '25 16:10 fbusato

@Aminsed please ping us when you make progress. It is very easy to miss updates on the PR from our side. Please also mark the comments as resolved after you address them. Thanks again for the contribution.

fbusato avatar Oct 20 '25 16:10 fbusato

😬 CI Workflow Results

🟥 Finished in 1h 38m: Pass: 40%/84 | Total: 18h 56m | Max: 1h 29m | Hits: 51%/31415

See results here.

github-actions[bot] avatar Oct 20 '25 17:10 github-actions[bot]

@fbusato TIA for your feedback

Aminsed avatar Oct 21 '25 13:10 Aminsed

/ok to test 9eb8017

fbusato avatar Oct 21 '25 15:10 fbusato

😬 CI Workflow Results

🟥 Finished in 1h 42m: Pass: 34%/89 | Total: 20h 19m | Max: 1h 42m | Hits: 28%/22879

See results here.

github-actions[bot] avatar Oct 21 '25 17:10 github-actions[bot]

@fbusato Thank you for the feedback! ran into a bit of an issue with clang-format lint and had to remove some blank lines. Happy to to revert but lint check might fail.

Aminsed avatar Oct 25 '25 15:10 Aminsed

/ok to test 8693abc

fbusato avatar Oct 27 '25 16:10 fbusato

😬 CI Workflow Results

🟥 Finished in 1h 35m: Pass: 46%/89 | Total: 14h 21m | Max: 1h 16m | Hits: 78%/30973

See results here.

github-actions[bot] avatar Oct 27 '25 17:10 github-actions[bot]

@fbusato had to make these changes to be able to build successfully locally. Thank you for your help

Aminsed avatar Nov 02 '25 01:11 Aminsed

/ok to test 8924985

fbusato avatar Nov 03 '25 20:11 fbusato

😬 CI Workflow Results

🟥 Finished in 1h 05m: Pass: 44%/90 | Total: 11h 06m | Max: 35m 30s | Hits: 99%/30344

See results here.

github-actions[bot] avatar Nov 03 '25 21:11 github-actions[bot]

@fbusato

For verification, I pulled nvidia/cuda:13.0.2-devel-ubuntu22.04 (NVCC 13.0.88) and built/ran libcudacxx/test/libcudacxx/cuda/bit/ffs.pass.cpp; it passes, so the new cuda::ffs path behaves as expected with the latest toolchain.

What I've learned from the CI run: If we re-run the full libcudacxx presets, CMake still aborts with “cuda_std_20 is not known to CUDA compiler” (NVCC 12.5 and even 13.0 don’t expose that compile feature yet).

Aminsed avatar Nov 08 '25 01:11 Aminsed

pre-commit.ci autofix

jrhemstad avatar Nov 08 '25 12:11 jrhemstad

/ok to test fd5edab

jrhemstad avatar Nov 08 '25 12:11 jrhemstad

😬 CI Workflow Results

🟥 Finished in 54m 35s: Pass: 44%/90 | Total: 10h 47m | Max: 30m 14s | Hits: 99%/22803

See results here.

github-actions[bot] avatar Nov 08 '25 13:11 github-actions[bot]

Quick update: I re-ran the full libcudacxx, Thrust, CUB, and cudax presets in the rapidsai/devcontainers:25.12-cpp-llvm20-cuda13.0 image and they’re all green now. I also kicked off the c-parallel preset, but it still fails on existing issues (Thrust warnings treated as errors for unused parameters, plus lambda/index_sequence bugs in c/parallel/src/jit_templates). Those failures happen on current main as well, so I didn’t try to patch them here. Could you please rerun CI when you get a chance? @fbusato @jrhemstad Thanks

Aminsed avatar Nov 16 '25 02:11 Aminsed

/ok to test 9b5ddd2

fbusato avatar Nov 17 '25 21:11 fbusato

😬 CI Workflow Results

🟥 Finished in 2h 37m: Pass: 87%/90 | Total: 1d 17h | Max: 1h 56m | Hits: 97%/195190

See results here.

github-actions[bot] avatar Nov 18 '25 00:11 github-actions[bot]