Amin Sedaghat
@fbusato I fixed `thread_reduce_apply` so the reduction functor is forwarded and invoked via `cuda::std::invoke`, which keeps the `PreferredT` casting while still honoring const functors (this unblocks Thrust’s `key_flag_scan_op`). I re-ran...
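To illustrate the forwarding pattern described in the comment above, here is a minimal host-side sketch in standard C++17. It is not the CUB implementation: `reduce_apply_sketch` and `ConstPlus` are hypothetical names, and `std::invoke` stands in for `cuda::std::invoke`. It only shows that casting the operands to a preferred type and invoking a forwarded functor still works when the functor is `const`.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for the pattern described above (not the real CUB code):
// cast both operands to PreferredT, then call the perfectly forwarded functor
// through std::invoke so that const functors remain callable.
template <typename PreferredT, typename T, typename ReductionOp>
PreferredT reduce_apply_sketch(const T& a, const T& b, ReductionOp&& op)
{
    return static_cast<PreferredT>(
        std::invoke(std::forward<ReductionOp>(op),
                    static_cast<PreferredT>(a),
                    static_cast<PreferredT>(b)));
}

// Hypothetical functor whose call operator is const-qualified.
struct ConstPlus
{
    template <typename U>
    U operator()(const U& x, const U& y) const
    {
        return x + y;
    }
};
```

Because the functor is taken as a forwarding reference, a `const ConstPlus` object is deduced as `const ConstPlus&` and its const call operator is used without any copy or const-cast.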
Addressed all feedback:
* Moved to cuda/__bit/ffs.h
* Changed to cuda:: namespace
* Updated header guards
* Using _CCCL_DEVICE_API
* Fixed signed cast with make_signed_t
* Simplified implementation
* Builtin...
* Builtins use _CCCL_HAS_BUILTIN inside prologue
* Simplified to direct int/long long casts
* Flattened ffs() implementation
* Added 128-bit integer support with hi/lo split
* Tests use assert in...
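As a rough illustration of the hi/lo split mentioned in the list above, here is a portable sketch in plain C++. It is not the code from the PR: `ffs64_sketch` and `ffs128_sketch` are hypothetical names, the 128-bit value is modeled as two 64-bit halves, and the result is assumed to follow the POSIX `ffs` convention (1-based index of the least significant set bit, 0 when no bit is set).

```cpp
#include <cstdint>

// Hypothetical helper: 1-based position of the lowest set bit, or 0 if v == 0.
// A simple loop is used here; a real implementation would likely use a builtin.
inline int ffs64_sketch(std::uint64_t v)
{
    if (v == 0)
    {
        return 0;
    }
    int pos = 1;
    while ((v & 1u) == 0)
    {
        v >>= 1; // unsigned shift, so no undefined behavior
        ++pos;
    }
    return pos;
}

// Hypothetical 128-bit variant: the value is passed as its high and low halves.
// The low half is checked first; the high half only matters when lo == 0.
inline int ffs128_sketch(std::uint64_t hi, std::uint64_t lo)
{
    if (lo != 0)
    {
        return ffs64_sketch(lo);
    }
    if (hi != 0)
    {
        return 64 + ffs64_sketch(hi);
    }
    return 0;
}
```

Checking the low half first keeps the common case to a single 64-bit scan, and the `+ 64` offset is applied only when the first set bit lives in the high half.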
@fbusato
* Removed unnecessary casts
* Fixed unsigned shift (avoid UB)
* Added static_assert and _CCCL_ASSUME
* Simplified MSVC ternary

Skipped:
* Using declarations (bit_reverse.h doesn't use them)
* Helper...
@fbusato @miscco @davebayer Are there any changes I could make to get this PR merge-ready?
@fbusato TIA for your feedback
@fbusato Thank you for the feedback! I ran into a bit of an issue with the clang-format lint and had to remove some blank lines. Happy to revert but lint check...
@fbusato I had to make these [changes](https://github.com/NVIDIA/cccl/pull/6192/commits/2b54a3d1e90eef15a6af7fe03d22d7c2347ad5e3) to build successfully locally. Thank you for your help.
@fbusato For verification, I pulled nvidia/cuda:13.0.2-devel-ubuntu22.04 (NVCC 13.0.88) and built/ran libcudacxx/test/libcudacxx/cuda/bit/ffs.pass.cpp; it passes, so the new cuda::ffs path behaves as expected with the latest toolchain. What I've learned from the...
Quick update: I re-ran the full libcudacxx, Thrust, CUB, and cudax presets in the rapidsai/devcontainers:25.12-cpp-llvm20-cuda13.0 image and they’re all green now. I also kicked off the c-parallel preset, but it...