`<bit>`: Could `has_single_bit()` be faster?
Seems like it should convert to popcount(x) == 1 if it's available on the target arch, but looking at the code, doesn't seem like compiler would do it:
_NODISCARD constexpr bool has_single_bit(const _Ty _Val) noexcept {
return _Val != 0 && (_Val & (_Val - 1)) == 0;
}
https://github.com/microsoft/STL/blob/f2a2933dd65d9e8d3fa698a97b6074f7ef00e1fd/stl/inc/bit#L87
I'm not sure.
For the current implementation I'd expect something as following (assuming x64 arch, as it is primary optimization target):
or reg1, reg1
jz label1
lea reg2, [reg1-1]
test reg1, reg2
label1:
And for popcount as following;
popcnt reg2, reg1
cmp reg2, 1
Which of these is better would highly depend on the context. I'd expect the current branchy function to win for predictable input, and the popcnt branchless function towin for unpredictable input.
In sight of this uncertainty, I would avoid making any changes and introducing unnecessary complexity.
Oh, it also seems CPU vendor specific. AMDs have cheaper popcnt, could be the winner most of the time there.
According to GCC codegen, popcnt is the winner if available, yes.
If not, the best answer is (v ^ (v-1)) > v-1.
https://godbolt.org/z/d61bxW4r1
Ideally, our compilers would do the right thing here, so we may need to report optimization bug(s) to C1XX and/or Clang.
According to GCC codegen, popcnt is the winner if available, yes.
Yes, seems like gcc/clang use popcount for cpus starting from nehalem: https://godbolt.org/z/s4MnvcfW7 Nehalem was released in 2008.
If not, the best answer is
(v ^ (v-1)) > v-1.https://godbolt.org/z/d61bxW4r1
this one seems to be better for msvc and others for generic cpu. Looks like gcc/clang use this one as the codegen matches std::has_single_bit for x64/arm cpus
I just edited that file in github.dev and made that PR directly from the browser. Crazy, full blown code editor in browser :)
@pps83 are you interested in popcnt PR? Otherwise I can do it.
@pps83 are you interested in
popcntPR? Otherwise I can do it.
I can do it, but I'll need some assistance to be able to do it quickly. I just looked into it and here's how it can be done.
bit includes __msvc_bit_utils.hpp. __msvc_bit_utils.hpp defines _HAS_POPCNT_INTRINSICS, but it gets undefined. I need to change that, to keep the define so that inside bit I could do something like that:
_EXPORT_STD template <_Standard_unsigned_integral _Ty>
_NODISCARD constexpr bool has_single_bit(const _Ty _Val) noexcept {
#if _HAS_POPCNT_INTRINSICS
if (_STD is_constant_evaluated()) {
return _Val != 0 && (_Val & (_Val - 1)) == 0;
}
return _Unchecked_popcount(_Val) == 1;
#else
return _Val != 0 && (_Val & (_Val - 1)) == 0;
#endif // _HAS_POPCNT_INTRINSICS
}
there is already _HAS_COUNTL_ZERO_INTRINSICS define that similarly gets defined inside __msvc_bit_utils.hpp and used by bit header.
Is that ok?
https://github.com/microsoft/STL/pull/5534
there is already
_HAS_COUNTL_ZERO_INTRINSICSdefine that similarly gets defined inside__msvc_bit_utils.hppand used bybitheader.Is that ok?
Yes, I think so.