`<bit>`: Use `_CountTrailingZeros[64]` for ARM64
In VS 2022 17.7 Preview 3 (internal MSVC-PR-469248), our compiler back-end dev Jack Buchanan implemented new intrinsics in `<intrin0.inl.h>`:
__MACHINEARM_ARM64(unsigned int _CountTrailingZeros(unsigned long))
__MACHINEARM_ARM64(unsigned int _CountTrailingZeros64(unsigned __int64))
We should take advantage of them in `<bit>`'s `countr_zero()`, which is actually implemented in `<limits>` by `_Countr_zero()`:
https://github.com/microsoft/STL/blob/5404ba9c25f26f25a0ac50e6c4defce7833a8da6/stl/inc/limits#L1208-L1219
This would be similar to how we use `_CountLeadingZeros[64]` for `countl_zero()`:
https://github.com/microsoft/STL/blob/5404ba9c25f26f25a0ac50e6c4defce7833a8da6/stl/inc/bit#L285-L306
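For illustration only, here is a rough sketch of what the dispatch could look like. It is not the STL's actual code. The names `countr_zero_sketch`, `countr_zero_fallback`, and `SKETCH_HAS_CTZ_INTRINSICS` are hypothetical, and reaching the intrinsics through `<intrin.h>` is an assumption; only the two intrinsic signatures come from the declarations quoted above. Clang is excluded by the guard because it may not provide these intrinsics, and zero is handled before the intrinsic call so the sketch does not depend on what the intrinsics return for 0.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Assumption: the intrinsics are only usable with MSVC (not Clang) targeting ARM64.
#if defined(_MSC_VER) && !defined(__clang__) && defined(_M_ARM64)
#include <intrin.h> // assumed to expose _CountTrailingZeros and _CountTrailingZeros64
#define SKETCH_HAS_CTZ_INTRINSICS 1
#else
#define SKETCH_HAS_CTZ_INTRINSICS 0
#endif

// Hypothetical portable fallback: shifts right until the lowest bit is set.
template <class T>
constexpr int countr_zero_fallback(T val) noexcept {
    static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
    if (val == 0) {
        return std::numeric_limits<T>::digits; // every bit is a trailing zero
    }
    int count = 0;
    while ((val & 1) == 0) {
        val >>= 1;
        ++count;
    }
    return count;
}

// Hypothetical dispatcher: uses the ARM64 intrinsics when the guard above allows it.
template <class T>
int countr_zero_sketch(T val) noexcept {
    static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
    constexpr int digits = std::numeric_limits<T>::digits;
    if (val == 0) {
        return digits; // handled here so the intrinsic never sees 0
    }
#if SKETCH_HAS_CTZ_INTRINSICS
    if constexpr (digits <= 32) {
        // Widening a nonzero value does not change its trailing zero count.
        return static_cast<int>(_CountTrailingZeros(static_cast<unsigned long>(val)));
    } else {
        return static_cast<int>(_CountTrailingZeros64(val));
    }
#else
    return countr_zero_fallback(val);
#endif
}
```

With this sketch, `countr_zero_sketch(std::uint32_t{8})` takes the 32-bit path and `countr_zero_sketch(std::uint64_t{1} << 40)` takes the 64-bit path on ARM64 MSVC; everywhere else both go through the fallback loop. A real implementation would also need to stay usable in constant evaluation, which this sketch does not attempt.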
Not sure if this will be in 17.7 Preview 3 or 17.8 Preview 1 - we'll need to check.
Looks like we might need to ask Clang to implement these intrinsics, similar to #1586.
Updating this issue to no longer mention ARM32; at this time we still need to keep it compiling and working, but we no longer care about optimizing for it. Only ARM64 performance matters.