Fix: Prevent out-of-bounds read in mi_ctz_generic32 and mi_clz_generic32
This patch ensures that both mi_ctz_generic32 and mi_clz_generic32 perform safe indexing into the de Bruijn lookup tables by masking the computed index with & 31.
On platforms where unsigned long is 64-bit, the result of the de Bruijn multiplication and shift could exceed the valid index range (0–31), leading to an out-of-bounds read.
This change applies a bitwise AND mask to the final index:
mi_ctz_generic32: debruijn[(((x & -(int32_t)x) * 0x077CB531U) >> 27) & 31]mi_clz_generic32: debruijn[((x * 0x07C4ACDDU) >> 27) & 31]
This matches the fix applied in python/cpython#134070 to its integrated mimalloc copy.
Fixes: python/cpython#134070
Out of curiosity -
Reading the original: (uint32_t)(x * (uint32_t)(0x07C4ACDDU)) >> 27
I'm thinking the intent was to cut off the upper 32 bits with the (uint32_t) cast; with that, the result after shift right by 27 should not have any bits set above bit 5, due to the cast having previously cut off the bits above.
So I wonder why the original code doesn't work, but an explicit ANDing with 31 does.
Some quirk I'm missing?
Thank you for the PR! I was also confused because it is already cast to uint32_t so the extra & 31 should not be needed. Then I remembered a recent commit ed31847 which fixed the original bug by adding the uint32_t cast. So, this new PR is no longer needed. I really appreciate the upstreaming of CPython issues though -- thank you so much! (I am currently also working on getting V3 ready for CPython with better integration and less code needed on the CPython side to integrate).