mimalloc icon indicating copy to clipboard operation
mimalloc copied to clipboard

Fix: Prevent out-of-bounds read in mi_ctz_generic32 and mi_clz_generic32

Open vedant713 opened this issue 7 months ago • 2 comments

This patch ensures that both mi_ctz_generic32 and mi_clz_generic32 perform safe indexing into the de Bruijn lookup tables by masking the computed index with & 31.

On platforms where unsigned long is 64-bit, the result of the de Bruijn multiplication and shift could exceed the valid index range (0–31), leading to an out-of-bounds read.

This change applies a bitwise AND mask to the final index:

  • mi_ctz_generic32: debruijn[(((x & -(int32_t)x) * 0x077CB531U) >> 27) & 31]
  • mi_clz_generic32: debruijn[((x * 0x07C4ACDDU) >> 27) & 31]

This matches the fix applied in python/cpython#134070 to its integrated mimalloc copy.

Fixes: python/cpython#134070

vedant713 avatar May 17 '25 19:05 vedant713

Out of curiosity - Reading the original: (uint32_t)(x * (uint32_t)(0x07C4ACDDU)) >> 27 I'm thinking the intent was to cut off the upper 32 bits with the (uint32_t) cast; with that, the result after shift right by 27 should not have any bits set above bit 5, due to the cast having previously cut off the bits above. So I wonder why the original code doesn't work, but an explicit ANDing with 31 does. Some quirk I'm missing?

res2k avatar May 17 '25 20:05 res2k

Thank you for the PR! I was also confused because it is already cast to uint32_t so the extra & 31 should not be needed. Then I remembered a recent commit ed31847 which fixed the original bug by adding the uint32_t cast. So, this new PR is no longer needed. I really appreciate the upstreaming of CPython issues though -- thank you so much! (I am currently also working on getting V3 ready for CPython with better integration and less code needed on the CPython side to integrate).

daanx avatar May 21 '25 18:05 daanx