orc
Murmur3 implementation has UB flagged by UBSan (though probably safe on desktop)
Describe the bug
UBSan correctly flags this line in the Murmur3 code:
/Users/runner/work/orc/orc/src/hash.cpp:40:69: runtime error: load of misaligned address 0x00010c28f87b for type 'const uint64_t' (aka 'const unsigned long long'), which requires 8 byte alignment
This is correct. ORC string pools aren't aligned to any particular boundary (and const char* doesn't need to be), but Murmur3 casts a void* to a uint64_t* and reads through it, which is what triggers the UBSan report.
Note I can't imagine any machine capable of running ORC that would actually fail on an unaligned read, but the UBSan flag is correct.
Expected behavior
The correct fix is probably in Murmur3: it takes in a void* without restriction, so it should deal with alignment itself. That's not a trivial change.
We could move our strings to 16 byte alignment, but that wastes a bunch of memory on padding.
I added comments to an existing issue: https://github.com/aappleby/smhasher/issues/74
According to this random post, we may want to use memcpy to grab data 8 bytes at a time:
memcpy is the canonical way to bypass alignment and aliasing assumptions in C compilers. The compiler will turn it into whatever the most efficient sequence for unaligned memory access on the given architecture is.
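To make the memcpy suggestion concrete, here is a minimal sketch of what an alignment-safe block read could look like. The helper name `load_block64` is hypothetical; it is not part of ORC or smhasher, and the real change would need to fit into the existing block-read code in hash.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from ORC or smhasher): read the i-th 64-bit
// block starting at `data`, with no alignment requirement on `data`.
inline std::uint64_t load_block64(const void* data, std::size_t i) {
    std::uint64_t block;
    // Copy the bytes instead of dereferencing a uint64_t*; this avoids the
    // misaligned load that UBSan reports.
    std::memcpy(&block,
                static_cast<const unsigned char*>(data) + i * sizeof(block),
                sizeof(block));
    return block;
}
```

The value is read in native byte order, the same as the pointer-cast version, so hash results should be unchanged on a given machine. Whether this costs anything in practice would need to be measured; mainstream optimizing compilers typically lower a fixed-size memcpy like this to a single load on targets that permit unaligned access.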