blazingmq icon indicating copy to clipboard operation
blazingmq copied to clipboard

Fix possible misaligned read in crc32c algorithm

Open hallfox opened this issue 10 months ago • 0 comments

It is possible to have a misaligned pointer read for certain input buffers to the crc32c algorithm. The bug comes from the usage of crc32c8s in crc32cSse64bit.

crc32cSse64bit attempts to calculate the crc32c code by consuming the input buffer 1024 bytes at a time. It does so by calculating how much of the initial input it needs to read until the remaining input buffer is divisible by 1024.

    // Process ('length' mod 1024) bytes
    // Find the next alignment boundary
    unsigned int i = length % 1024;
    crc = crc32cUntilAligned(&data, &length, crc);
    if (i) {
        length -= i;  // length = k * 1024 for k in {0, 1, 2, ...}
        crc = crc32c8s(data, i, crc);
        data += i;
        i = 0;
    }

The problem is the assumption that consuming i bytes will keep you on an 8-byte aligned boundary. This is important, because crc32c1024SseInt requires that the input buffer passed to it is aligned on uint64_t*.

As an example, imagine the case where we pass a buffer of size 1025, initially aligned on a 8 bytes.

----------------------------------------------------------------
0x000 * 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 |
----------------------------------------------------------------
0x008 * 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 |
----------------------------------------------------------------
                    (126 more of these)
----------------------------------------------------------------
0x400 * 0x00 |   (end of buffer)                                                   |
----------------------------------------------------------------

Because we first try to process the first 1025 % 1024 = 1 bytes, we end up passing address 0x001 to crc32c1024SseInt to try and compute the final chunk of 1024.

We fix this by swapping the order in which we process the buffer, by first processing up to the next 8-byte aligned boundary and then process as many 1024-byte chunks as we can, then finish consuming the rest with crc32c8s.

hallfox avatar Apr 08 '25 20:04 hallfox