zfp_stream_maximum_size can overflow on wasm32
I've compiled ZFP to wasm32 and experienced out-of-bounds writes. In my particular case, I'm compressing a 9900 x 16987 2D array of single precision floats with an absolute error tolerance of 0.0402.
Some back of the napkin math based on https://github.com/LLNL/zfp/blob/7d1e3a21047a976599b562b3bbd53b1f34348f1a/src/zfp.c#L740-L770
brings me to:
```
dims = 2
bx = 2475, by = 4247, blocks = bx * by = 10511325
values = 16
maxbits = 0; maxbits += 9; maxbits += 16 - 1 + 16 * 32;  /* maxbits = 536 */

148 + 10511325 * 536 + stream_word_bits - 1
      ^^^^^^^^^^^^^^
```
The marked multiplication is where the computation overflows `size_t` (32 bits on wasm32): 10511325 * 536 = 5634070200, which is larger than SIZE_MAX = 4294967295, even though the division by CHAR_BIT would bring the final result back into a valid 32-bit integer.
Since the correct zfp_stream_maximum_size result would fit into a 32bit integer, i.e. the request is not excessive, I think the zfp_stream_maximum_size function should correctly handle the overflow here.
CC @milankl and @treigerm
This looks like a valid complaint. Does the problem go away if you use `(uint64)blocks` and cast the return value to `size_t`?
Per the zfp installation instructions, support for 64-bit integers is a requirement, though it should be possible to use 32-bit size_t. We've been careful about this in a number of places (particularly in index.hpp), but it looks like we were a bit careless here.
I suppose this function should return 0 if the size exceeds `SIZE_MAX`.
> This looks like a valid complaint. Does the problem go away if you use `(uint64)blocks` and cast the return value to `size_t`?
That should work. I agree that checking at the end before casting back to size_t would be good.
Though I wonder if it would be possible to avoid the extended precision multiplication at all.
> Though I wonder if it would be possible to avoid the extended precision multiplication at all.
zfp uses 64-bit integer arithmetic all over the place, especially in the innermost loop of (de)compression. Trying to avoid it here does not seem a worthwhile optimization.
@lindstro We're still experiencing the underlying issue despite this fix ... while I'll try to investigate more, maybe you have a hunch. For the 9900 x 16987 2D array of single precision floats, compression works with a fixed accuracy tolerance of 0.402 and 4.0200000000000005, but fails with an out-of-bounds memory access for a tolerance of 0.0402. Is there some computation during encoding that might overflow on 32bit for a very low tolerance?
To reproduce, you can simply fill the array with, e.g., standard normal data.
The backtrace we get inside WASM looks like this (frames 0-9 are inside zfp, the others in my Rust wrapper):
```
Caused by:
    0: error while executing at wasm backtrace:
           0: 0x111518 - numcodecs_zfp_classic_wasm.wasm!stream_write_word
           1: 0x10fec3 - numcodecs_zfp_classic_wasm.wasm!stream_write_bits
           2: 0x1132ba - numcodecs_zfp_classic_wasm.wasm!encode_few_ints_prec_uint32
           3: 0x1119c7 - numcodecs_zfp_classic_wasm.wasm!encode_ints_uint32
           4: 0x110966 - numcodecs_zfp_classic_wasm.wasm!encode_block_int32_2
           5: 0x10f92e - numcodecs_zfp_classic_wasm.wasm!encode_block_float_2
           6: 0x10f27d - numcodecs_zfp_classic_wasm.wasm!zfp_encode_block_float_2
           7: 0x110bd1 - numcodecs_zfp_classic_wasm.wasm!zfp_encode_block_strided_float_2
           8: 0xe060d - numcodecs_zfp_classic_wasm.wasm!compress_strided_float_2
           9: 0xdf184 - numcodecs_zfp_classic_wasm.wasm!zfp_compress
          10: 0x701ca - numcodecs_zfp_classic_wasm.wasm!_ZN21numcodecs_zfp_classic3ffi52ZfpCompressionStreamWithBitstreamWithHeader$LT$T$GT$8compress17h1fae2edb3e2161f5E
          11: 0x78776 - numcodecs_zfp_classic_wasm.wasm!_ZN21numcodecs_zfp_classic8compress17hfd7f2701afaaf4fdE
          12: 0x73cdd - numcodecs_zfp_classic_wasm.wasm!ZN82$LT$numcodecs_zfp_classic..ZfpClassicCodec$u20$as$u20$numcodecs..codec..Codec$GT$6encode17hbafd6d7c379e04f1E
          13: 0x65774 - numcodecs_zfp_classic_wasm.wasm!ZN89$LT$numcodecs_wasm_logging..LoggingCodec$LT$T$GT$$u20$as$u20$numcodecs..codec..Codec$GT$6encode17h1326089463963481E
          14: 0x65154 - numcodecs_zfp_classic_wasm.wasm!ZN20numcodecs_wasm_guest106$LT$impl$u20$numcodecs_wasm_guest..bindings..exports..numcodecs..abc..codec..GuestCodec$u20$for$u20$T$GT$6encode17h31591561bc4ae71dE
          15: 0x29a92 - numcodecs_zfp_classic_wasm.wasm!_ZN20numcodecs_wasm_guest8bindings7exports9numcodecs3abc5codec32_export_method_codec_encode_cabi17hbd0b7e1b3bb92182E
          16: 0x2801 - numcodecs_zfp_classic_wasm.wasm!numcodecs:abc/[email protected]#[method]codec.encode
    1: memory fault at wasm address 0x32270000 in linear memory of size 0x32270000
    2: wasm trap: out of bounds memory access
```
I don't see anything obvious in zfp that might cause such overrun. Just to be sure, you're calling zfp_stream_set_accuracy() with this "small" tolerance before calling zfp_stream_maximum_size() and then allocating a buffer of that size? For a tolerance of 0.0402, the buffer size ought to be 704258800 bytes (or very close; it depends on things like word size).
Any chance you can inspect the members of the bitstream struct in stream_write_word? Unfortunately I don't have access to an ILP32 system, so I cannot debug this myself.
Another question: Do you observe the same issue when using the zfp CLI, i.e., without going through the Rust API? For example: `zfp -f -2 9900 16987 -a 0.0402 -i /dev/zero`. And what are the first four lines of output from `testzfp`?
I opened https://github.com/juntyr/zfp-wasm-bug where I tested zfp on x86. It took me a lot of iterations but in the end, here are my results:
The bug was still in zfp_stream_maximum_size. It seems that getting the correct result requires casting (every?) operand in:
```c
maxsize = ((((bitstream_size)ZFP_HEADER_MAX_BITS) +
            ((bitstream_size)blocks) * ((bitstream_size)maxbits) +
            ((bitstream_size)stream_word_bits) - 1) &
           ~(((bitstream_size)stream_word_bits) - 1)) /
          ((bitstream_size)CHAR_BIT);
```
Maybe you have some insights as to why that is.
I also noticed that the encoded sizes are still different on 32bit vs 64bit:
- 55638440 (tol=4), 137950520 (tol=0.4), 200902904 (tol=0.04) on 64 bit
- 55634976 (tol=4), 137946536 (tol=0.4), 200897528 (tol=0.04) on 32 bit
You should not have to cast all operands, as the integral promotion rules should do any necessary casting automatically. Otherwise the compiler, not zfp, is broken.
Regarding different sizes, I was going to ask if you're using different settings of ZFP_BIT_STREAM_WORD_SIZE, but that would not explain differences this large. You're compressing a single field, right? Can you please share the zfp command-line arguments and/or (C/C++) code that gave you these sizes? I assume they're the return value of zfp_compress(), not zfp_stream_maximum_size().
These results now came when compiling to x86 with gcc on alpine, so I doubt that it's a compiler issue. But I also wonder why every operand had to be promoted.
> I also noticed that the encoded sizes are still different on 32bit vs 64bit:
That was my bad: since I'm using random test data, the variance in the exact numbers makes sense.
I reduced the widening to
```c
maxsize = ((ZFP_HEADER_MAX_BITS + ((bitstream_size)blocks) * maxbits +
            stream_word_bits - 1) &
           ~(((bitstream_size)stream_word_bits) - 1)) / CHAR_BIT;
```
So we need an additional widening before creating the mask.
Ah, this indeed is necessary, as the integral promotion does not extend the right-hand-side operand of `&` with one-bits, as needed to keep all the MSBs in the 64-bit left-hand-side operand. We have a C++ `zfp::internal::round_up()` utility function for this, and it might make sense to write one for C also so we don't have to play all these casting games.
I also need to update `CHANGELOG` and document that `zfp_stream_maximum_size()` might return zero, so let me handle this in a more general way over the next day or two.
> Ah, this indeed is necessary as the integral promotion does not extend the right-hand-side operand of `&` with one-bits, as needed to keep all the MSBs in the 64-bit left-hand-side operand. We have a C++ `zfp::internal::round_up()` utility function for this, and it might make sense to write one for C also so we don't have to play all these casting games. I also need to update `CHANGELOG` and document that `zfp_stream_maximum_size()` might return zero, so let me handle this in a more general way over the next day or two.
That would be great, thank you! We're currently comparing a few compressors, including ZFP, and after fixing some 32-bit bugs in the others, this is the only one that remains (that we know of).
I pushed some changes to `develop` that implement `round_up` via "C templates." This hopefully takes care of any remaining bugs.
> I pushed some changes to `develop` that implement `round_up` via "C templates." This hopefully takes care of any remaining bugs.
I can confirm that the latest commit on develop passes my 32-bit test in https://github.com/juntyr/zfp-wasm-bug. Thanks for the bug fix! I'll now push this change through the Rust bindings chain :)
ZFP now works on wasm32 with large datasets - the fix is confirmed! Thank you again for your help!