memmap-rs
memmap-rs copied to clipboard
what happens if we exceed isize::MAX on 32-bit platforms?
The ptr::offset function, which underlies safe slice indexing, says in its docs:
If any of the following conditions are violated, the result is Undefined Behavior...The computed offset, in bytes, cannot overflow an
isize...memory acquired directly from allocators or memory mapped files may be too large to handle with this function.
Since the map function is unsafe, it's arguably fine for it to expose possible UB in this way. But I think most people reading the docs won't have any idea that this is a requirement. ~Maybe it would be better for Deref to panic rather than to return a slice that's "unsoundly large"?~ (Edit: Probably just return an error if we try to mmap something larger than isize::MAX?)
@oconnor663 can you be more specific with your concern? Do you have a specific potentially buggy line of code in mind, or perhaps a reproducible example?
I'm worried about what happens in the case where someone on a 32-bit system memory maps a 3 GB file (that is, longer than std::isize::MAX but shorter than std::usize::MAX), and then tries to read one of the bytes near the end. Here's what I believe will happen:
memmapchecks that the file isn't too long forusize, and it'll assemble a&[u8]slice that's 3 GB long.- Next, the caller indexes into that slice to read a byte, somewhere near the end. Maybe just to print it or something.
- That indexing operation translates into
usize'sSliceIndex::get_unchecked()implementation, which in turn callsstd::ptr::add, with ausizeargument that's larger thanstd::isize::MAX.
According to the ptr::add / ptr::offset docs I linked above, that's all it takes to trigger undefined behavior. And those docs specifically note that memory mapping is one way to get your hands on a slice large enough to hit this.
That said, I have no idea what actually happens in this case. It could be that, even though the usize gets cast into an isize and becomes negative, the result of all the pointer arithmetic still ends up being what the caller thought they were asking for? Maybe everything appears to work? Even if that's the case, I don't think we want to rely on it remaining the case.
My instinct is that MmapOptions::get_len, which is currently making sure the length doesn't overflow usize, should instead be making sure the length doesn't overflow isize. But maybe someone who knows more about why these limitations were originally put in place could give better advice.
@oconnor663 thanks for the detail. I put together #74 to try and reproduce this scenario; we'll see whether it crashes and burns on the 32bit CI targets. If you don't mind, please take a look and check if that test accurately reflects the scenario you're concerned about.
Early indications are that test fails with ENOMEM on 32bit platforms, which seems like it might be the correct behavior.
Hmm, maybe we should test whether a 2^31 - 2 byte mmap (just under isize::MAX) succeeds? If not, it could be that we're blowing past some per-process limit on the system.
Some more discussion here: https://www.reddit.com/r/rust/comments/9ghwuv/hey_rustaceans_got_an_easy_question_ask_here/e6fx4h2/ Relevant article: https://trust-in-soft.com/objects-larger-than-ptrdiff_max-bytes/
I get the feeling that these are problems we don't want to touch with a 10 foot pole :)
Update: As of https://github.com/rust-lang/rust/pull/53784, slice::from_raw_parts will panic in debug mode, if the length in bytes exceeds isize::MAX.
Note that stdlib debug asserts don’t ever run in official compiler builds. Only for like the rust-lang test suite.
@oconnor663 The only places that use ptr::offset in the code use it either with an alignment, which never exceeds the length of a page (4069), or the offset provided to MmapOptions::offset(usize). I agree that if this offset usize overflows isize::MAX it will cause issues, but I'm not seeing any reason why the length exceeding isize::MAX would be problematic, since slice::from_raw_parts is used for creating the slice.
Woops, sorry I completely missed the link between ptr::offset and from_raw_parts (https://github.com/rust-lang/rust/pull/53784 makes it clear).