claxon icon indicating copy to clipboard operation
claxon copied to clipboard

Reading on uninitialized memory may cause UB

Open JOE1994 opened this issue 5 years ago • 6 comments

Hello :crab: , we (Rust group @sslab-gatech) found a memory-safety/soundness issue in this crate while scanning Rust code on crates.io for potential vulnerabilities.

Issue Description

  • metadata::read_vorbis_comment_block() method

https://github.com/ruuda/claxon/blob/2f053855d581cfe3b9e7cef67fed5b0b0ccc45aa/src/metadata.rs#L432-L438

https://github.com/ruuda/claxon/blob/2f053855d581cfe3b9e7cef67fed5b0b0ccc45aa/src/metadata.rs#L473-L475

  • metadata::read_application_block() method

https://github.com/ruuda/claxon/blob/2f053855d581cfe3b9e7cef67fed5b0b0ccc45aa/src/metadata.rs#L544-L546

Methods metadata::read_vorbis_comment_block() & metadata::read_application_block() create an uninitialized buffer and passes it to user-provided ReadBytes implementation. This is unsound, because it allows safe Rust code to exhibit an undefined behavior (read from uninitialized memory).

This part from the Read trait documentation explains the issue:

It is your responsibility to make sure that buf is initialized before calling read. Calling read with an uninitialized buf (of the kind one obtains via MaybeUninit<T>) is not safe, and can lead to undefined behavior.

Suggested Fix

It is safe to zero-initialize the newly allocated u8 buffer before read_into(), in order to prevent user-provided Read from accessing old contents of the newly allocated heap memory.

Also, there are two nightly features for handling such cases.

  • https://doc.rust-lang.org/std/io/struct.Initializer.html
  • https://rust-lang.github.io/rfcs/2930-read-buf.html

Thank you for checking out this issue :+1:

JOE1994 avatar Jan 08 '21 00:01 JOE1994

#19 was opened a few months ago with a proposed a fix for the related parts of code, but it hasn't seen much activity since.

memoryruins avatar Jan 10 '21 23:01 memoryruins

Thank you for the detailed write-up. I understand that a pathological ReadBytes implementation could inspect the buffer, and that LLVM might do bad things when it does. However, both of the provided ReadBytes implementations only call copy_from_slice on the buffer. They do not pass the buffer directly to io::Read::read. As far as I can tell, the usage by those two provided implementations is sound, but the problem is that a user-provided implementation of ReadBytes might observe the uninitialized memory. (But please explain if I am misunderstanding something.)

My first attempt to prevent this was to replace the with_capacity/set_length pairs with vec![0; length]. I’ve prepared a patch in 199958f64b0f6ca72a845d30e772b03d284b31c4 that does this. To evaluate the performance impact of that I ran Musium in scanning mode. This code relies heavily on the metadata reading in Claxon, so it’s a good benchmark for this change. On my i7-6700HQ, the number of instructions executed goes up significantly by 4.8%, but there is no evidence that total running time is affected. This makes sense to me on a high-end superscalar CPU, but I fear that on other targets with simpler CPUs such as Raspberry Pi's (which I care about), the extra instructions do translate to extra running time. I haven’t measured that yet, but I would prefer a zero-overhead solution.

I think a zero-overhead solution is possible. All call sites that you pointed out have a length known ahead of time. They pass the uninitialized vector to read_into, but we could turn this around: make read_into allocate the vector. It would just be moving some lines around, so it shouldn’t affect the generated code too much, but it would prevent uninitialized buffers from being passed to user-provided ReadBytes implementations. I went ahead and implemented this in 18ae310925152500950ea6f245d87a2ad7fa51c8, but I haven’t benchmarked it yet, I hope to get around to that soon.

Do you think that 18ae310925152500950ea6f245d87a2ad7fa51c8 would be an adequate solution?

ruuda avatar Jan 11 '21 21:01 ruuda

@ruuda Thank you so much for your attention on this issue! :crab: The fix you made in 18ae310 looks sound & reasonable to me :+1: I apologize for keeping you waiting..

JOE1994 avatar Jan 19 '21 04:01 JOE1994

I wanted to go and benchmark this, but there is a complication: the branch I use in the application is a different one (that adds support for picture metadata), which mostly deletes ReadBytes in favor of just io::Read ... but io::Read can’t provide an implementation for read_vec that is both fast and sound, even though Cursor<[u8]> and BufReader can. I think it can be fixed with a new trait for read_vec, but it requires some larger changes.

ruuda avatar Feb 11 '21 21:02 ruuda

Could a 5% performance hit be worth it in exchange for safety in the short term, until the proper solution that requires refactoring actually lands?

The reference FLAC implementation already provides everything but memory safety; I feel memory safety is the main selling point of Claxon, and it's not something that should be compromised on.

Shnatsel avatar Jan 24 '22 02:01 Shnatsel

but io::Read can’t provide an implementation for read_vec that is both fast and sound

This can be implemented as follows:

pub fn read_vec<R>(r: &mut R, onto: &mut Vec<u8>, len: usize) -> Result<usize>
    where R: Read
{
    onto.reserve(len);
    r.take(len as u64).read_to_end(onto)
}

This is obviously sound (no unsafe). read_to_end uses read_buf, so this is fast if the underlying reader implements read_buf to read into uninitialized memory, which is the case for fs::File, io::Take, BufReader, etc.

zopsicle avatar Apr 23 '23 20:04 zopsicle