flate2-rs
flate2-rs copied to clipboard
GzDecoder.read_exact return Ok yet does nothing for vec
For GzDecoder.read_exact
, passing a v = Vec::<u8>::with_capacity(5)
as a &mut [u8]
(via call to as_mut()
) will result in Ok(())
yet v
will have no data (v.len() == 0
).
reproduction
Create some gzipped test file.
echo 'THIS IS FILE test1 WITH SOME HIGH UNICODE ☺☺☺☺☺☺' | gzip -v > test1.gz
Create the project files:
File Cargo.toml
[package]
name = "test-GzDecoder-Issue"
version = "0.1.0"
edition = "2021"
[dependencies]
flate2 = "1.0.24"
File src/main.rs
use std::io;
use std::io::prelude::*;
use std::fs::File;
extern crate flate2;
use flate2::read::GzDecoder;
fn main() -> io::Result<()> {
let file_gz = File::open("test1.gz").unwrap();
let mut decoder = GzDecoder::new(file_gz);
dbg!(&decoder);
let mut bytes5: Vec<u8> = Vec::<u8>::with_capacity(5);
dbg!(&bytes5);
let mut done = false;
while !done {
match decoder.read_exact(bytes5.as_mut()) {
Ok(_) => {
std::io::stdout().write_all(&bytes5)?;
},
Err(err) if err.kind() == std::io::ErrorKind::UnexpectedEof => {
std::io::stdout().write_all(&bytes5)?;
done = true;
},
Err(err) => {
eprintln!("Err: {:?}", err);
done = true;
}
}
bytes5.fill(0);
}
std::io::stdout().write_all(b"\n")?;
Ok(())
}
Run cargo run
.
Result
The program is stuck in a loop calling read_exact
which has no affect yet returns Ok(())
.
Changing the program to use a stack-allocated array results in success (i.e. the file test1.gz
is decompressed and written to the terminal).
use std::io;
use std::io::prelude::*;
use std::fs::File;
extern crate flate2;
use flate2::read::GzDecoder;
fn main() -> io::Result<()> {
let file_gz = File::open("test1.gz").unwrap();
let mut decoder = GzDecoder::new(file_gz);
dbg!(&decoder);
//let mut bytes5: Vec<u8> = Vec::<u8>::with_capacity(5);
let mut bytes5: [u8; 5] = [0, 0, 0, 0, 0];
dbg!(&bytes5);
let mut done = false;
while !done {
//match decoder.read_exact(bytes5.as_mut()) {
match decoder.read_exact(&mut bytes5) {
Ok(_) => {
std::io::stdout().write_all(&bytes5)?;
},
Err(err) if err.kind() == std::io::ErrorKind::UnexpectedEof => {
std::io::stdout().write_all(&bytes5)?;
done = true;
},
Err(err) => {
eprintln!("Err: {:?}", err);
done = true;
}
}
bytes5.fill(0);
}
std::io::stdout().write_all(b"\n")?;
Ok(())
}
Expectation
In the case of no useful result using a Vec::<u8>
, I would expect read_exact
return an informative error.
I would prefer that GzDecoder.read_exact
successfully fill an array passed from a vector, Vec::<u8>::as_mut()
.
Using rust 1.61.0-beta.3 (1ef1e0a 2022-03-31)
and flate2 1.0.24
on Ubuntu.
After some tinkering, I'm somewhat sure this is because Vec::<u8>::with_capacity(5).as_mut().len()
returns value 0
, whereas [u8; 5].len()
returns 5
.
However, this leads to a tangential question:
For the last call to read_exact
, when it returns ErrorKind::UnexpectedEof
, how would the caller determine what bytes are of interest? Since the len()
will be entire length of the passed &mut [u8]
yet not all elements were written.
(I'm not sure if this is right place for this question).
Hi @jtmoon79. It's not a bug in flate2 but how Rust defines the Read
trait.
You are correct that the problem is the length of the vector is 0
. The read
documentation says:
"It is your responsibility to make sure that buf is initialized before calling read.
The code you have allocates the memory for the vector but doesn't initialize it. You can initialize it to all zeroes by doing:
let mut bytes5 = vec![0; 5];
Then len
will then return 5.
Also, the infinite loop you got is also expected by what the (shortened) documentation says:
If the return value of this method is Ok(n), then implementations must guarantee that 0 <= n <= buf.len(). A nonzero n value indicates that the buffer buf has been filled in with n bytes of data from this source. If n is 0, then it can indicate one of two scenarios:
- ...
- The buffer specified was 0 bytes in length.
The code successfully reads 0 bytes every time but doesn't make progress, which is why it gets stuck.
The answer to your second question about read_exact
is also in the Rust documentation:
If this function encounters an “end of file” before completely filling the buffer, it returns an error of the kind ErrorKind::UnexpectedEof. The contents of buf are unspecified in this case.
If any other read error is encountered then this function immediately returns. The contents of buf are unspecified in this case.
The buffer is unspecified if there is an error and must not be used.
I recommend not using read_exact
or read
unless you have a specific use case. They are tricky to get right. You could potentially use read
which returns how many bytes it has read, but I think the write_all
might fail if the code reads a partial utf-8 character.
I would instead recommend using read_to_string
to read the whole file into memory, if possible, or use lines
to read the file line by line. They're much easier to use.
Thanks for the detailed reply @MichaelMcDonnell ! 🙂
This was partially a newbie lesson on allocation vs. initialization, and RTFM, lol 😊
I recommend not using read_exact or read unless you have a specific use case.
Thanks for the suggestion. In my case, I decided on GzDecoder::read
with a lot of extra checks. In my case, I want to read a precise amount of data and fill a buffer. Not pretty, but it worked best among the approaches I experimented with:
https://github.com/jtmoon79/super-speedy-syslog-searcher/blob/0.0.33/src/readers/blockreader.rs#L1678-L1782
You're welcome! I/O is really hard to get right.
Could you please close the issue since it is not a bug in flate2
?
Closing. Per discussion, this is a rust design issue.