flate2-rs icon indicating copy to clipboard operation
flate2-rs copied to clipboard

GzDecoder.read_exact return Ok yet does nothing for vec

Open jtmoon79 opened this issue 2 years ago • 1 comments

For GzDecoder.read_exact, passing a v = Vec::<u8>::with_capacity(5) as a &mut [u8] (via call to as_mut()) will result in Ok(()) yet v will have no data (v.len() == 0).

reproduction

Create some gzipped test file.

echo 'THIS IS FILE test1 WITH SOME HIGH UNICODE ☺☺☺☺☺☺' | gzip -v > test1.gz

Create the project files:

File Cargo.toml

[package]
name = "test-GzDecoder-Issue"
version = "0.1.0"
edition = "2021"

[dependencies]
flate2 = "1.0.24"

File src/main.rs

use std::io;
use std::io::prelude::*;
use std::fs::File;

extern crate flate2;
use flate2::read::GzDecoder;

fn main() -> io::Result<()> {
    let file_gz = File::open("test1.gz").unwrap();
    let mut decoder = GzDecoder::new(file_gz);
    dbg!(&decoder);
    let mut bytes5: Vec<u8> = Vec::<u8>::with_capacity(5);
    dbg!(&bytes5);

    let mut done = false;
    while !done {
        match decoder.read_exact(bytes5.as_mut()) {
               Ok(_) => {
                   std::io::stdout().write_all(&bytes5)?;
               },
               Err(err) if err.kind() == std::io::ErrorKind::UnexpectedEof => {
                   std::io::stdout().write_all(&bytes5)?;
                   done = true;
               },
               Err(err) => {
                   eprintln!("Err: {:?}", err);
                   done = true;
               }
        }
        bytes5.fill(0);
    }
    std::io::stdout().write_all(b"\n")?;

    Ok(())
}

Run cargo run.

Result

The program is stuck in a loop calling read_exact which has no affect yet returns Ok(()).

Changing the program to use a stack-allocated array results in success (i.e. the file test1.gz is decompressed and written to the terminal).

use std::io;
use std::io::prelude::*;
use std::fs::File;

extern crate flate2;
use flate2::read::GzDecoder;

fn main() -> io::Result<()> {
    let file_gz = File::open("test1.gz").unwrap();
    let mut decoder = GzDecoder::new(file_gz);
    dbg!(&decoder);
    //let mut bytes5: Vec<u8> = Vec::<u8>::with_capacity(5);
    let mut bytes5: [u8; 5] = [0, 0, 0, 0, 0];
    dbg!(&bytes5);

    let mut done = false;
    while !done {
        //match decoder.read_exact(bytes5.as_mut()) {
        match decoder.read_exact(&mut bytes5) {
               Ok(_) => {
                   std::io::stdout().write_all(&bytes5)?;
               },
               Err(err) if err.kind() == std::io::ErrorKind::UnexpectedEof => {
                   std::io::stdout().write_all(&bytes5)?;
                   done = true;
               },
               Err(err) => {
                   eprintln!("Err: {:?}", err);
                   done = true;
               }
        }
        bytes5.fill(0);
    }
    std::io::stdout().write_all(b"\n")?;

    Ok(())
}

Expectation

In the case of no useful result using a Vec::<u8>, I would expect read_exact return an informative error.

I would prefer that GzDecoder.read_exact successfully fill an array passed from a vector, Vec::<u8>::as_mut().


Using rust 1.61.0-beta.3 (1ef1e0a 2022-03-31) and flate2 1.0.24 on Ubuntu.

jtmoon79 avatar Jun 10 '22 22:06 jtmoon79

After some tinkering, I'm somewhat sure this is because Vec::<u8>::with_capacity(5).as_mut().len() returns value 0, whereas [u8; 5].len() returns 5.

However, this leads to a tangential question:

For the last call to read_exact, when it returns ErrorKind::UnexpectedEof, how would the caller determine what bytes are of interest? Since the len() will be entire length of the passed &mut [u8] yet not all elements were written.

(I'm not sure if this is right place for this question).

jtmoon79 avatar Jun 11 '22 00:06 jtmoon79

Hi @jtmoon79. It's not a bug in flate2 but how Rust defines the Read trait.

You are correct that the problem is the length of the vector is 0. The read documentation says:

"It is your responsibility to make sure that buf is initialized before calling read.

The code you have allocates the memory for the vector but doesn't initialize it. You can initialize it to all zeroes by doing:

let mut bytes5 = vec![0; 5];

Then len will then return 5.

Also, the infinite loop you got is also expected by what the (shortened) documentation says:

If the return value of this method is Ok(n), then implementations must guarantee that 0 <= n <= buf.len(). A nonzero n value indicates that the buffer buf has been filled in with n bytes of data from this source. If n is 0, then it can indicate one of two scenarios:

  1. ...
  2. The buffer specified was 0 bytes in length.

The code successfully reads 0 bytes every time but doesn't make progress, which is why it gets stuck.

The answer to your second question about read_exact is also in the Rust documentation:

If this function encounters an “end of file” before completely filling the buffer, it returns an error of the kind ErrorKind::UnexpectedEof. The contents of buf are unspecified in this case.

If any other read error is encountered then this function immediately returns. The contents of buf are unspecified in this case.

The buffer is unspecified if there is an error and must not be used.

I recommend not using read_exact or read unless you have a specific use case. They are tricky to get right. You could potentially use read which returns how many bytes it has read, but I think the write_all might fail if the code reads a partial utf-8 character.

I would instead recommend using read_to_string to read the whole file into memory, if possible, or use lines to read the file line by line. They're much easier to use.

MichaelMcDonnell avatar Oct 06 '22 22:10 MichaelMcDonnell

Thanks for the detailed reply @MichaelMcDonnell ! 🙂

This was partially a newbie lesson on allocation vs. initialization, and RTFM, lol 😊

I recommend not using read_exact or read unless you have a specific use case.

Thanks for the suggestion. In my case, I decided on GzDecoder::read with a lot of extra checks. In my case, I want to read a precise amount of data and fill a buffer. Not pretty, but it worked best among the approaches I experimented with: https://github.com/jtmoon79/super-speedy-syslog-searcher/blob/0.0.33/src/readers/blockreader.rs#L1678-L1782

jtmoon79 avatar Oct 07 '22 19:10 jtmoon79

You're welcome! I/O is really hard to get right.

Could you please close the issue since it is not a bug in flate2?

MichaelMcDonnell avatar Oct 08 '22 18:10 MichaelMcDonnell

Closing. Per discussion, this is a rust design issue.

jtmoon79 avatar Oct 08 '22 22:10 jtmoon79