Reduce buffer copying when compressing and decompressing
We are using std::io::copy to connect the reader and the writer when compressing.
https://github.com/ouch-org/ouch/blob/986a6f6ccf6df858db4d50502f392b9eff8a3027/src/commands/compress.rs#L72-L75
And also when decompressing. https://github.com/ouch-org/ouch/blob/986a6f6ccf6df858db4d50502f392b9eff8a3027/src/commands/decompress.rs#L104-L109
We are using BufWriter for compression and BufReader for decompression. Here's our buffer capacity:
https://github.com/ouch-org/ouch/blob/262ca5d582496774a834270ea04036e01aa8dd55/src/main.rs#L23-L24
However, the implementation of io::copy already allocates a buffer on the stack, reads from the reader into it, and writes from it to the writer. The operation is therefore already buffered, so wrapping the streams in BufReader and BufWriter makes it double-buffered, which just adds overhead: an extra copy layer.
https://github.com/rust-lang/rust/blob/388538fc963e07a94e3fc3ac8948627fd2d28d29/library/std/src/sys_common/io.rs#L1-L3 https://github.com/rust-lang/rust/blob/b85f57d652a141b5c73f4f46b986a92b6992e9d9/library/std/src/io/copy.rs#L141
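To make the two shapes concrete, here is a minimal sketch, not the actual ouch code. The function names and the 64 KiB `CAPACITY` are hypothetical, and plain byte slices stand in for the compression streams. Both functions produce the same output. The only difference is whether the data passes through an extra BufReader before reaching io::copy.

```rust
use std::io::{self, BufReader, Read, Write};

// Hypothetical capacity, used only for illustration.
const CAPACITY: usize = 64 * 1024;

/// Wraps the reader in a BufReader before handing it to io::copy,
/// so a second buffer sits in front of the copy.
fn copy_double_buffered<R: Read, W: Write>(reader: R, writer: &mut W) -> io::Result<u64> {
    let mut buffered = BufReader::with_capacity(CAPACITY, reader);
    io::copy(&mut buffered, writer)
}

/// Hands the reader to io::copy directly, with no wrapper around it.
fn copy_direct<R: Read, W: Write>(mut reader: R, writer: &mut W) -> io::Result<u64> {
    io::copy(&mut reader, writer)
}
```

Since the output is identical either way, dropping the wrapper should not change behavior, only the amount of copying.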
Note: I'm ignoring kernel_copy from io::copy because our readers and writers are compression format streams, not file descriptors, so Rust cannot move the data without it passing through userspace. The function generic_copy is used instead, which uses stack_buffer_copy internally.
Note 2: There is a specialized copy implementation for BufWriter. I think it allows accessing the BufWriter's internal buffer directly instead of writing to another buffer, so maybe we only have a problem with BufReader. However, we might as well just remove the BufWriter anyway.
Note 3: Benchmarking with .lz4 or .zst might show a bigger difference, because those formats have higher throughput, so the overhead would be more noticeable than with other formats.
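A rough sketch of how such a comparison could be timed. It assumes in-memory data and io::sink() in place of real .lz4 or .zst streams, so it would need real encoders and decoders plugged in to say anything about ouch itself. `time_copy`, `compare`, and `CAPACITY` are hypothetical names.

```rust
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::time::{Duration, Instant};

// Hypothetical capacity, used only for illustration.
const CAPACITY: usize = 64 * 1024;

/// Copies everything from `reader` to `writer`, flushes the writer,
/// and returns the byte count together with the elapsed time.
fn time_copy<R: Read, W: Write>(mut reader: R, mut writer: W) -> io::Result<(u64, Duration)> {
    let start = Instant::now();
    let bytes = io::copy(&mut reader, &mut writer)?;
    writer.flush()?;
    Ok((bytes, start.elapsed()))
}

/// Runs the same copy twice: first without wrappers, then with
/// BufReader and BufWriter around the streams.
fn compare(data: &[u8]) -> io::Result<[(u64, Duration); 2]> {
    let direct = time_copy(data, io::sink())?;
    let wrapped = time_copy(
        BufReader::with_capacity(CAPACITY, data),
        BufWriter::with_capacity(CAPACITY, io::sink()),
    )?;
    Ok([direct, wrapped])
}
```

A single run like this is noisy, so a real measurement would repeat each variant many times, or use a benchmarking harness, on large inputs.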