Reduce buffer copying when compressing and decompressing

Open marcospb19 opened this issue 3 years ago • 0 comments

We are using std::io::copy to connect the writer and readers when compressing.

https://github.com/ouch-org/ouch/blob/986a6f6ccf6df858db4d50502f392b9eff8a3027/src/commands/compress.rs#L72-L75

And also when decompressing.

https://github.com/ouch-org/ouch/blob/986a6f6ccf6df858db4d50502f392b9eff8a3027/src/commands/decompress.rs#L104-L109

We are using BufWriter for compression and BufReader for decompression. Here's our buffer capacity:

https://github.com/ouch-org/ouch/blob/262ca5d582496774a834270ea04036e01aa8dd55/src/main.rs#L23-L24

However, io::copy already allocates a buffer on the stack, which it uses to read from the reader and write to the writer, so the operation is already buffered. Wrapping the streams in BufReader and BufWriter makes it a double buffer, which just adds overhead: an extra copy layer.

https://github.com/rust-lang/rust/blob/388538fc963e07a94e3fc3ac8948627fd2d28d29/library/std/src/sys_common/io.rs#L1-L3

https://github.com/rust-lang/rust/blob/b85f57d652a141b5c73f4f46b986a92b6992e9d9/library/std/src/io/copy.rs#L141
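To make the two setups concrete, here is a minimal sketch using only std. It is not ouch's code: `CountingWriter`, `copy_direct`, and `copy_double_buffered` are hypothetical names, the in-memory slice and `Vec` stand in for the compression format streams, and the `capacity` argument is a placeholder rather than ouch's actual constant. The write-call counts it returns depend on the std version, so no particular numbers are claimed.

```rust
use std::io::{self, BufReader, BufWriter, Write};

/// Hypothetical helper: wraps a writer and counts how many times
/// `write` is called on it.
struct CountingWriter<W> {
    inner: W,
    calls: usize,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Copies with io::copy only, relying on the buffering it does itself.
/// Returns the copied bytes and the number of `write` calls observed.
fn copy_direct(input: &[u8]) -> io::Result<(Vec<u8>, usize)> {
    let mut reader = input;
    let mut writer = CountingWriter { inner: Vec::new(), calls: 0 };
    io::copy(&mut reader, &mut writer)?;
    Ok((writer.inner, writer.calls))
}

/// Copies through BufReader and BufWriter as well, which is the
/// double-buffered setup described in this issue.
fn copy_double_buffered(input: &[u8], capacity: usize) -> io::Result<(Vec<u8>, usize)> {
    let mut reader = BufReader::with_capacity(capacity, input);
    let mut writer = BufWriter::with_capacity(
        capacity,
        CountingWriter { inner: Vec::new(), calls: 0 },
    );
    io::copy(&mut reader, &mut writer)?;
    // into_inner flushes whatever is still held in the BufWriter.
    let counting = writer.into_inner().map_err(io::Error::from)?;
    Ok((counting.inner, counting.calls))
}
```

Both functions should produce identical output for the same input. Comparing the returned call counts (and timing the two) on a given toolchain is one way to see whether the extra layer changes anything.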


Note: I'm ignoring kernel_copy from io::copy because our readers and writers are compression format streams, not file descriptors, so Rust cannot apply the optimization that avoids moving data across the kernel/userspace boundary. The function generic_copy is used instead, which uses stack_buffer_copy internally.

Note 2: There is a specialized copy implementation for BufWriter. I think it lets io::copy use the BufWriter's internal buffer instead of writing to another buffer, so maybe we only have a problem with BufReader. However, we might as well just remove the BufWriter anyway.

Note 3: Benchmarks with .lz4 or .zst are more likely to show a difference, because those formats have higher throughput, so the overhead would be more noticeable than with other formats.
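A rough timing harness for such a comparison could look like the sketch below. It is only an assumption about how one might measure this: `timed_copy`, `mib_per_sec`, and `compare` are hypothetical names, and the in-memory slice is a stand-in for a real .lz4 or .zst decoder, which would have to be substituted to get meaningful numbers. A single run like this is noisy, so repeated runs or a proper benchmarking tool would be needed before drawing conclusions.

```rust
use std::io::{self, BufReader, Read, Write};
use std::time::{Duration, Instant};

/// Hypothetical helper: copies `reader` into `writer` with io::copy and
/// reports the number of bytes copied and the elapsed wall-clock time.
fn timed_copy<R: Read, W: Write>(mut reader: R, mut writer: W) -> io::Result<(u64, Duration)> {
    let start = Instant::now();
    let bytes = io::copy(&mut reader, &mut writer)?;
    writer.flush()?;
    Ok((bytes, start.elapsed()))
}

/// Converts a byte count and a duration into MiB/s.
fn mib_per_sec(bytes: u64, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs == 0.0 {
        return f64::INFINITY;
    }
    bytes as f64 / (1024.0 * 1024.0) / secs
}

/// Measures the same copy with and without a BufReader around the source.
/// Returns (plain MiB/s, BufReader-wrapped MiB/s).
fn compare(data: &[u8], capacity: usize) -> io::Result<(f64, f64)> {
    // Plain source: io::copy handles buffering on its own.
    let (n, t) = timed_copy(data, Vec::new())?;
    let plain = mib_per_sec(n, t);

    // Same source wrapped in a BufReader, adding the extra layer.
    let (n, t) = timed_copy(BufReader::with_capacity(capacity, data), Vec::new())?;
    let wrapped = mib_per_sec(n, t);

    Ok((plain, wrapped))
}
```

Calling `compare` on a large input with the real buffer capacity, and with the slice replaced by an actual decoder stream, would give the two throughput figures to compare.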

marcospb19, Jan 06 '23 02:01