File operations don't use the whole buffer
tokio 0.2.4, see https://github.com/tokio-rs/tokio/blob/0d38936b35779b604770120da2e98560bbb6241f/tokio/src/io/blocking.rs#L29.

There's an unexpected 16 KB limit for IO operations, e.g. this will print 16384. It seems intended, but it's somewhat confusing.
use tokio::fs::File;
use tokio::io::AsyncReadExt;

async fn run() -> Result<(), std::io::Error> {
    let mut file = File::open("x.mkv").await?;
    let mut buf = [0u8; 32768];
    // Even with a 32 KiB buffer, only up to 16384 bytes are read per call.
    let size = file.read(&mut buf).await?;
    println!("{}", size);
    Ok(())
}
Looking at your original issue, I think we can support better performance by working directly with Bytes. In that case, we can avoid the copying and instead send the bytes handle to the remote thread.
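A minimal sketch of that idea, assuming the owned BytesMut itself is moved to the blocking pool and handed back (the function name and chunk size are illustrative; this is not tokio's implementation):

```rust
use bytes::BytesMut;
use std::io::Read;
use tokio::task::spawn_blocking;

// Illustrative only: ship the owned buffer to the blocking pool and
// back instead of copying through an intermediate Vec<u8>.
async fn read_into_bytes(
    mut file: std::fs::File,
    mut buf: BytesMut,
) -> std::io::Result<(std::fs::File, BytesMut)> {
    spawn_blocking(move || {
        buf.resize(128 * 1024, 0); // illustrative chunk size
        let n = file.read(&mut buf[..])?;
        buf.truncate(n);
        Ok::<_, std::io::Error>((file, buf))
    })
    .await
    .expect("blocking task panicked")
}
```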
I'd like to work on this. Some guidance would be appreciated, especially around:

> I think we can support better performance by working directly with Bytes

@carllerche By working with Bytes, do you mean calling AsyncReadExt::read_buf instead of AsyncReadExt::read? Will this fill the Bytes buffer if it is larger than MAX_BUF?
Not with the current implementation. It would probably require a special function for BytesMut specifically.
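For reference, a sketch of what a read_buf call into a BytesMut looks like with the current API (the file path and capacity are placeholders); as noted above, a single call still reads at most MAX_BUF bytes no matter how much capacity the buffer has:

```rust
use bytes::BytesMut;
use tokio::fs::File;
use tokio::io::AsyncReadExt;

async fn read_chunk() -> std::io::Result<BytesMut> {
    let mut file = File::open("x.mkv").await?;
    let mut buf = BytesMut::with_capacity(1024 * 1024);
    // Appends to `buf`, but a single call is still capped at MAX_BUF.
    let n = file.read_buf(&mut buf).await?;
    println!("read {} bytes", n);
    Ok(buf)
}
```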
@carllerche Why 16K and not some other size? Is there a special reason, or did testing show better performance with 16K?
The reason to have a maximum buffer size is that the file API allocates an intermediate buffer separate from the user-provided buffer. I don't think the exact choice of size was benchmarked.
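For illustration, a simplified sketch of that intermediate-buffer pattern (not the actual tokio source; the names, constant placement, and error handling are made up for the example): the std file handle moves to the blocking pool together with a heap buffer of at most MAX_BUF bytes, and the result is copied back into the caller's slice.

```rust
use std::io::Read;
use tokio::task::spawn_blocking;

const MAX_BUF: usize = 16 * 1024; // the cap being discussed

// Simplified illustration of why an intermediate buffer caps the read.
async fn capped_read(
    mut file: std::fs::File,
    dst: &mut [u8],
) -> std::io::Result<(std::fs::File, usize)> {
    let len = dst.len().min(MAX_BUF);
    let (file, buf, n) = spawn_blocking(move || {
        let mut buf = vec![0u8; len];
        let n = file.read(&mut buf)?;
        Ok::<_, std::io::Error>((file, buf, n))
    })
    .await
    .expect("blocking task panicked")?;
    dst[..n].copy_from_slice(&buf[..n]);
    Ok((file, n))
}
```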
Why not replace the Vec<u8> in io::blocking::Buf with BytesMut?
Well, why would we? If both can be used, it's better to use a Vec<u8>.
> Well, why would we? If both can be used, it's better to use a Vec<u8>.
You're right.
Any hints on how to move forward w/ this?
The file operations that are offloaded to the spawn_blocking threadpool will probably continue to have a maximum buffer size. One thing that would be nice is to finish #3821, and operations executed through that setup would not be limited in size. That would involve finding a way to test various kernel versions in CI.
E.g. maybe there is a way to have some machines on AWS with the desired kernel version participate in running CI? Not sure on the details.
It's an extremely narrow test, but in my quest to optimize I/O performance on something reading/writing whole (1-50 GiB) files sequentially, I tested a quick hack that simply changes MAX_BUF to 128 MiB and, much to my surprise, it Just Worked(TM): with a 128 MiB buffer, I'm getting 128 MiB per read() and write() syscall according to strace.

This is an obnoxiously large I/O size, but it does work for this very specific use case: the files are on CephFS, in an erasure-coded pool, with a 128 MiB object size. Aligning I/O to read whole objects per request substantially improves performance (approx. 80 MiB/s per task with four tasks up to approx. 220 MiB/s in my case).

This is on Fedora with kernel 5.6.13.

EDIT: Fixed numbers.
I am open to changing the buffer size used by the File type.
@Darksonn is there anything we could help with to get this issue addressed?
At Deno we got several reports regarding poor performance when working on large files (https://github.com/denoland/deno/issues/10157) due to a need to perform thousands of roundtrips between Rust and JavaScript to read the data.
Well, the options are the following:

- We change the default buffer size.
- We provide a way to configure the buffer size.
- You perform the operations yourself with spawn_blocking instead of going through tokio::fs::File (a sketch of this option follows below).

I think I would be OK with all of these. What do you think?
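A minimal sketch of option 3, assuming a single large read via std::fs on the blocking pool is acceptable (the function name and error handling are illustrative, not an official API):

```rust
use tokio::task::spawn_blocking;

// Option 3, sketched: read the whole file with std::fs on the blocking
// pool, so no MAX_BUF-sized chunking is involved.
async fn read_whole_file(path: std::path::PathBuf) -> std::io::Result<Vec<u8>> {
    spawn_blocking(move || std::fs::read(path))
        .await
        .expect("blocking task panicked")
}
```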
@Darksonn I can make a PR with my changes later today. I landed on 32 MiB as an optimal size for my use case, but I feel like making MAX_BUF arbitrarily large is probably fine, since it's just setting a hardcoded maximum; if the calling code passes in a smaller buffer, that size will be used.
@Darksonn at Deno, we will be going with option 3 for now, but ideally we would like to see option 2 happening.
Pushed up #4580
It just took me a good hour to find a bug while writing files using tokio::fs::File. Apparently, this issue also applies when writing to a file, which results in anything over 16384 bytes being silently ignored (without any panic or error). If the maximum buffer size is intended, I would suggest at least documenting this behavior somewhere, and maybe returning an error or panicking.
AsyncWrite::write is never guaranteed to write out the entire buffer. You should always check the number of bytes written that is returned from the call. If you want to write the entire buffer, use write_all.
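A small sketch of the difference, with a placeholder file name:

```rust
use tokio::fs::File;
use tokio::io::AsyncWriteExt;

async fn write_example(data: &[u8]) -> std::io::Result<()> {
    let mut file = File::create("out.bin").await?;

    // `write` may write only part of `data`; always check the count.
    let n = file.write(data).await?;

    // `write_all` loops internally until everything has been written.
    file.write_all(&data[n..]).await?;
    file.flush().await?;
    Ok(())
}
```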
#5397 increased the buffer size to 2 MB: https://github.com/tokio-rs/tokio/blob/0d382faa4e557d0f6e281485cfb14a4934461c41/tokio/src/io/blocking.rs#L29

Is that enough to close this issue, or should some refinement be done?
I haven't tested it lately, but I suppose it's fine. @carllerche suggested using Bytes instead of a Vec; I don't know if it's still planned.