
File operations don't use the whole buffer

lnicola opened this issue 5 years ago

tokio 0.2.4, see https://github.com/tokio-rs/tokio/blob/0d38936b35779b604770120da2e98560bbb6241f/tokio/src/io/blocking.rs#L29.

There's an unexpected 16 KB limit for IO operations, e.g. this will print 16384. It seems intended, but it's somewhat confusing.

use tokio::fs::File;
use tokio::io::AsyncReadExt;

async fn run() -> Result<(), std::io::Error> {
    let mut file = File::open("x.mkv").await?;
    let mut buf = [0u8; 32768];
    let size = file.read(&mut buf).await?;
    println!("{}", size);
    Ok(())
}

lnicola avatar Dec 17 '19 13:12 lnicola

Looking at your original issue, I think we can support better performance by working directly with Bytes. In that case, we can avoid the copying and instead send the bytes handle to the remote thread.

carllerche avatar Dec 21 '19 21:12 carllerche

I’d like to work on this. Some guidance would be appreciated. Especially around

I think we can support better performance by working directly with Bytes

blasrodri avatar Aug 03 '20 18:08 blasrodri

@carllerche By working with Bytes, do you mean calling AsyncReadExt::read_buf instead of AsyncReadExt::read? Will this fill the Bytes buffer if it is larger than MAX_BUF?

grantperry avatar Jan 09 '21 00:01 grantperry

Not with the current implementation. It would probably require a special function for BytesMut specifically.

Darksonn avatar Jan 09 '21 10:01 Darksonn

@carllerche Why 16K and not some other size? Is there a special reason, or did testing show better performance with 16K?

fetchadd avatar Jun 28 '21 13:06 fetchadd

The reason to have a maximum buffer size is that the file API allocates an intermediate buffer separate from the user-provided buffer. I don't think the exact choice of size was benchmarked.

Darksonn avatar Jun 28 '21 13:06 Darksonn
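The intermediate-buffer pattern described above can be sketched in plain std Rust. This is an illustration of the mechanism, not tokio's actual code; the `MAX_BUF` constant and `capped_read` function are hypothetical names that mirror tokio's internal cap:

```rust
use std::io::{self, Read};

// Stand-in for tokio's internal cap (16 KiB originally, 2 MiB since #5397).
const MAX_BUF: usize = 2 * 1024 * 1024;

// Read into `dst` through a separate intermediate buffer, the way the
// blocking-threadpool file I/O does: at most MAX_BUF bytes per call.
fn capped_read<R: Read>(src: &mut R, dst: &mut [u8]) -> io::Result<usize> {
    let len = dst.len().min(MAX_BUF);
    // The intermediate allocation, separate from the caller's buffer.
    let mut tmp = vec![0u8; len];
    let n = src.read(&mut tmp)?;
    // The extra copy that the cap exists to bound.
    dst[..n].copy_from_slice(&tmp[..n]);
    Ok(n)
}
```

Because the intermediate buffer is allocated per operation and copied through, an unbounded cap would let a single read allocate and copy arbitrarily large amounts of memory, which is why some maximum is kept.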

Why not replace the Vec<u8> in io::blocking::Buf with BytesMut?

blasrodri avatar Jul 04 '21 21:07 blasrodri

Well why would we? If both can be used, it's better to use a Vec<u8>.

Darksonn avatar Jul 04 '21 22:07 Darksonn

Well why would we? If both can be used, it's better to use a Vec<u8>.

You're right.

Any hints on how to move forward w/ this?

blasrodri avatar Jul 04 '21 23:07 blasrodri

The file operations that are offloaded to the spawn_blocking threadpool will probably continue to have a maximum buffer size. One thing that would be nice is to finish #3821, and operations executed through that setup would not be limited in size. That would involve finding a way to test various kernel versions in CI.

E.g. maybe there is a way to have some machines on AWS with the desired kernel version participate in running CI? Not sure on the details.

Darksonn avatar Jul 05 '21 07:07 Darksonn

It's an extremely narrow test, but in my quest to optimize I/O performance on something reading/writing whole (1-50GiB) files sequentially, I tested a quick hack that simply changes MAX_BUF to 128MiB and, much to my surprise, it Just Worked(TM): With a 128MiB buffer, I'm getting 128MiB per read() and write() syscall according to strace.

This is an obnoxiously large I/O size, but it does work for this very specific use case: the files are on cephfs, in an erasure-coded pool, with 128MiB object size. Aligning I/O to read whole objects per request substantially improves performance (from approx. 80MiB/s per task with four tasks up to approx. 220MiB/s in my case).

This is on Fedora with kernel 5.6.13

EDIT: Fixed numbers

mcronce avatar Jan 22 '22 06:01 mcronce

I am open to changing the buffer size used by the File type.

Darksonn avatar Jan 22 '22 08:01 Darksonn

@Darksonn is there anything we could help with to get this issue addressed?

At Deno we got several reports regarding poor performance when working on large files (https://github.com/denoland/deno/issues/10157) due to a need to perform thousands of roundtrips between Rust and JavaScript to read the data.

bartlomieju avatar Mar 21 '22 13:03 bartlomieju

Well, the options are the following:

  1. We change the default buffer size.
  2. We provide a way to configure the buffer size.
  3. You perform the operations yourself with spawn_blocking instead of going through tokio::fs::File.

I think I would be ok with all of these. What do you think?

Darksonn avatar Mar 21 '22 14:03 Darksonn
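Option 3 above can be sketched with std alone. The function below is a hypothetical helper for illustration; in an async context you would wrap the call in `tokio::task::spawn_blocking(move || read_whole_file(&path)).await`, which bypasses the capped intermediate buffer entirely:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Blocking read of an entire file via std. std::fs::read issues reads
// directly into one Vec, with no per-call 16 KiB / 2 MiB cap, so it is
// suitable for running on tokio's blocking threadpool.
fn read_whole_file(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}
```

The trade-off is that each call occupies a blocking-pool thread for the full duration of the I/O, but for large sequential reads that is usually cheaper than thousands of capped round-trips.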

@Darksonn I can make a PR with my changes later today. I landed on 32MiB as an optimal size for my use case, but I feel like making MAX_BUF arbitrarily large is probably fine, since it's just setting a hardcoded maximum - if calling code passes in a buffer with a smaller size, that size will be used

mcronce avatar Mar 21 '22 15:03 mcronce

@Darksonn at Deno, we will be going with option 3 for now, but ideally we would like to see option 2 happening.

crowlKats avatar Mar 22 '22 13:03 crowlKats

Pushed up #4580

mcronce avatar Mar 24 '22 06:03 mcronce

It just took me a good hour to find a bug while writing files using tokio::fs::File. Apparently, this issue also applies when writing to a file, which results in anything over 16384 bytes being silently ignored (without any panic or error). If the maximum buffer size is intended, I would suggest at least documenting this behavior somewhere, and maybe returning an error or panicking.

tp971 avatar Nov 16 '22 21:11 tp971

AsyncWrite::write is never guaranteed to always write out the entire buffer. You should always check the number of bytes written that is returned from the call. If you want to write the entire buffer, use write_all.

sfackler avatar Nov 16 '22 21:11 sfackler
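The short-write behavior described above can be demonstrated with std alone. The `Capped` type below is a hypothetical stand-in for tokio's capped file writer, used only to show why the return value of `write` must be checked:

```rust
use std::io::{self, Write};

// A writer that, like tokio's file writer, accepts at most `cap` bytes
// per write() call. A stand-in for the 16 KiB (now 2 MiB) limit.
struct Capped {
    cap: usize,
    data: Vec<u8>,
}

impl Write for Capped {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.cap);
        self.data.extend_from_slice(&buf[..n]);
        // A short write is NOT an error; the caller must check `n`
        // and loop, or use write_all, which loops for them.
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

With `cap = 16384`, a single `write` of a 32768-byte buffer returns `Ok(16384)` and silently leaves the rest unwritten, while `write_all` keeps calling `write` until the whole buffer is consumed. `AsyncWriteExt::write_all` plays the same role on the async side.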

#5397 increased the buffer size to 2 MiB

https://github.com/tokio-rs/tokio/blob/0d382faa4e557d0f6e281485cfb14a4934461c41/tokio/src/io/blocking.rs#L29

is that enough to close this issue or some refinement should be done?

matheus-consoli avatar Jul 04 '23 17:07 matheus-consoli

I haven't tested it lately, but I suppose it's fine. @carllerche suggested using Bytes instead of a Vec, I don't know if it's still planned.

lnicola avatar Jul 04 '23 17:07 lnicola