
Significant Read Performance Improvements

Open rrauch opened this issue 6 months ago • 5 comments

I recently noticed asynchronous-codec was performing slower than anticipated in a project I'm working on, read operations in particular. To get a better sense of things, I compared its performance against tokio-codec, which gave me the throughput I had been expecting for my workload.

After some digging and experimentation, I've ended up with two changes that, when combined, appear to bring asynchronous-codec's read performance effectively on par with tokio-codec.

Here's a quick overview of the changes:

  1. Configurable Read Buffer Capacity: I've introduced a new constructor, FramedRead::with_capacity, which allows users to specify the initial size of the internal read buffer. Previously, this was hardcoded to 8KiB. This is similar to what tokio-codec does and allows fine-tuning for different workloads.
  2. Zero-Copy Reads: The internal read mechanism has been optimized to avoid an unnecessary data copy. Data is now read directly from the underlying AsyncRead source into FramedRead's internal BytesMut buffer, which eliminates an intermediate allocation and copy for each read operation. It does, however, require some unsafe Rust, and it relies on the de-facto contract that futures::io::AsyncRead implementations only write to the provided buffer and never read from its potentially uninitialized parts (see the sketch below).
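
To illustrate the second point, here is a minimal sketch of reading directly into the spare capacity of a BytesMut. It is not the code from this PR; the helper name poll_read_into and its signature are made up for the example, and it depends on the same de-facto AsyncRead contract described above.

use bytes::BytesMut;
use futures::io::AsyncRead;
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};

/// Poll `reader` directly into the spare capacity of `buffer`, avoiding the
/// intermediate stack buffer and the extra copy into the BytesMut.
///
/// SAFETY: sound only under the de-facto contract that AsyncRead
/// implementations write to the buffer they are given and never read from it.
fn poll_read_into<R: AsyncRead + Unpin>(
    reader: &mut R,
    buffer: &mut BytesMut,
    capacity: usize,
    cx: &mut Context<'_>,
) -> Poll<io::Result<usize>> {
    // Make sure there is room to read into.
    buffer.reserve(capacity);
    let spare = buffer.spare_capacity_mut();
    // Expose the (possibly uninitialized) spare capacity as &mut [u8].
    let dst = unsafe {
        std::slice::from_raw_parts_mut(spare.as_mut_ptr() as *mut u8, spare.len())
    };
    let n = match Pin::new(reader).poll_read(cx, dst) {
        Poll::Ready(Ok(n)) => n,
        Poll::Ready(Err(e)) => return Poll::Ready(Err(e)),
        Poll::Pending => return Poll::Pending,
    };
    // The reader initialized the first `n` bytes; make them part of the buffer.
    unsafe { buffer.set_len(buffer.len() + n) };
    Poll::Ready(Ok(n))
}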

Some benchmarks to illustrate the impact of these changes:

Performance Benchmarks

The benchmark involves reading a 3 GiB file from fast, local NVMe storage. The BytesCodec is used to frame the data.

Debug Build Benchmarks

| Version | Buffer Size | Average Throughput (MiB/s) | Diff vs. Unmodified (%) | Diff vs. Tokio-codec (64KiB) (%) |
| --- | --- | --- | --- | --- |
| async-codec (Unmodified) | 8KiB | 475.76 | 0.0% | -83.8% |
| async-codec (configurable buffer) | 8KiB | 500.92 | 5.3% | -82.9% |
| async-codec (configurable buffer) | 64KiB | 2297.17 | 382.8% | -21.7% |
| async-codec (configurable + zero copy) | 8KiB | 536.70 | 12.8% | -81.7% |
| async-codec (configurable + zero copy) | 64KiB | 2932.17 | 516.3% | 0.0% |
| tokio-codec | 8KiB | 541.91 | 13.9% | -81.5% |
| tokio-codec | 64KiB | 2932.17 | 516.3% | 0.0% |

Release Build Benchmarks

| Version | Buffer Size | Average Throughput (MiB/s) | Diff vs. Unmodified (%) | Diff vs. Tokio-codec (64KiB) (%) |
| --- | --- | --- | --- | --- |
| async-codec (Unmodified) | 8KiB | 841.58 | 0.0% | -78.0% |
| async-codec (configurable buffer) | 8KiB | 871.21 | 3.5% | -77.2% |
| async-codec (configurable buffer) | 64KiB | 3266.56 | 288.1% | -14.5% |
| async-codec (configurable + zero copy) | 8KiB | 886.15 | 5.3% | -76.8% |
| async-codec (configurable + zero copy) | 64KiB | 3966.25 | 371.3% | 3.8% |
| tokio-codec | 8KiB | 906.72 | 7.7% | -76.3% |
| tokio-codec | 64KiB | 3819.52 | 353.8% | 0.0% |

For my workload, reads are now around 500% faster than before. Overall, asynchronous-codec's read performance is now on par with tokio-codec.

rrauch · Jun 25 '25 14:06

Thank you for the work @rrauch.

I no longer use asynchronous-codec myself, and thus don't actively maintain it.

Maybe you want to create a fork. Happy to link to the various alternatives and archive this project.

//CC @jxs since you are using asynchronous codec as well.

https://github.com/libp2p/rust-libp2p/blob/70082df7e6181722630eabc5de5373733aac9a21/Cargo.lock#L310-L321

mxinden · Jun 30 '25 09:06

Hi @mxinden, thanks for the ping! Can we then move this repo to the libp2p org, and could you grant publishing rights?

jxs · Jul 02 '25 07:07

@jxs done. Made you an owner and transferred the repository to the libp2p GitHub organization.

mxinden · Jul 06 '25 17:07

Hi, thanks for looking into this! Left a comment. Can you also share the benchmark code? Cheers!

Sorry, it was just some throwaway code that I didn't keep.

Here is roughly what it did:

use std::path::Path;
use std::time::SystemTime;

use asynchronous_codec::{BytesCodec, FramedRead};
use futures::TryStreamExt;
use tokio_util::compat::TokioAsyncReadCompatExt;

pub async fn benchmark(path: impl AsRef<Path>) -> anyhow::Result<()> {
    let path = path.as_ref().to_path_buf();
    for i in 1..=10 {
        println!("iteration {}", i);
        let mut file = tokio::fs::File::open(&path).await?;
        let file_size = file.metadata().await?.len();
        let buf_size: usize = 64 * 1024;
        // Match tokio's internal read buffer to the codec's buffer size.
        file.set_max_buf_size(buf_size);
        let start_time = SystemTime::now();
        // Proposed constructor: underlying reader, codec, initial buffer capacity.
        let mut reader = FramedRead::with_capacity(file.compat(), BytesCodec, buf_size);
        let mut bytes_read = 0;
        while let Some(chunk) = reader.try_next().await? {
            bytes_read += chunk.len();
        }
        let duration = SystemTime::now().duration_since(start_time)?;
        println!("read {} bytes in {} ms", bytes_read, duration.as_millis());
        if file_size != bytes_read as u64 {
            panic!("incorrect number of bytes read");
        }
        println!();
    }
    Ok(())
}

rrauch · Jul 11 '25 15:07

I made some additional changes to this PR:

  • The buffer allocation logic is now simpler and cleaner.
  • I added buffer initialization for externally supplied buffers; this was an oversight in the prior version.
  • I added a set_capacity method to allow changing the buffer size at any time.
  • I added the unsafe disable_buffer_initialization option I suggested above.

You can now easily compare the performance difference between using uninitialized and initialized buffers. The outcome will depend heavily on the workload, hardware, etc.
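
As a rough usage sketch of these additions (the exact method names and signatures aren't spelled out in this thread, so treat the calls below as assumptions about the proposed API rather than its final form):

use asynchronous_codec::{BytesCodec, FramedRead};
use futures::io::AsyncRead;

// Hypothetical usage of the options described above; signatures are assumed.
fn configure<R: AsyncRead + Unpin>(reader: R) -> FramedRead<R, BytesCodec> {
    // Proposed constructor: reader, codec, initial read-buffer capacity.
    let mut framed = FramedRead::with_capacity(reader, BytesCodec, 64 * 1024);

    // Proposed method: change the read-buffer capacity at any time.
    framed.set_capacity(256 * 1024);

    // Proposed opt-in: skip zero-initializing the read buffer before each read.
    // SAFETY: relies on the underlying AsyncRead never reading from the
    // uninitialized part of the buffer it is handed.
    unsafe { framed.disable_buffer_initialization() };

    framed
}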

rrauch · Jul 12 '25 10:07