Significant Read Performance Improvements
I recently noticed that asynchronous-codec was performing slower than anticipated in a project I'm working on, particularly for read operations. To get a better sense of things, I compared its performance against tokio-codec, which delivered the throughput I had been expecting for my workload.
After some digging and experimentation, I've ended up with two changes that, when combined, appear to bring asynchronous-codec's read performance effectively on par with tokio-codec.
Here's a quick overview of the changes:
- **Configurable Read Buffer Capacity:** I've introduced a new constructor, `FramedRead::with_capacity`, which allows users to specify the initial size of the internal read buffer. Previously, this was hardcoded to 8 KiB. This is similar to what `tokio-codec` does and allows fine-tuning for different workloads.
- **Zero-Copy Reads:** The internal read mechanism has been optimized to avoid an unnecessary data copy. Data is now read directly from the underlying `AsyncRead` source into `FramedRead`'s internal `BytesMut` buffer. This eliminates an intermediate allocation and copy for each read operation. However, it does require some `unsafe` Rust, and it relies on the de-facto contract that `futures::io::AsyncRead` implementations will only write to the provided buffer and never read from its potentially uninitialized parts.
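To illustrate the idea, here is a minimal, std-only sketch of the uninitialized-buffer technique. This is *not* the actual `FramedRead` internals (the helper name is made up, and it uses a plain `Vec<u8>` and blocking `Read` instead of `BytesMut` and `AsyncRead`); it just shows the same pattern: reserve spare capacity, hand that uninitialized region to the reader, and only afterwards mark the written bytes as initialized.

```rust
use std::io::Read;

/// Read directly into the uninitialized spare capacity of `buf`,
/// avoiding a separate intermediate buffer and copy.
fn read_into_spare_capacity<R: Read>(
    reader: &mut R,
    buf: &mut Vec<u8>,
    additional: usize,
) -> std::io::Result<usize> {
    buf.reserve(additional);
    let len = buf.len();
    // SAFETY: this relies on the same de-facto contract discussed above:
    // the reader only *writes* into the slice and never reads its
    // (uninitialized) contents.
    let spare =
        unsafe { std::slice::from_raw_parts_mut(buf.as_mut_ptr().add(len), additional) };
    let n = reader.read(spare)?;
    // Only the `n` bytes the reader actually wrote are now initialized.
    unsafe { buf.set_len(len + n) };
    Ok(n)
}

fn main() -> std::io::Result<()> {
    let mut src: &[u8] = b"hello zero-copy";
    let mut buf = Vec::new();
    let n = read_into_spare_capacity(&mut src, &mut buf, 64)?;
    println!("{} {}", n, String::from_utf8_lossy(&buf));
    Ok(())
}
```

The safe alternative would be to zero-fill the spare capacity before every read, which is exactly the per-read memset cost the optimization avoids.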
Some benchmarks to illustrate the impact of these changes:
Performance Benchmarks
The benchmark involves reading a 3 GiB file from fast, local NVMe storage. `BytesCodec` is used to frame the data.
Debug Build Benchmarks
| Version | Buffer Size | Average Throughput (MiB/s) | Diff vs. Unmodified (%) | Diff vs. tokio-codec (64KiB) (%) |
|---|---|---|---|---|
| async-codec (unmodified) | 8KiB | 475.76 | 0.0% | -83.8% |
| async-codec (configurable buffer) | 8KiB | 500.92 | 5.3% | -82.9% |
| async-codec (configurable buffer) | 64KiB | 2297.17 | 382.8% | -21.7% |
| async-codec (configurable + zero copy) | 8KiB | 536.70 | 12.8% | -81.7% |
| async-codec (configurable + zero copy) | 64KiB | 2932.17 | 516.3% | 0.0% |
| tokio-codec | 8KiB | 541.91 | 13.9% | -81.5% |
| tokio-codec | 64KiB | 2932.17 | 516.3% | 0.0% |
Release Build Benchmarks
| Version | Buffer Size | Average Throughput (MiB/s) | Diff vs. Unmodified (%) | Diff vs. tokio-codec (64KiB) (%) |
|---|---|---|---|---|
| async-codec (unmodified) | 8KiB | 841.58 | 0.0% | -78.0% |
| async-codec (configurable buffer) | 8KiB | 871.21 | 3.5% | -77.2% |
| async-codec (configurable buffer) | 64KiB | 3266.56 | 288.1% | -14.5% |
| async-codec (configurable + zero copy) | 8KiB | 886.15 | 5.3% | -76.8% |
| async-codec (configurable + zero copy) | 64KiB | 3966.25 | 371.3% | 3.8% |
| tokio-codec | 8KiB | 906.72 | 7.7% | -76.3% |
| tokio-codec | 64KiB | 3819.52 | 353.8% | 0.0% |
For my workload, reads are now around 500% faster than before. Overall, asynchronous-codec's read performance is now on par with tokio-codec.
Thank you for the work @rrauch.
I no longer use asynchronous-codec myself, and thus don't actively maintain it.
Maybe you want to create a fork. Happy to link to the various alternatives and archive this project.
// CC @jxs since you are using asynchronous-codec as well:
https://github.com/libp2p/rust-libp2p/blob/70082df7e6181722630eabc5de5373733aac9a21/Cargo.lock#L310-L321
Hi @mxinden, thanks for the ping! Can we then move this repo to the libp2p org, and can you grant publishing rights?
@jxs done. Made you an owner and transferred to libp2p GitHub organization.
Hi, thanks for looking into this! Left a comment. Can you also share the benchmark code? Cheers!
Sorry, it was just some throwaway code that I didn't keep.
Here is roughly what it did:
```rust
use std::path::Path;
use std::time::SystemTime;

use asynchronous_codec::{BytesCodec, FramedRead};
use futures::TryStreamExt;
use tokio_util::compat::TokioAsyncReadCompatExt;

pub async fn benchmark(path: impl AsRef<Path>) -> anyhow::Result<()> {
    let path = path.as_ref().to_path_buf();
    for i in 1..=10 {
        println!("iteration {}", i);
        let mut file = tokio::fs::File::open(&path).await?;
        let file_size = file.metadata().await?.len();
        let buf_size = 64 * 1024usize;
        file.set_max_buf_size(buf_size);
        let start_time = SystemTime::now();
        let mut reader = FramedRead::with_capacity(file.compat(), BytesCodec, buf_size);
        let mut bytes_read = 0u64;
        while let Some(chunk) = reader.try_next().await? {
            bytes_read += chunk.len() as u64;
        }
        let duration = SystemTime::now().duration_since(start_time)?;
        println!("read {} bytes in {} ms", bytes_read, duration.as_millis());
        if file_size != bytes_read {
            panic!("incorrect number of bytes read");
        }
        println!();
    }
    Ok(())
}
```
I made some additional changes to this PR:
- the buffer allocation logic is now simpler and cleaner
- I added buffer initialization to externally supplied buffers. This was an oversight in the prior version.
- added the `set_capacity` method to allow changing the buffer size at any time
- added the unsafe `disable_buffer_initialization` option I suggested above
You can now easily compare the performance difference between using uninitialized and initialized buffers. The outcome will depend heavily on the workload, hardware, etc.
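To make that trade-off concrete, here is a std-only sketch of the *initialized* path, i.e. what the default (safe) behavior corresponds to. Again, this is not the crate's actual internals; it just shows that zero-filling the spare capacity before each read costs a memset per call, which is precisely what an option like `disable_buffer_initialization` would skip in exchange for `unsafe`.

```rust
use std::io::Read;

/// Safe read into a `Vec`'s spare capacity: zero-initialize first,
/// then read, then trim back to the bytes actually written.
fn read_initialized<R: Read>(
    reader: &mut R,
    buf: &mut Vec<u8>,
    additional: usize,
) -> std::io::Result<usize> {
    let len = buf.len();
    // The memset cost is paid here on every call.
    buf.resize(len + additional, 0);
    let n = reader.read(&mut buf[len..])?;
    buf.truncate(len + n);
    Ok(n)
}

fn main() -> std::io::Result<()> {
    let mut src: &[u8] = b"abc";
    let mut buf = Vec::new();
    let n = read_initialized(&mut src, &mut buf, 16)?;
    println!("{} {:?}", n, buf);
    Ok(())
}
```

Whether the memset shows up in profiles depends on read size and frequency, which is why having both modes available for benchmarking is useful.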