Performance regression for `HistoryBuf(fer)::write()` with heapless 0.9.1
Description
I've been using heapless's HistoryBuffer (now renamed to HistoryBuf) as a sliding search window with much success. More precisely, I have an iterator over a substantial number of bytes and need to find a particular sequence, so I'm doing something akin to the following:
```rust
let mut search_window: HistoryBuf<u8, 11> = HistoryBuf::new();
for byte in lots_of_bytes {
    search_window.write(byte);
    if search_window.oldest_ordered().eq(b"some-needle") {
        // Needle found; do something useful
        return;
    }
}
```
While I'm not using heapless in its intended embedded use case, I found its HistoryBuffer to be much faster than using a VecDeque with a manual pop_front()/push_back() or similar alternatives.
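For reference, the VecDeque-based alternative looks roughly like this (a sketch from memory; the `find_needle` helper name and exact window handling are illustrative, not production code):

```rust
use std::collections::VecDeque;

// Sliding-window search with a manually maintained VecDeque: keep the
// window at `needle.len()` bytes via pop_front()/push_back() and compare
// the window against the needle on every step.
fn find_needle(bytes: impl IntoIterator<Item = u8>, needle: &[u8]) -> bool {
    let mut window: VecDeque<u8> = VecDeque::with_capacity(needle.len());
    for byte in bytes {
        if window.len() == needle.len() {
            window.pop_front();
        }
        window.push_back(byte);
        if window.iter().eq(needle.iter()) {
            return true;
        }
    }
    false
}
```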
However, with heapless 0.9.1, there seems to be a significant regression in the performance of the above snippet, which I'd largely attribute to the write() call. Of course, the oldest_ordered().eq() in every iteration has a proportionately larger impact on runtime than the write() on its own, but just upgrading heapless to 0.9.1 with this snippet still results in about 400 MB/s less throughput in a real-world scenario (~1.4 → ~1.0 GB/s).
Different optimization settings like LTO or codegen units do not seem to affect the result much. You can find a Divan-based benchmark as an MRE below. This may very well be a case of me holding it wrong, but I figured it would be worth reporting nonetheless.
Environment
- CPU: Ryzen 7 PRO 5850U
- Kernel version: 6.16.1
- `rustc -vV`:
  ```
  rustc 1.89.0 (29483883e 2025-08-04)
  binary: rustc
  commit-hash: 29483883eed69d5fb4db01964cdf2af4d86e9cb2
  commit-date: 2025-08-04
  host: x86_64-unknown-linux-gnu
  release: 1.89.0
  LLVM version: 20.1.7
  ```
MRE Benchmark
```rust
// benches/historybuf.rs
use std::hint::black_box;

use divan::counter::BytesCount;
use heapless::HistoryBuf; // Replace with `HistoryBuffer` for heapless 0.8.0

const NEEDLE: &[u8] = b"needle451";
const WINDOW_SIZE: usize = NEEDLE.len();

fn historybuf_search_window_write(data: Vec<u8>) {
    let mut search_window: HistoryBuf<u8, WINDOW_SIZE> = HistoryBuf::new();
    for byte in data {
        search_window.write(byte);
    }
}

#[divan::bench]
fn bench_historybuf_search_window_write(bencher: divan::Bencher) {
    let total_bytes: usize = 8 * 1024 * 1024;
    let bytes_before_needle = total_bytes - NEEDLE.len();
    bencher
        .counter(BytesCount::new(total_bytes))
        .with_inputs(|| {
            let mut data = Vec::with_capacity(total_bytes);
            data.extend(vec![b'\0'; bytes_before_needle]);
            data.extend(NEEDLE);
            data
        })
        .bench_values(|data| {
            historybuf_search_window_write(black_box(data));
        });
}

fn main() {
    divan::main();
}
```
```toml
# Cargo.toml
[package]
name = "heapless-perf-mre"
edition = "2024"

[dependencies]
heapless = "0.9.1"

[dev-dependencies]
divan = "0.1.21"

[profile.release]
lto = "fat"
codegen-units = 1

[[bench]]
name = "historybuf"
harness = false
```
Results
cargo bench --benches
heapless 0.8.0 (HistoryBuffer)
```
     Running benches/historybuf.rs (target/release/deps/historybuf-2a85391490864b67)
Timer precision: 20 ns
historybuf                                 fastest     │ slowest    │ median     │ mean       │ samples │ iters
╰─ bench_historybuf_search_window_write    24.61 µs    │ 275.3 µs   │ 119.5 µs   │ 124.4 µs   │ 100     │ 100
                                           340.7 GB/s  │ 30.46 GB/s │ 70.15 GB/s │ 67.37 GB/s │         │
```
heapless 0.9.1 (HistoryBuf)
```
     Running benches/historybuf.rs (target/release/deps/historybuf-974f73ca8d39caa1)
Timer precision: 20 ns
historybuf                                 fastest     │ slowest    │ median     │ mean       │ samples │ iters
╰─ bench_historybuf_search_window_write    6.823 ms    │ 7.078 ms   │ 6.87 ms    │ 6.892 ms   │ 100     │ 100
                                           1.229 GB/s  │ 1.185 GB/s │ 1.22 GB/s  │ 1.217 GB/s │         │
```
> However, with heapless 0.9.1, there seems to be a significant regression
Wow, the regression is indeed quite severe. 😢 If we had benchmarks in the repo and ran them as part of CI (like I do in the zbus project), it would help identify such regressions early on. Do you think you could contribute that? (I'm asking because it seems you're already halfway there, since you have some benchmarks.)
As for the actual issue, I've never used this part of the API, but I guess someone will need to do a git bisect with the benchmarking code you provided to find out which change introduced this, and hopefully the person responsible will be interested in fixing the regression too. :)
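For whoever picks this up: one way to drive such a bisect (a suggestion on my part, not something already in this thread; the relative path is just an example) is to point the MRE's Cargo.toml at a local heapless checkout, so each bisect step benchmarks exactly the revision that is checked out:

```toml
# Replace the crates.io dependency in the MRE's Cargo.toml with a path
# dependency on the local checkout being bisected:
[dependencies]
heapless = { path = "../heapless" }
```

Then start the bisect in the checkout, marking the 0.9.1 tag bad and the 0.8.0 tag good, and re-run `cargo bench` at each step to decide good/bad.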
> do a git bisect
Looks like this bisects to 314fd85. @sosthene-nitrokey Mind having a look? 🙂
I'd love to contribute (initial) benchmarks, but I'll have to see if I can make time for that. Quite busy at the moment. 😬
It looks like it's about optimising out the panicking branches in the slice accesses. Adding the following at the beginning of the write function brings performance back to what it was before: with 0.9.1 the bounds checks apparently are not elided, while in 0.8.0 they can be.
```rust
// Requires `use core::hint::unreachable_unchecked;`
if self.write_at >= self.capacity() {
    unsafe { unreachable_unchecked() };
}
```
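As a standalone illustration of the pattern (not heapless's actual implementation; the `Ring` type and its fields are made up to mirror `write_at`), the hint tells the optimizer that the index is always in bounds, so the panicking branch of the slice indexing can be removed:

```rust
use core::hint::unreachable_unchecked;

// Minimal ring buffer sketch. The invariant `write_at < N` is upheld by
// the wrap-around at the end of `write`, which is what makes the
// `unreachable_unchecked` hint sound.
struct Ring<const N: usize> {
    data: [u8; N],
    write_at: usize,
}

impl<const N: usize> Ring<N> {
    fn write(&mut self, byte: u8) {
        // SAFETY: `write_at` starts at 0 and is always reduced modulo N
        // below, so it can never reach N.
        if self.write_at >= N {
            unsafe { unreachable_unchecked() };
        }
        self.data[self.write_at] = byte; // bounds check can now be elided
        self.write_at = (self.write_at + 1) % N;
    }
}
```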