Eliminate zero-initialization overhead with internal MaybeUninit support

Open biryukovmaxim opened this issue 3 weeks ago • 3 comments

Summary

This PR adds internal support for MaybeUninit<u8> output buffers throughout the BLAKE3 implementation, avoiding unnecessary memory initialization and improving performance for output operations. All public APIs remain completely unchanged.

Motivation

When filling output buffers with hash data, Rust currently requires the output slice &mut [u8] to be fully initialized before writing. This means callers must zero-initialize buffers before passing them to BLAKE3, even though BLAKE3 will immediately overwrite those values.

This was previously proposed in #154 but was never merged. This PR provides a comprehensive internal implementation while maintaining complete API compatibility.

Changes

Public API

No changes to the existing public API: all existing methods work exactly as before, with identical signatures.

Changes

  • Added: Public OutputReader::fill_uninit() method accepting &mut [MaybeUninit<u8>]
  • Added: Internal Platform::xof_many_uninit() for uninitialized buffers
  • Added: Internal ffi_avx512::xof_many_uninit() for AVX-512 path
  • Modified: fill_one_block() now works with MaybeUninit<u8> internally
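
The shape described in this list (a core routine that takes &mut [MaybeUninit<u8>], with the existing &mut [u8] entry point delegating to it) can be sketched roughly as follows. This is only an illustration under stated assumptions: ToyReader and its counter-derived byte stream are hypothetical stand-ins so the example needs nothing beyond std, and none of this is BLAKE3's actual code.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for an XOF reader; NOT BLAKE3's OutputReader.
/// It emits a simple counter-derived byte stream so the sketch needs only std.
pub struct ToyReader {
    position: u64,
}

impl ToyReader {
    pub fn new() -> Self {
        ToyReader { position: 0 }
    }

    /// Core routine: only ever writes to `out`, never reads from it.
    /// Returns the now-initialized bytes as an ordinary slice.
    pub fn fill_uninit<'a>(&mut self, out: &'a mut [MaybeUninit<u8>]) -> &'a mut [u8] {
        for slot in out.iter_mut() {
            slot.write((self.position as u8).wrapping_mul(31).wrapping_add(7));
            self.position += 1;
        }
        // SAFETY: every element of `out` was written in the loop above, and
        // MaybeUninit<u8> has the same layout as u8.
        unsafe { &mut *(out as *mut [MaybeUninit<u8>] as *mut [u8]) }
    }

    /// Existing-style entry point; its signature does not change.
    pub fn fill(&mut self, out: &mut [u8]) {
        // SAFETY: u8 and MaybeUninit<u8> share a layout, and fill_uninit only
        // stores initialized bytes, so `out` stays fully initialized.
        let uninit = unsafe { &mut *(out as *mut [u8] as *mut [MaybeUninit<u8>]) };
        self.fill_uninit(uninit);
    }
}
```

With this layout the unsafe code lives in one place, and callers of fill() see no difference.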

biryukovmaxim, Nov 12 '25 20:11

Do you have a specific use case in mind for this optimization? In my imagination, the most performance-sensitive callers of the XOF are repeatedly filling a buffer full of random bytes, and in that case the cost of zeroing it is only paid once.
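
To make the "only paid once" point concrete, here is a minimal sketch of the buffer-reuse pattern. ToyStream is a hypothetical write-only generator standing in for an XOF reader; it is not BLAKE3's API.

```rust
/// Hypothetical write-only generator standing in for an XOF reader.
struct ToyStream {
    counter: u8,
}

impl ToyStream {
    /// Overwrites every byte of `out`; never reads the old contents.
    fn fill(&mut self, out: &mut [u8]) {
        for byte in out.iter_mut() {
            *byte = self.counter;
            self.counter = self.counter.wrapping_add(1);
        }
    }
}

/// Zeroes a 64-byte buffer once, refills it `rounds` times, and sums
/// every byte produced so the work is observable.
fn checksum_of_rounds(rounds: usize) -> u64 {
    let mut stream = ToyStream { counter: 0 };
    let mut buf = vec![0u8; 64]; // the only zero-initialization
    let mut sum = 0u64;
    for _ in 0..rounds {
        stream.fill(&mut buf); // overwrites in place, no re-zeroing
        sum += buf.iter().map(|&b| u64::from(b)).sum::<u64>();
    }
    sum
}
```

The single vec![0u8; 64] is amortized over every round, so the zeroing cost shrinks relative to the total work as the number of refills grows.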

Also this is a spicy question, but couldn't a performance sensitive caller mem::transmute (or otherwise unsafely synthesize) a &mut [u8] over their uninitialized bytes, and then call the existing API knowing it will only write and never read? Last I heard this was an ongoing debate in the Rust memory model, but for example this doesn't currently fail Miri:

use std::mem::MaybeUninit;

fn main() {
    let mut buf: MaybeUninit<[u8; 1024]> = MaybeUninit::uninit();
    let array: &mut [u8] = unsafe { buf.assume_init_mut() };
    for byte in array.iter_mut() {
        *byte = 99;
    }
    for byte in array {
        assert_eq!(*byte, 99);
    }
}

(In other words, could we provide functionally the same capability by adding a line to the docs that says "we promise this function won't read the buffer; unsafe code may rely on that".)

oconnor663, Nov 12 '25 22:11

Do you have a specific use case in mind for this optimization? In my imagination, the most performance-sensitive callers of the XOF are repeatedly filling a buffer full of random bytes, and in that case the cost of zeroing it is only paid once.

Yes, you are right. Benchmarks on my machine confirm there is no perf gain.

However, there is no sound way to fill an uninit buffer with the current API.

.assume_init_mut() cannot be used to initialize a MaybeUninit. Calling it when the content is not yet fully initialized causes immediate undefined behavior: "Creating a reference to uninitialized data is immediate undefined behavior, even if the reference is never read."
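
For contrast, a caller-side pattern that never forms a &mut [u8] over uninitialized bytes can be sketched with stable std alone. write_pattern here is a hypothetical write-only producer standing in for a hash output routine that accepts &mut [MaybeUninit<u8>]; it is an assumption for illustration, not an existing BLAKE3 function.

```rust
use std::mem::MaybeUninit;

/// Hypothetical write-only producer standing in for a hash output routine.
/// It initializes every slot of `out` and never reads from it.
fn write_pattern(out: &mut [MaybeUninit<u8>], seed: u8) {
    for (i, slot) in out.iter_mut().enumerate() {
        slot.write(seed.wrapping_add(i as u8));
    }
}

/// Builds a Vec<u8> of `len` bytes without zeroing it first.
fn filled_vec(len: usize, seed: u8) -> Vec<u8> {
    let mut output: Vec<u8> = Vec::with_capacity(len);
    // spare_capacity_mut() exposes the allocation as MaybeUninit<u8>, so no
    // reference to an uninitialized u8 is ever created.
    let spare = &mut output.spare_capacity_mut()[..len];
    write_pattern(spare, seed);
    // SAFETY: the first `len` bytes of the spare capacity were all written by
    // write_pattern, and `len` does not exceed the capacity requested above.
    unsafe { output.set_len(len) };
    output
}
```

The caller still needs one unsafe set_len, but its safety argument rests only on what was written, not on the unsettled question of references to uninitialized memory.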

But since this optimization doesn't make sense, I'm okay with closing the PR.

Another note: when the read_buf and core_io_borrowed_buf features stabilize, std::io::Read will support this natively via code like:

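// Nightly-only today: needs #![feature(read_buf, core_io_borrowed_buf)].
use std::io::{BorrowedBuf, Read};
// Reserve capacity without initializing it.
let mut output = Vec::with_capacity(OUTPUT_SIZE);
let mut buf = BorrowedBuf::from(output.spare_capacity_mut());
let mut hasher = blake3::Hasher::new();
hasher.update(INPUT_DATA);
// Write XOF output straight into the spare capacity.
hasher.finalize_xof().read_buf(buf.unfilled()).unwrap();
// SAFETY: assumes read_buf filled all OUTPUT_SIZE bytes.
unsafe { output.set_len(OUTPUT_SIZE) };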

and it won't require any changes to blake3.

biryukovmaxim, Nov 13 '25 10:11

"Creating a reference to uninitialized data is immediate undefined behavior, even if the reference is never read."

I'm curious where that quote comes from, and I can't find a source. Here's the closest thing I can find to an authoritative opinion on this, from 2023:

The status of reference to uninit memory is undecided. We document them as UB in the reference so that we can make this decision without code already relying on an outcome. Miri does not flag this UB because we are not sure if we really want to rule out all that code. The compiler does not actually make them UB and the standard library can rely on that, but user code cannot...My own position is that this should not be UB

Emphasis mine. So I shouldn't be telling anyone to do this (and the MaybeUninit docs are specifically saying don't do this), because the standard could move against it. But if Ralf Jung says he wants it to be legal... :)

oconnor663, Nov 14 '25 02:11