heed icon indicating copy to clipboard operation
heed copied to clipboard

Reduce allocations in `BytesEncode`

Open nolanderc opened this issue 2 months ago • 1 comments

The BytesEncode trait is currently defined as follows:

https://github.com/meilisearch/heed/blob/34063834019b41740b684a5aa3ebef8f872579f4/heed-traits/src/lib.rs#L20-L26

There are some issues with this definition:

  • Unless the encoded type can be trivially transmuted to a slice of bytes (str, [u8], U32<NativeEndian>, ...) we have to allocate a temporary buffer where the value is written.
  • There is no way to get a temporary buffer without going through the global allocator.
  • Allocating a temporary buffer incurs an extra copy.

For example, most architectures in use are Little-Endian (x86, Arm), however, for lexicographic ordering to correctly sort integers, we have to store them in Big-Endian. Since this incurs a byte-swap on Little-Endian architectures we cannot simply return a pointer to the integer itself, instead we have to allocate a temporary buffer where we store the byte-swapped result. That's one heap allocation for each integer, something which could have been done trivially on the stack or in a CPU register.

Optimizations

In most cases we know a-priori the maximum size of the needed allocation:

  • LMDB forces keys to be 511 bytes or less.
  • Primitive types and arrays have a compile-time known size.

In some cases it may be preferable to run serialization in two phases:

  • First we serialize to a "null" writer which counts the number of bytes needed for the value.
  • With this size in hand, we can have LMDB reserve a memory area in the database which fits the serialized value (using MDB_RESERVE).
  • We then serialize our value directly into the reserved space.

A new trait

With the above in mind, we can make a few changes to the BytesEncode trait (naming TBD):

/// Set to `true` if the value can be trivially casted to `&[u8]` (e.g., `str`, `[u8]`, ...).
/// This is used as a hint by the library to avoid using intermediate buffers.
const ZERO_COPY: bool = false;

/// A hint to the database about the final size of the encoded value.
/// Can be used to reduce allocations and memory copies.
/// 
/// Must be exactly equal to the final encoded size,
/// otherwise encoding will result in an error.
fn size_hint(item: &Self::EItem) -> Option<usize> {
    None
}

/// Encode the value into a pre-allocated buffer.
/// The buffer could be allocated on the stack if we are encoding keys (511 bytes),
/// or allocated on the heap (with an initial capacity equal to `size_hint`).
fn encode_writer<W: std::io::Write>(item: &Self::EItem, writer: &mut W) -> Result<()> {
    // the default impl forwards to the old method
    writer.write_all(Self::bytes_encode(item)?)?;
    Ok(())
}

Based on the value of ZERO_COPY we can decide between either calling bytes_encode and directly get our slice, or using a pre-allocated buffer with encode_writer. (If the )

We could also consider making encode_writer the required method, and define bytes_encode in terms of encode_writer with Vec as our Writer. However, this would be a breaking change.

Example Usage

Let's look at how we might implement Database::put with the above definition (pseudocode):

fn put(db, key: impl BytesEncode, value: impl BytesEncode) {
    if Key::ZERO_COPY {
        key_bytes = key.encode_bytes()?; // should yield &[u8] without allocation
    } else {
        let mut buffer = [0u8; 511];
        let mut array_writer = std::io::Cursor::new(&mut buffer);
        key.encode_writer(&mut array_writer);
        key_bytes = &buffer[..array_writer.position()];
    }

    if !Value::ZERO_COPY {
        if let Some(size) = value.size_hint() {
            // allocate space directly in DB for value before writing
            return db.put_reserve(key, size, |space| value.encode_writer(space));
        }
    }

    // Should yield `&[u8]` directly if ZERO_COPY is true.
    // Otherwise, the implementation likely makes better decisions about allocating a `Vec`.
    // Although, we could possibly pre-allocate a small buffer on the stack, something the implementation cannot.
    value_bytes = key.encode_bytes()?;

    mdb_put(db, key_bytes, value_bytes)?;
}

Note that we can avoid all temporary allocations and intermediate memory copies in all cases except one: if the resulting size of value is unknown.

nolanderc avatar Apr 29 '24 21:04 nolanderc