hematite_nbt

add `nbt::from_slice` and zero copy support for strings

Open Freax13 opened this issue 3 years ago • 2 comments

This PR adds `nbt::from_slice` to enable zero copy deserialization for strings. The implementation is inspired by serde_json, which uses a trait to abstract over multiple input sources.

All of the read_* functions in raw.rs got moved into a trait. This trait is implemented for std::io::Read and &[u8].
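A rough sketch of what such a source trait could look like. The names `Source`, `IoSource` and `SliceSource` are made up for illustration and are not taken from the PR; the wrapper structs are an assumption (serde_json takes a similar approach with its `IoRead` and `SliceRead` types), and only two of the `read_*` functions are shown. NBT integers are big-endian.

```rust
use std::io::{self, Read};

/// Hypothetical trait standing in for the moved `read_*` functions.
trait Source {
    fn read_u8(&mut self) -> io::Result<u8>;
    fn read_u16(&mut self) -> io::Result<u16>;
}

/// Hypothetical wrapper around any `io::Read`.
struct IoSource<R>(R);

/// Hypothetical wrapper around a byte slice that outlives the deserializer.
struct SliceSource<'de> {
    rest: &'de [u8],
}

impl<R: Read> Source for IoSource<R> {
    fn read_u8(&mut self) -> io::Result<u8> {
        let mut buf = [0u8; 1];
        self.0.read_exact(&mut buf)?;
        Ok(buf[0])
    }

    fn read_u16(&mut self) -> io::Result<u16> {
        let mut buf = [0u8; 2];
        self.0.read_exact(&mut buf)?;
        Ok(u16::from_be_bytes(buf))
    }
}

impl<'de> SliceSource<'de> {
    /// Split off the next `n` bytes without copying them.
    fn take(&mut self, n: usize) -> io::Result<&'de [u8]> {
        let rest: &'de [u8] = self.rest;
        if rest.len() < n {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }
        let (head, tail) = rest.split_at(n);
        self.rest = tail;
        Ok(head)
    }
}

impl<'de> Source for SliceSource<'de> {
    fn read_u8(&mut self) -> io::Result<u8> {
        Ok(self.take(1)?[0])
    }

    fn read_u16(&mut self) -> io::Result<u16> {
        let bytes = self.take(2)?;
        Ok(u16::from_be_bytes([bytes[0], bytes[1]]))
    }
}
```

The deserializer can then be generic over `Source`, and only the slice-backed implementation hands out data that borrows from the input.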

The read_bare_string function was modified to also take a scratch buffer and return a borrowed, copied or owned string. In a lot of cases this enables reading a string without allocating. This is useful beyond zero copy as it's also used for identifiers in structs. This optimization is not used for Value and Blob as they always allocate anyways.
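To illustrate the borrowed vs. copied distinction, here is a minimal sketch. `StrRef`, `read_str_from_reader` and `read_str_from_slice` are hypothetical names, not the PR's actual API. For brevity the sketch validates plain UTF-8 only and leaves out the owned case, which is needed when modified UTF-8 has to be re-encoded.

```rust
use std::io::{self, Read};

/// Hypothetical result of reading a string.
enum StrRef<'de, 's> {
    /// Points directly into the input slice (zero copy).
    Borrowed(&'de str),
    /// Points into the reusable scratch buffer (copied, but no new allocation
    /// once the buffer has grown large enough).
    Copied(&'s str),
}

fn invalid_data(e: std::str::Utf8Error) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, e)
}

/// Reader-backed source: the bytes must be copied somewhere, so they go
/// into the caller-provided scratch buffer.
fn read_str_from_reader<'s, R: Read>(
    reader: &mut R,
    len: usize,
    scratch: &'s mut Vec<u8>,
) -> io::Result<StrRef<'static, 's>> {
    scratch.clear();
    scratch.resize(len, 0);
    reader.read_exact(&mut scratch[..])?;
    let s = std::str::from_utf8(&scratch[..]).map_err(invalid_data)?;
    Ok(StrRef::Copied(s))
}

/// Slice-backed source: the string can borrow from the input itself.
fn read_str_from_slice<'de, 's>(
    input: &mut &'de [u8],
    len: usize,
) -> io::Result<StrRef<'de, 's>> {
    let whole: &'de [u8] = *input;
    if whole.len() < len {
        return Err(io::ErrorKind::UnexpectedEof.into());
    }
    let (head, tail) = whole.split_at(len);
    *input = tail;
    let s = std::str::from_utf8(head).map_err(invalid_data)?;
    Ok(StrRef::Borrowed(s))
}
```

A struct field identifier only needs to be compared against the known field names, so the `Copied` case is enough to avoid an allocation there even when reading from an `io::Read`.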

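Even though zero copy support is added, `Cow<'de, str>` has to be used because modified UTF-8 does not always allow zero copy. For example, modified UTF-8 encodes NUL as the two bytes 0xC0 0x80, which is not valid UTF-8, so such a string has to be re-encoded into an owned `String`.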
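A simplified sketch of why the result has to be a `Cow`. The function `decode_modified_utf8` is hypothetical and deliberately incomplete: it only handles the two-byte NUL encoding and gives up on surrogate pairs, which a real decoder would also have to re-encode.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified decoder for modified UTF-8.
/// Returns `None` for input this sketch does not handle.
fn decode_modified_utf8(bytes: &[u8]) -> Option<Cow<'_, str>> {
    // Fast path: the bytes are already valid UTF-8, so borrow them as-is.
    if let Ok(s) = std::str::from_utf8(bytes) {
        return Some(Cow::Borrowed(s));
    }

    // Slow path: rewrite the modified UTF-8 NUL encoding (0xC0 0x80) into a
    // real NUL byte. This needs a new buffer, so the result is owned.
    let mut fixed = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == 0xC0 && bytes.get(i + 1) == Some(&0x80) {
            fixed.push(0);
            i += 2;
        } else {
            fixed.push(bytes[i]);
            i += 1;
        }
    }

    // Surrogate pairs are not handled here, so this can still fail.
    String::from_utf8(fixed).ok().map(Cow::Owned)
}
```

Most NBT strings, such as block names, are plain ASCII and take the borrowed path.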

Example:


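use std::borrow::Cow;
use std::collections::BTreeMap;

use serde::Deserialize;

// A chunk section; `palette` borrows its strings from the input buffer.
#[derive(Deserialize)]
#[serde(rename_all = "PascalCase")]
pub struct Section<'a> {
    #[serde(serialize_with = "nbt::i64_array", default)]
    pub block_states: Option<Vec<i64>>,
    #[serde(borrow)]
    pub palette: Option<Vec<Block<'a>>>,
    pub y: i8,
}

// A palette entry; each string is borrowed when possible and owned otherwise.
#[derive(Deserialize, Hash)]
#[serde(rename_all = "PascalCase")]
pub struct Block<'a> {
    #[serde(borrow)]
    pub name: Cow<'a, str>,
    #[serde(default, borrow)]
    pub properties: Option<BTreeMap<Cow<'a, str>, Cow<'a, str>>>,
}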

I had some trouble getting reliable results from the benchmarks, but there were usually only small regressions of up to 5% and improvements of up to 10-20%. (The benchmarks were not modified; I suspect all of the speedup came from avoiding allocations for identifiers in structs.) I got varying but mostly positive results in my personal project for parsing anvil chunks with zero copy.

Freax13, May 16 '21 15:05

I'm not sure I'm 100% following the changes, but it looks like internally you use a Cursor to wrap the byte slice. Since Cursor implements io::Read, wouldn't it be possible to implement from_slice() by wrapping the byte slice in a Cursor at the top level and passing that to from_reader()? That could greatly simplify the changes needed. Or am I missing something?

atheriel, May 19 '21 02:05

They are very similar but not quite the same: read_bare_string is different because it needs access to the slice to borrow from it.

Freax13, May 19 '21 06:05