hematite_nbt
hematite_nbt copied to clipboard
add `nbt::from_slice` and zero copy support for strings
This pr adds nbt::from_slice
to enable zero copy deserialization for strings. The implementation is inspired by serde_json
which uses a trait to abstract over multiple sources.
All of the read_*
functions in raw.rs
got moved into a trait. This trait is implemented for std::io::Read
and &[u8]
.
The read_bare_string
function was modified to also take a scratch buffer and return a borrowed, copied or owned string. In a lot of cases this enables reading a string without allocating. This is useful beyond zero copy as it's also used for identifiers in structs. This optimization is not used for Value
and Blob
as they always allocate anyways.
Even though zero support is added Cow<'de, str>
has to be used because modified utf-8 does always allow zero copy.
Example:
#[derive(Deserialize)]
#[serde(rename_all = "PascalCase")]
pub struct Section<'a> {
#[serde(serialize_with = "nbt::i64_array", default)]
pub block_states: Option<Vec<i64>>,
#[serde(borrow)]
pub palette: Option<Vec<Block<'a>>>,
pub y: i8,
}
#[derive(Deserialize, Hash)]
#[serde(rename_all = "PascalCase")]
pub struct Block<'a> {
#[serde(borrow)]
pub name: Cow<'a, str>,
#[serde(default, borrow)]
pub properties: Option<BTreeMap<Cow<'a, str>, Cow<'a, str>>>,
}
I had some trouble getting reliable results from the benchmarks, but there usually were only small regressions of up to 5% and improvements of up to 10-20%. (The benchmarks were not modified, I suspect all the speed up was from prevent allocations for identifiers in structs). I got varying but mostly positive results in my personal project for parsing anvil chunks with zero copy.
I'm not sure I'm 100% following the changes, but it looks like internally you use a Cursor
to wrap the byte slice. Since Cursor
implements io::Read
, wouldn't it be possible to implement from_slice()
by wrapping the byte slice in a Cursor at the top level and passing that to from_reader()
? That could greatly simplify the changes needed. Or am I missing something?
They are very similar but not quite the same: read_bare_string
is different because it needs access to the slice to borrow from it.