simdutf8
Add streaming API which works with the basic and compat APIs
Currently, only full slices can be validated using the basic API. With a streaming API consisting of init(), update(), and finish_validation() functions, validation could be done on the fly.
With the compat API this can currently be awkwardly emulated by remembering how far the given slice is valid using the Utf8Error::valid_up_to() method.
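The emulation described above might look roughly like the following sketch. It uses std::str::from_utf8 as a stand-in, on the assumption that the compat API exposes the same valid_up_to() and error_len() methods on its error type. ChunkedValidator, InvalidUtf8, and the method names are hypothetical and are not part of simdutf8.

```rust
use std::str;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct InvalidUtf8;

/// Hypothetical helper that validates a byte stream chunk by chunk.
#[derive(Default)]
pub struct ChunkedValidator {
    /// Incomplete trailing sequence carried over from the previous chunk.
    pending: Vec<u8>,
    /// Set once any chunk has been found invalid.
    failed: bool,
}

impl ChunkedValidator {
    pub fn new() -> Self {
        Self::default()
    }

    /// Validates the next chunk. For simplicity this copies the chunk behind
    /// the carried-over bytes; a real implementation would avoid the copy.
    pub fn update(&mut self, chunk: &[u8]) -> Result<(), InvalidUtf8> {
        if self.failed {
            return Err(InvalidUtf8);
        }
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(chunk);
        match str::from_utf8(&buf).map(|_| ()) {
            Ok(()) => Ok(()),
            // error_len() == None means the input ended inside a multi-byte
            // sequence: keep the tail and wait for the next chunk.
            Err(e) if e.error_len().is_none() => {
                buf.drain(..e.valid_up_to());
                self.pending = buf;
                Ok(())
            }
            Err(_) => {
                self.failed = true;
                Err(InvalidUtf8)
            }
        }
    }

    /// Fails if an error was seen or the stream ended mid-sequence.
    pub fn finish_validation(self) -> Result<(), InvalidUtf8> {
        if self.failed || !self.pending.is_empty() {
            Err(InvalidUtf8)
        } else {
            Ok(())
        }
    }
}
```

A caller would feed each chunk to update() and call finish_validation() once at the end. The extra bookkeeping for sequences split across chunks is what makes this approach awkward compared to a native streaming API.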
This is partially implemented in v0.1.3 as a low-level API in simdutf8::basic::imp.
Still missing:
- compat API with early validation failure and exact error information.
- Safe API with implementation auto-selection.
Question: would it be possible to implement a transparent wrapper around BufReader with such an API? I'm thinking about using it for quick-xml, which I think it would be well suited for. Essentially:
- Raw bytes get progressively validated as UTF-8 as they stream out of the BufReader
- The XML parsing operates on raw bytes, searching for the standard characters such as < and >
- Knowing that the input is valid UTF-8, we can safely use std::str::from_utf8_unchecked() as needed so long as we use the known character boundaries from parsing
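One way to picture such a wrapper is a Read adapter that checks bytes as they pass through. The sketch below is only an illustration under stated assumptions: Utf8CheckedReader is a hypothetical name, and std::str::from_utf8 stands in for a streaming simdutf8 validator. It could sit either under or over a BufReader.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical adapter: passes bytes through unchanged and fails the read
/// with InvalidData as soon as invalid UTF-8 is seen.
pub struct Utf8CheckedReader<R> {
    inner: R,
    /// Start of a multi-byte sequence split across two reads.
    carry: [u8; 4],
    carry_len: usize,
}

fn invalid_utf8() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, "stream is not valid UTF-8")
}

impl<R: Read> Utf8CheckedReader<R> {
    pub fn new(inner: R) -> Self {
        Self { inner, carry: [0; 4], carry_len: 0 }
    }

    fn check(&mut self, mut rest: &[u8]) -> io::Result<()> {
        // First complete a sequence left over from the previous read.
        while self.carry_len > 0 && !rest.is_empty() {
            self.carry[self.carry_len] = rest[0];
            self.carry_len += 1;
            rest = &rest[1..];
            match str::from_utf8(&self.carry[..self.carry_len]) {
                Ok(_) => self.carry_len = 0,
                // Still incomplete: keep pulling bytes (at most 4 in total).
                Err(e) if e.error_len().is_none() && self.carry_len < 4 => {}
                Err(_) => return Err(invalid_utf8()),
            }
        }
        match str::from_utf8(rest).map(|_| ()) {
            Ok(()) => Ok(()),
            // Input ends inside a sequence: remember the tail (1 to 3 bytes).
            Err(e) if e.error_len().is_none() => {
                let tail = &rest[e.valid_up_to()..];
                self.carry[..tail.len()].copy_from_slice(tail);
                self.carry_len = tail.len();
                Ok(())
            }
            Err(_) => Err(invalid_utf8()),
        }
    }
}

impl<R: Read> Read for Utf8CheckedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if n == 0 {
            // End of stream: a dangling partial sequence is an error.
            if !buf.is_empty() && self.carry_len > 0 {
                return Err(invalid_utf8());
            }
            return Ok(0);
        }
        self.check(&buf[..n])?;
        Ok(n)
    }
}
```

One caveat with this design: the last few bytes returned by a read may belong to a sequence that is only confirmed on the next read, so from_utf8_unchecked() would be sound only for spans that end at a boundary the parser has already passed, such as an ASCII < or >.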