simdutf8

Add streaming API which works with the basic and compat APIs

Open hkratz opened this issue 4 years ago • 2 comments

Currently, the basic API can only validate complete slices. A streaming API with init(), update(), and finish_validation() functions would allow validation to be done on the fly.

With the compat API, this can currently be emulated, awkwardly, by using the Utf8Error::valid_up_to() method to remember how far the given slice is valid.
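A minimal sketch of that emulation, for illustration only. To keep it dependency-free it calls std::str::from_utf8; the assumption is that simdutf8::compat::from_utf8 can be substituted, since its error type offers the same valid_up_to() and error_len() methods. `ChunkedValidator` and `InvalidUtf8` are hypothetical names, not part of simdutf8.

```rust
use std::str;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct InvalidUtf8;

/// Hypothetical helper: validates a byte stream chunk by chunk.
struct ChunkedValidator {
    /// Bytes of an incomplete trailing sequence from earlier chunks (at most 3).
    carry: Vec<u8>,
}

impl ChunkedValidator {
    fn new() -> Self {
        Self { carry: Vec::new() }
    }

    /// Validates the carried-over bytes followed by `chunk`.
    fn update(&mut self, chunk: &[u8]) -> Result<(), InvalidUtf8> {
        // Simple but copies the chunk; a real implementation would avoid this.
        let mut buf = std::mem::take(&mut self.carry);
        buf.extend_from_slice(chunk);
        match str::from_utf8(&buf) {
            Ok(_) => Ok(()),
            Err(e) => match e.error_len() {
                // A definite invalid sequence: fail early.
                Some(_) => Err(InvalidUtf8),
                // Input ended inside a multi-byte sequence: keep the tail
                // after the valid prefix and wait for the next chunk.
                None => {
                    self.carry = buf[e.valid_up_to()..].to_vec();
                    Ok(())
                }
            },
        }
    }

    /// The stream is valid only if no incomplete sequence is left over.
    fn finish(self) -> Result<(), InvalidUtf8> {
        if self.carry.is_empty() {
            Ok(())
        } else {
            Err(InvalidUtf8)
        }
    }
}
```

The awkward part is visible in update(): the caller has to buffer the unfinished tail itself and re-validate it with the next chunk, which a real streaming API would handle internally.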

hkratz avatar Apr 26 '21 12:04 hkratz

This is partially implemented in v0.1.3 as a low-level API in simdutf8::basic::imp.

Still missing:

  • compat API with early validation failure and exact error information.
  • Safe API with implementation auto-selection.

hkratz avatar May 14 '21 19:05 hkratz

Question: would it be possible to implement a transparent wrapper around BufReader with such an API? I'm thinking about using it for quick-xml, which I think it would be well suited for. Essentially:

  • Raw bytes get progressively validated as UTF-8 as they stream out of the BufReader
  • The XML parsing operates on raw bytes, searching for the standard characters such as < and >
  • Knowing that the input is valid UTF-8, we can safely use std::str::from_utf8_unchecked() as needed so long as we use the known character boundaries from parsing
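A rough sketch of what such a wrapper might look like, under stated assumptions: it wraps any std::io::Read (a BufReader included), uses std::str::from_utf8 as a stand-in for a streaming simdutf8 validator, and the name `Utf8CheckedReader` is hypothetical. It is not an existing simdutf8 or quick-xml type.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical wrapper: passes bytes through unchanged and returns an
/// InvalidData error as soon as the stream is known not to be valid UTF-8.
struct Utf8CheckedReader<R> {
    inner: R,
    /// Incomplete multi-byte sequence seen at the end of the previous read.
    pending: Vec<u8>,
}

impl<R: Read> Utf8CheckedReader<R> {
    fn new(inner: R) -> Self {
        Self { inner, pending: Vec::new() }
    }
}

fn invalid() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, "stream is not valid UTF-8")
}

impl<R: Read> Read for Utf8CheckedReader<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        let n = self.inner.read(out)?;
        if n == 0 {
            // End of input: a leftover partial sequence means truncation.
            return if self.pending.is_empty() { Ok(0) } else { Err(invalid()) };
        }
        // Re-validate the leftover tail together with the new bytes.
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(&out[..n]);
        if let Err(e) = str::from_utf8(&buf) {
            if e.error_len().is_some() {
                return Err(invalid());
            }
            // Input stopped mid-sequence; remember the tail for the next read.
            self.pending = buf[e.valid_up_to()..].to_vec();
        }
        Ok(n)
    }
}
```

One caveat with this design: the bytes returned by a single read() may end in the middle of a character, and a truncated final sequence is only reported at end of input, so from_utf8_unchecked() would be sound only on ranges that end at boundaries the parser has already seen complete.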

dralley avatar Jul 13 '22 01:07 dralley