simdutf8
SIMD-accelerated UTF-8 validation for Rust.
[stdsimd](https://github.com/rust-lang/stdsimd) still seems to be in its early stages, but we could check whether all required primitives are supported. Mainly, the algorithm requires a dynamic shuffle, which is not available...
See [`core::arch::wasm32`](https://doc.rust-lang.org/stable/core/arch/wasm32/index.html).
As this is already implemented as part of the UTF-8 code it could be easily exposed as a function.
The pointer is actually unaligned, so this needs to be an unaligned read as otherwise UB gets invoked. A debug build nowadays panics here to detect the UB.
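To illustrate the difference, here is a minimal sketch of an unaligned load using only the standard library. The helper name `load_u64_unaligned` is hypothetical and is not taken from the simdutf8 code base; it only shows the general pattern of replacing a plain pointer dereference with `ptr::read_unaligned`.

```rust
use std::ptr;

/// Hypothetical helper: reads a native-endian `u64` from `bytes` starting at
/// `offset`, without assuming that the address is aligned for `u64`.
/// Returns `None` if fewer than 8 bytes are available at `offset`.
fn load_u64_unaligned(bytes: &[u8], offset: usize) -> Option<u64> {
    let end = offset.checked_add(8)?;
    if end > bytes.len() {
        return None;
    }
    // SAFETY: `offset..offset + 8` is in bounds (checked above), and
    // `read_unaligned` places no alignment requirement on the pointer.
    // Dereferencing the cast pointer directly, or using `ptr::read`, would be
    // UB whenever the address is not suitably aligned for `u64`.
    let value = unsafe { ptr::read_unaligned(bytes.as_ptr().add(offset) as *const u64) };
    Some(value)
}
```

Calling this with an odd `offset` into a byte slice is well defined, whereas the equivalent `*(ptr as *const u64)` would be the kind of misaligned dereference described above.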
Currently there is no safe replacement for `String::from_utf8` in simdutf8. I think it would be easy to add a function for this.
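As a rough sketch of what such a function might look like: the name `string_from_utf8` and its error type are assumptions, not an existing simdutf8 API. To keep the example self-contained, `std::str::from_utf8` stands in for the validator; a simdutf8-based version would presumably call `simdutf8::compat::from_utf8` or `simdutf8::basic::from_utf8` at that point and return the corresponding error type instead.

```rust
use std::str::Utf8Error;

/// Hypothetical safe wrapper: validates `bytes` and converts them into a
/// `String` without copying. On failure the original buffer is handed back
/// together with the validation error.
fn string_from_utf8(bytes: Vec<u8>) -> Result<String, (Vec<u8>, Utf8Error)> {
    // Stand-in validator (assumption): a SIMD validator would be called here.
    // Mapping to `()` drops the borrowed `&str` so that `bytes` can be moved.
    let check = std::str::from_utf8(&bytes).map(|_| ());
    match check {
        // SAFETY: `bytes` was validated as UTF-8 just above and has not been
        // modified since.
        Ok(()) => Ok(unsafe { String::from_utf8_unchecked(bytes) }),
        Err(e) => Err((bytes, e)),
    }
}
```

The `unsafe` block stays inside the function and the validator is fixed rather than passed in by the caller, so the public interface remains safe.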
I've wanted chunked UTF-8 decoding twice recently for different escaping routines, and have used `simdutf8::compat::from_utf8` in a loop to achieve that. I would really like to be able to use...
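A loop of this kind might look roughly like the sketch below. The function `valid_chunks` is hypothetical, and `std::str::from_utf8` is used as a stand-in so the example compiles without external crates. It assumes that the error type exposes `valid_up_to()` and `error_len()` the way the standard library's `Utf8Error` does.

```rust
use std::str;

/// Hypothetical helper: splits `input` into runs of valid UTF-8 (`Ok`) and
/// runs of invalid bytes (`Err`), in order of appearance.
fn valid_chunks(mut input: &[u8]) -> Vec<Result<&str, &[u8]>> {
    let mut out = Vec::new();
    while !input.is_empty() {
        match str::from_utf8(input) {
            Ok(s) => {
                // The rest of the input is valid; nothing left to scan.
                out.push(Ok(s));
                break;
            }
            Err(e) => {
                let (valid, rest) = input.split_at(e.valid_up_to());
                if !valid.is_empty() {
                    // The prefix is known to be valid; validating it again
                    // avoids `unsafe` at the cost of a second pass.
                    out.push(Ok(str::from_utf8(valid).expect("prefix already validated")));
                }
                // `error_len()` is `None` when the input ends in the middle
                // of a multi-byte sequence; treat the whole tail as invalid.
                let bad_len = e.error_len().unwrap_or(rest.len());
                let (bad, tail) = rest.split_at(bad_len);
                out.push(Err(bad));
                input = tail;
            }
        }
    }
    out
}
```

Each iteration restarts validation after the last invalid run, so the caller decides how to treat the invalid bytes, for example by escaping them.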
This adds a `core::simd` (Rust's portable SIMD) implementation. Since that is a nightly-only feature, it is guarded behind the `portable` feature flag.
Why do I get only about `12 GB/s` with this manual benchmark? ```rust use std::time::Instant; use simdutf8::basic::from_utf8; fn main() { let mut vec: Vec<u8> = Vec::new(); for i in 0..1024...