
Replacement for `String::from_utf8`

Open Nugine opened this issue 2 years ago • 4 comments

Currently there is no safe replacement for `String::from_utf8` in simdutf8. I think it would be easy to add a function for this.

Nugine, Jan 15 '23 11:01

That would be effectively the same as `simdutf8::compat::from_utf8(value).map(|s| s.to_owned())`, yes?

Note that there was some discussion in the past about putting it in the standard library directly: https://www.reddit.com/r/rust/comments/mvc6o5/incredibly_fast_utf8_validation/

dralley, Jul 27 '23 02:07

Thanks for the answer!

Nugine, Jul 27 '23 02:07

Ah, I forgot the original problem. `String::from_utf8` converts a `Vec<u8>` into a `String` with validation, reusing the existing allocation. simdutf8, however, can only check a slice, not a `Vec`. To avoid an extra copy, you have to validate the slice and then call the unsafe `String::from_utf8_unchecked` yourself. So there is still no safe replacement for that.
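To make the difference concrete, here is a minimal sketch of the two routes. It uses `std::str::from_utf8` as a stand-in validator so that it compiles without extra crates; with simdutf8 the validation call would presumably be `simdutf8::compat::from_utf8` instead. The function names `to_string_copying` and `to_string_reusing` are made up for illustration and are not part of any library.

```rust
use std::str::Utf8Error;

// Copying route: validate the borrowed slice, then clone it into a new String.
// Stand-in validator (assumption): std::str::from_utf8 takes the place of
// simdutf8's validation here.
fn to_string_copying(bytes: &[u8]) -> Result<String, Utf8Error> {
    std::str::from_utf8(bytes).map(|s| s.to_owned())
}

// Reusing route: validate the borrowed slice, then take over the Vec's buffer.
fn to_string_reusing(bytes: Vec<u8>) -> Result<String, Utf8Error> {
    std::str::from_utf8(&bytes)?;
    // SAFETY: the line above just confirmed that `bytes` is valid UTF-8.
    Ok(unsafe { String::from_utf8_unchecked(bytes) })
}

fn main() {
    let bytes = "héllo".as_bytes().to_vec();
    let original_ptr = bytes.as_ptr();

    // The copying route allocates a second buffer.
    let copied = to_string_copying(&bytes).unwrap();
    assert_ne!(copied.as_ptr(), original_ptr);

    // The reusing route keeps the original buffer.
    let reused = to_string_reusing(bytes).unwrap();
    assert_eq!(reused.as_ptr(), original_ptr);

    // Invalid input is rejected before any unsafe conversion happens.
    assert!(to_string_reusing(vec![0xff, 0xfe]).is_err());
}
```

The `unsafe` block in the second function is exactly what a library-provided wrapper would hide from callers.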

Nugine, Jul 27 '23 02:07

Looking at the implementation of `from_utf8`, this should be quite easy to add:

```rust
#[inline]
pub fn from_utf8(input: &[u8]) -> Result<&str, Utf8Error> {
    unsafe {
        validate_utf8_basic(input)?;
        Ok(from_utf8_unchecked(input))
    }
}
```

and we would just add:

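```rust
pub mod string {
    pub use super::*;

    /// Validates `input` and converts it into a `String` without copying.
    #[inline]
    pub fn from_utf8(input: Vec<u8>) -> Result<String, Utf8Error> {
        unsafe {
            validate_utf8_basic(&input)?;
            // SAFETY: `input` was validated as UTF-8 on the line above.
            Ok(String::from_utf8_unchecked(input))
        }
    }
}
```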
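One design point to weigh: the standard library's `String::from_utf8` returns a `FromUtf8Error`, which hands the original bytes back through `into_bytes()`, whereas the version above drops the `Vec` on failure. The following is only a sketch of how a drop-in replacement might keep that behavior. The type `FromVecError` and the function `string_from_utf8` are hypothetical names, not simdutf8 API, and `std::str::from_utf8` again stands in for the SIMD validator.

```rust
use std::str::Utf8Error;

/// Hypothetical error type (not part of simdutf8): keeps the rejected bytes.
#[derive(Debug)]
pub struct FromVecError {
    bytes: Vec<u8>,
    error: Utf8Error,
}

impl FromVecError {
    /// Returns the bytes that failed validation, without copying them.
    pub fn into_bytes(self) -> Vec<u8> {
        self.bytes
    }

    /// Returns the underlying validation error.
    pub fn utf8_error(&self) -> Utf8Error {
        self.error
    }
}

/// Hypothetical owned conversion that returns the input on failure.
pub fn string_from_utf8(input: Vec<u8>) -> Result<String, FromVecError> {
    // Stand-in validator (assumption): replace with the SIMD validation call.
    let check = std::str::from_utf8(&input).map(|_| ());
    match check {
        // SAFETY: `input` was validated as UTF-8 just above.
        Ok(()) => Ok(unsafe { String::from_utf8_unchecked(input) }),
        Err(error) => Err(FromVecError { bytes: input, error }),
    }
}

fn main() {
    assert_eq!(string_from_utf8(b"ok".to_vec()).unwrap(), "ok");

    // On failure the caller gets both the error position and the bytes back.
    let err = string_from_utf8(vec![b'o', b'k', 0xff]).unwrap_err();
    assert_eq!(err.utf8_error().valid_up_to(), 2);
    assert_eq!(err.into_bytes(), vec![b'o', b'k', 0xff]);
}
```

Whether the extra error type is worth it depends on how closely the new function is meant to mirror the standard library's signature.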

vrtgs, Sep 16 '23 11:09