Suggestion: Add an alternative function to output as UTF-16LE `&[u8]` slice
In the Windows world UTF-16 strings are not only encountered when interfacing with APIs, but also in a few on-disk structures (e.g. NT registry hives or NTFS filesystems).
This complicates interoperability with Rust's UTF-8 world, especially in no_std environments.
My current approach when writing a parser for such an on-disk structure is as follows:
- I define my own `Utf16ByteString` type that just wraps a `&[u8]`.
- All parser functions that output a string just return the byte slice encompassing that string in a `Utf16ByteString`. This has zero cost.
- For users with `alloc` or `std`, my `Utf16ByteString` provides a `to_string` function that uses `char::decode_utf16(bytes.chunks_exact(2).map(|two_bytes| u16::from_le_bytes(two_bytes.try_into().unwrap())))` internally. Apart from the required allocations, this function also comes with decoding overhead.
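The wrapper described above could be sketched roughly as follows; the name `Utf16ByteString` is from the description, while the exact method body is an assumption based on the decoding expression quoted in the list:

```rust
use core::convert::TryInto;

/// Sketch of the wrapper type described above: a zero-cost view over
/// UTF-16LE bytes, with an allocating `to_string` for `alloc`/`std` users.
pub struct Utf16ByteString<'a>(pub &'a [u8]);

impl<'a> Utf16ByteString<'a> {
    /// Decodes the UTF-16LE bytes, replacing unpaired surrogates
    /// with U+FFFD (requires `alloc` or `std`).
    pub fn to_string(&self) -> String {
        char::decode_utf16(
            self.0
                .chunks_exact(2)
                .map(|two_bytes| u16::from_le_bytes(two_bytes.try_into().unwrap())),
        )
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
    }
}
```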
Of course, I'd like to avoid calling `to_string`, and a frequent case where this should be possible is (case-sensitive) comparisons.
Currently, I have to create the comparison byte buffers by hand though, e.g. `let hello = &[b'H', 0, b'e', 0, b'l', 0, b'l', 0, b'o', 0];`.
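Such a buffer could also be produced at compile time by a hand-rolled `const fn`. The helper below is hypothetical (not part of any crate) and handles ASCII input only, which is why a real compile-time UTF-16 encoder is still attractive:

```rust
/// Hypothetical helper: widens an ASCII string into UTF-16LE bytes at
/// compile time. Const evaluation fails on non-ASCII input or if the
/// target length N is not exactly twice the input length.
const fn ascii_to_utf16le<const N: usize>(s: &str) -> [u8; N] {
    let bytes = s.as_bytes();
    assert!(bytes.len() * 2 == N);
    let mut out = [0u8; N];
    let mut i = 0;
    while i < bytes.len() {
        assert!(bytes[i] < 0x80); // ASCII only; anything else needs real UTF-16 encoding
        out[2 * i] = bytes[i]; // low byte; high byte stays 0 (little-endian)
        i += 1;
    }
    out
}

const HELLO: [u8; 10] = ascii_to_utf16le("Hello");
```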
The latest const-utf16 is no help here, as its `encode!` only outputs a `&[u16]`. I could transmute my `&[u8]` to a `&[u16]`, but that would be an unsafe hack and prone to endian problems.
Could const-utf16 therefore be extended to alternatively output a UTF-16LE `&[u8]` slice for such comparisons?
Or am I missing a zero-cost alternative here?
Hmmmm... I have to think a bit about this. The best possibility would be a safe way to convert `&[u16]` to `&[u8]`. Hopefully, Rust will have this capability someday.
I think this code could do what you want:
```rust
use std::convert::TryInto;

/// Compares a `&[u16]` UTF-16 string against a UTF-16LE byte slice
/// without allocating or decoding.
fn compare(u16_slice: &[u16], u8_slice: &[u8]) -> bool {
    u16_slice.len() * 2 == u8_slice.len()
        && u16_slice.iter().copied().eq(u8_slice
            .chunks_exact(2)
            .map(|two_bytes| u16::from_le_bytes(two_bytes.try_into().unwrap())))
}
```
This is fairly low cost and does what you want.
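If the comparison constant really must exist as `&[u8]`, another compile-time option is to copy the code units into a byte array in a `const fn`. The helper below is hypothetical and not part of const-utf16; its input is written as a plain `&[u16]` literal here, standing in for what `encode!` would produce:

```rust
/// Hypothetical helper: copies UTF-16 code units into a UTF-16LE byte
/// array at compile time. Const evaluation fails if the target length N
/// is not exactly twice the input length.
const fn utf16_to_le_bytes<const N: usize>(units: &[u16]) -> [u8; N] {
    assert!(units.len() * 2 == N);
    let mut out = [0u8; N];
    let mut i = 0;
    while i < units.len() {
        let le = units[i].to_le_bytes();
        out[2 * i] = le[0];
        out[2 * i + 1] = le[1];
        i += 1;
    }
    out
}

// "Hello" as UTF-16 code units, as encode!("Hello") would produce them.
const HELLO_LE: [u8; 10] = utf16_to_le_bytes(&[0x48, 0x65, 0x6C, 0x6C, 0x6F]);
```

This avoids both the transmute and any runtime cost, at the price of naming the byte length explicitly.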