Parsers should support unvalidated UTF16
Currently the parsers, like Instant::from_str(), require a validated UTF8 str.
JS engines typically have UTF16 or unvalidated UTF8.
It would be nice if these parsers were written to consume &[u8] (most of the parsing is ASCII only anyway), so we could at least operate on unvalidated UTF8 and have from_utf8_bytes() functions.
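To illustrate the idea, here is a minimal sketch of a parser that works directly on `&[u8]`. It is not the temporal_rs or ixdtf code: `SimpleDate`, `ParseError`, and `parse_date_utf8_bytes` are hypothetical names, and the grammar is cut down to a bare `YYYY-MM-DD`. Since every accepted byte is ASCII, malformed UTF8 is rejected by the same checks that reject any other unexpected byte, so no separate validation pass is needed.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    InvalidLength,
    InvalidByte,
    OutOfRange,
}

/// A plain calendar date, used only for illustration.
#[derive(Debug, PartialEq, Eq)]
pub struct SimpleDate {
    pub year: u16,
    pub month: u8,
    pub day: u8,
}

/// Reads `len` ASCII digits starting at `start` and returns their value.
/// Callers must have checked that the range is in bounds.
fn digits(input: &[u8], start: usize, len: usize) -> Result<u16, ParseError> {
    let mut value: u16 = 0;
    for &b in &input[start..start + len] {
        // Any non-digit byte, including bytes >= 0x80, is rejected here.
        if !b.is_ascii_digit() {
            return Err(ParseError::InvalidByte);
        }
        value = value * 10 + u16::from(b - b'0');
    }
    Ok(value)
}

/// Parses `YYYY-MM-DD` from bytes that were never validated as UTF8.
pub fn parse_date_utf8_bytes(input: &[u8]) -> Result<SimpleDate, ParseError> {
    if input.len() != 10 {
        return Err(ParseError::InvalidLength);
    }
    if input[4] != b'-' || input[7] != b'-' {
        return Err(ParseError::InvalidByte);
    }
    let year = digits(input, 0, 4)?;
    // Two digits are at most 99, so the narrowing casts cannot truncate.
    let month = digits(input, 5, 2)? as u8;
    let day = digits(input, 8, 2)? as u8;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return Err(ParseError::OutOfRange);
    }
    Ok(SimpleDate { year, month, day })
}
```

A `from_str()` could then be a thin wrapper that passes `s.as_bytes()` to the byte-based function.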
Ideally we also have UTF16 functions. That would need a tweak to the ixdtf parser.
cc @nekevss
One thing I may do over capi is add from_str_utf8() and from_str_utf16() that take DiplomatStr and DiplomatStr16 respectively, converting internally. Then over time we can make optimizations to avoid the conversions/checking.
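Roughly, the "convert internally" approach could look like the sketch below. Everything here is an assumption made for illustration: plain `&[u8]` and `&[u16]` slices stand in for DiplomatStr and DiplomatStr16, `parse_bytes` is a toy stand-in for the real parser (it only reads a short run of digits), and `ParseError` is hypothetical. The UTF16 entry point narrows each code unit to a byte and bails out on anything outside ASCII, which costs one allocation that a later optimization could remove.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    NonAscii,
    Invalid,
}

/// Toy stand-in for the real byte-based parser: reads 1 to 9 ASCII digits.
fn parse_bytes(input: &[u8]) -> Result<u32, ParseError> {
    // At most 9 digits, so the accumulated value always fits in a u32.
    if input.is_empty() || input.len() > 9 {
        return Err(ParseError::Invalid);
    }
    let mut value: u32 = 0;
    for &b in input {
        if !b.is_ascii_digit() {
            return Err(ParseError::Invalid);
        }
        value = value * 10 + u32::from(b - b'0');
    }
    Ok(value)
}

/// Entry point for possibly unvalidated UTF8 input: no conversion needed.
pub fn from_str_utf8(input: &[u8]) -> Result<u32, ParseError> {
    parse_bytes(input)
}

/// Entry point for possibly unvalidated UTF16 input.
/// Converts to bytes first, rejecting any code unit outside ASCII.
pub fn from_str_utf16(input: &[u16]) -> Result<u32, ParseError> {
    let mut bytes = Vec::with_capacity(input.len());
    for &unit in input {
        if unit > 0x7F {
            // Covers non-ASCII characters and lone surrogates alike.
            return Err(ParseError::NonAscii);
        }
        bytes.push(unit as u8);
    }
    parse_bytes(&bytes)
}
```

The allocation in `from_str_utf16` is the part that a UTF16-aware parser would make unnecessary.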
Parsing unvalidated utf8 has been implemented in #295
With unicode-org/icu4x#6577 merged, implementing UTF16 support should mostly be unblocked.
Added a mention of Latin-1 to the issue.
So are you thinking that instead of from_utf8, we rename to from_latin1?
I was sort of leaning towards keeping from_utf8, since every valid input is ASCII anyway and the name would cause less confusion on the native Rust side of things.
But I'm also open to other alternatives:
- from_bytes
- from_latin1
- from_ascii_bytes
- from_utf8 (same)
> So are you thinking that instead of from_utf8, we rename to from_latin1?
No, we'd have both, like ICU4X.
But yes, since it's all ASCII anyway, the point might be moot.
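For what it's worth, the reason it is probably moot can be shown in a few lines. This is only a sketch with hypothetical helpers (`encode_latin1`, `is_ascii_only`), not anything from temporal_rs or ICU4X: ASCII text has the same bytes under both encodings, and any non-ASCII character produces a byte >= 0x80 under either one, which an ASCII-only parser rejects regardless of what the function is called.

```rust
/// Encodes a string as Latin-1, or returns None if any char is above U+00FF.
/// Hypothetical helper: Latin-1 maps each code point 0..=255 to one byte.
pub fn encode_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| u8::try_from(u32::from(c)).ok())
        .collect()
}

/// The only check an ASCII-only grammar needs on its input bytes.
pub fn is_ascii_only(bytes: &[u8]) -> bool {
    bytes.is_ascii()
}

/// True when the Latin-1 and UTF8 encodings of `s` are byte-for-byte equal.
pub fn same_bytes_in_both(s: &str) -> bool {
    encode_latin1(s).as_deref() == Some(s.as_bytes())
}
```

With these, `same_bytes_in_both` holds for a string such as "2024-03-15", while a string containing "é" encodes differently under the two encodings and fails `is_ascii_only` in both.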
With #365 merged, this implementation is no longer blocked by ixdtf.