temporal Parsers should support unvalidated UTF16

Currently the parsers, like Instant::from_str(), require a validated UTF8 str.

JS engines typically store strings as UTF16 or unvalidated UTF8.

It would be nice if these parsers were written to consume &[u8] (most of the parsing is ASCII only anyway), so we could at least operate on unvalidated UTF8 and have from_utf8_bytes() functions.
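To illustrate the idea, here is a minimal sketch of a byte-oriented parser, assuming a toy `YYYY-MM-DD` grammar. `DateSketch`, `ParseError`, and `digits` are hypothetical names made up for this example, not temporal_rs or ixdtf APIs. Since the grammar only accepts ASCII, any invalid UTF8 byte is rejected by the parser itself and no up-front validation pass is needed.

```rust
/// Hypothetical stand-in for a parsed date; not a temporal_rs type.
#[derive(Debug, PartialEq, Eq)]
pub struct DateSketch {
    pub year: i32,
    pub month: u8,
    pub day: u8,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct ParseError;

/// Reads exactly `n` ASCII digits starting at `pos`.
fn digits(input: &[u8], pos: usize, n: usize) -> Result<u32, ParseError> {
    let slice = input.get(pos..pos + n).ok_or(ParseError)?;
    let mut value = 0u32;
    for &b in slice {
        // Non-ASCII or otherwise invalid bytes fail here, so the input
        // never has to be validated as UTF8 beforehand.
        if !b.is_ascii_digit() {
            return Err(ParseError);
        }
        value = value * 10 + u32::from(b - b'0');
    }
    Ok(value)
}

impl DateSketch {
    /// Parses `YYYY-MM-DD` from possibly unvalidated UTF8 bytes.
    pub fn from_utf8(input: &[u8]) -> Result<Self, ParseError> {
        if input.len() != 10 || input[4] != b'-' || input[7] != b'-' {
            return Err(ParseError);
        }
        let year = digits(input, 0, 4)? as i32;
        let month = digits(input, 5, 2)? as u8;
        let day = digits(input, 8, 2)? as u8;
        // Deliberately simplistic range check; no per-month day limits.
        if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
            return Err(ParseError);
        }
        Ok(Self { year, month, day })
    }
}

impl std::str::FromStr for DateSketch {
    type Err = ParseError;

    /// The validated-str entry point just forwards to the byte parser.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Self::from_utf8(s.as_bytes())
    }
}
```

With this shape the existing from_str() API stays as a thin wrapper, and callers holding raw bytes can skip the validation step.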

Ideally we also have UTF16 functions. That would need a tweak to the ixdtf parser.
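One possible shape for this, sketched under assumptions: make the parsing core generic over the code unit so the same logic runs on `u8` and `u16` input. The `CodeUnit` trait and `parse_four_digit_year` below are hypothetical and only illustrate the technique; they are not how ixdtf is actually structured.

```rust
/// Hypothetical abstraction over UTF8 and UTF16 code units.
pub trait CodeUnit: Copy {
    /// Returns the unit as an ASCII byte, or `None` if it is outside ASCII.
    fn as_ascii(self) -> Option<u8>;
}

impl CodeUnit for u8 {
    fn as_ascii(self) -> Option<u8> {
        if self.is_ascii() {
            Some(self)
        } else {
            None
        }
    }
}

impl CodeUnit for u16 {
    fn as_ascii(self) -> Option<u8> {
        // `try_from` rejects anything above 0xFF, so a unit such as 0x0132
        // is never silently truncated to the ASCII digit 0x32.
        match u8::try_from(self) {
            Ok(byte) if byte.is_ascii() => Some(byte),
            _ => None,
        }
    }
}

/// Parses exactly four ASCII digits from either kind of code unit.
pub fn parse_four_digit_year<T: CodeUnit>(input: &[T]) -> Option<u32> {
    if input.len() != 4 {
        return None;
    }
    let mut year = 0u32;
    for &unit in input {
        let byte = unit.as_ascii()?;
        if !byte.is_ascii_digit() {
            return None;
        }
        year = year * 10 + u32::from(byte - b'0');
    }
    Some(year)
}
```

The point of the design is that no intermediate String is allocated for UTF16 input: each code unit is narrowed to ASCII on the fly and anything else is a parse error.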

May 01 '25 18:05 Manishearth

cc @nekevss

May 01 '25 18:05 Manishearth

A thing I may do over capi is add from_str_utf8() and from_str_utf16() functions that take DiplomatStr and DiplomatStr16, converting internally. Then over time we can make optimizations to avoid the conversions/checking.
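A rough sketch of the convert-internally approach, using plain `&[u8]` and `&[u16]` slices in place of the Diplomat string types (an assumption made to keep the example self-contained). `parse_validated` is a hypothetical placeholder for the existing validated-str parser and only understands `HH:MM`.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct ParseError;

/// Reads a string of exactly two ASCII digits.
fn two_digits(s: &str) -> Result<u16, ParseError> {
    match s.as_bytes() {
        &[a, b] if a.is_ascii_digit() && b.is_ascii_digit() => {
            Ok(u16::from(a - b'0') * 10 + u16::from(b - b'0'))
        }
        _ => Err(ParseError),
    }
}

/// Placeholder for the existing parser that needs a validated `&str`.
/// Accepts `HH:MM` and returns minutes since midnight.
fn parse_validated(input: &str) -> Result<u16, ParseError> {
    let (hours, minutes) = input.split_once(':').ok_or(ParseError)?;
    let hours = two_digits(hours)?;
    let minutes = two_digits(minutes)?;
    if hours > 23 || minutes > 59 {
        return Err(ParseError);
    }
    Ok(hours * 60 + minutes)
}

/// Validates the bytes as UTF8, then reuses the str parser.
pub fn from_str_utf8(input: &[u8]) -> Result<u16, ParseError> {
    let s = std::str::from_utf8(input).map_err(|_| ParseError)?;
    parse_validated(s)
}

/// Transcodes UTF16 to an owned String, then reuses the str parser.
/// This allocates; it is the part later optimizations would remove.
pub fn from_str_utf16(input: &[u16]) -> Result<u16, ParseError> {
    let s = String::from_utf16(input).map_err(|_| ParseError)?;
    parse_validated(&s)
}
```

This keeps the FFI surface stable while the internals still pay for validation and transcoding, which is the cost the follow-up work would aim to eliminate.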

May 01 '25 18:05 Manishearth

Parsing unvalidated utf8 has been implemented in #295

May 10 '25 18:05 HalidOdat

With unicode-org/icu4x#6577 merged, implementing UTF16 support should mostly be unblocked.

May 28 '25 16:05 nekevss

Added a mention of Latin-1 to the issue.

Jul 01 '25 15:07 Manishearth

So are you thinking that instead of from_utf8, we rename to from_latin1?

I was sort of leaning towards keeping from_utf8, since the accepted values are all ASCII anyway and it would cause less confusion on the native Rust side of things.

But I'm also open to other alternatives:

from_bytes
from_latin1
from_ascii_bytes
from_utf8 (same)

Jul 01 '25 15:07 nekevss

> So are you thinking that instead of from_utf8, we rename to from_latin1?

No, we'd have both, like ICU4X.

But yes, since it's all ASCII anyway, the point might be moot.

Jul 01 '25 16:07 Manishearth

With #365 merged, this implementation is no longer blocked by ixdtf.

Jul 04 '25 02:07 nekevss