icu4x
icu4x copied to clipboard
Add Sedate IXDTF (extended ISO) string parsing function
We should add a function that parses ISO-8601 strings. We should follow the Temporal grammar:
https://tc39.es/proposal-temporal/#sec-temporal-iso8601grammar
i.e., we should implement a string parser conforming to IETF Sedate that parses a string into its components (date, time, time zone, calendar), and use this parser in APIs and anywhere else we parse date strings. Think about where to put this code in the crate structure.
We should start with the DateTime production, and work our way up towards CalendarDateTime and TemporalZonedDateTimeString.
Examples of strings that are accepted as part of the DateTime grammar:
2022-06-27T16:31:00.0002022-06-27t16:31:00.0002022-06-27 16:31:00.00020220627 16:31:00.00020220627 163100.00020220627 16310020220627 1631
Note: currently we have https://unicode-org.github.io/icu4x-docs/doc/icu_datetime/mock/fn.parse_gregorian_from_str.html
We should start with a design doc for this. We likely want to end up with a parser type named something like IetfDateTimeStringParser that is able to support the whole grammar.
Question: do we want to have specific, stricter parsers for each individual type, as well as the fully-featured one conforming to IETF Sedate?
We need to design how the parser will be architected so that it can be easily implemented.
Discuss with:
- @zbraniecki
- @skius
- @sffc
- @younies
- @Manishearth
Optional:
- @robertbastian
Do we want to leverage an existing implementation, such as the time crate?
Hi! FWIW, we have an initial parsing implementation in temporal_rs. Although, it admittedly needs more testing and clean up.
Do we want to leverage an existing implementation, such as the
timecrate?
Open to doing so with a Cargo feature.
Though I think this is the kind of thing that is okay for us to expose if we wish.
I think a SEDATE parser should be its own crate. Maybe it lives in our repo, maybe in the temporal_rs repo, maybe but maybe not in the time or chrono repo.
I changed the title of this issue to emphasize that we're talking about IXDTF, which is not functionality I'm aware of in Rust except for temporal_rs mentioned above.
Just to expand on my above comment. From what I can tell after reading through the IETF doc, I think the parser in temporal_rs can parse Sedate IXDTF or, at least, is fairly close to it. There might be a couple things missing, and the API definitely needs some work if the decision were to use that parser.
I'm pretty neutral on whether the functionality lives in icu4x, temporal_rs, or its own crate. I mostly figured that I would mention that it exists if there was interest in using the currently existing one as a base or just cleaning it up and using what's there. 😄
For reference, here are all the tests for that parser.
Thanks for the reference! Looking over that code, it seems to be well-written and suitable as a dependency of the icu4x project. I think it would be worthwhile pulling this code into an ixdtf crate (it looks like we are currently squatting that crate name). Seems harmless to include the Duration parsing code as well. The parser should return the raw IsoParseRecord type so that it isn't specific to the types in either icu_calendar or temporal_rs.
Sounds good. I'll move the code over to the directory. update it with the above comments., and submit a PR
Discussion: after @nekevss's PR is landed, we will add ixdtf as an optional, default dependency of icu_calendar and use where necessary.
LGTM: @sffc @Manishearth @robertbastian
#4646 basically fixes this issue but there were some minor follow-ups. I'm re-opening this issue to track that, or we can move those to another issue and re-close this one.
I can definitely work on the follow ups for the ixdtf as needed. I think it makes sense to have a separate issue to track progress on any follow ups needed on ixdtf.