time
time copied to clipboard
More feature-full format_description macro
I'm parsing a rfc3339-ish format with some flexibility (and UTC/Local offset selected out of band). I've ended up with the following incantation:
use time::{
format_description::{modifier::*, Component, FormatItem::*},
parsing::Parsed,
};
let mut p = Parsed::new()
.with_hour_24(0)
.unwrap()
.with_minute(0)
.unwrap()
.with_second(0)
.unwrap()
.with_offset_hour(input_offset) // Work in progress ;)
.unwrap();
let rest = p.parse_items(
input_str.as_bytes(),
&[
Component(Component::Year(Year::default())),
Literal(b"-"),
Component(Component::Month(Month::default())),
Literal(b"-"),
Component(Component::Day(Day::default())),
Optional(&Compound(&[
First(&[Literal(b"T"), Literal(b" ")]),
Component(Component::Hour(Hour::default())),
Literal(b":"),
Component(Component::Minute(Minute::default())),
Optional(&Compound(&[
Literal(b":"),
Component(Component::Second(Second::default())),
])),
])),
],
)?;
That does the job, but I'd prefer using time::macros::format_description
for readability (and maybe compile-time goodness, I'm not sure). But AFAICT it's missing support for optional
, first
, and compound
.
Would it be possible to extend the grammar ?
Well…Compound
is the whole thing and is already what's output from the macro. Optional
and First
are the ones without syntactical support. I'm open to any ideas you may have on how to extend the grammar. I've thought about it previously and couldn't come up with a decent solution.
AFAIU Compound
is only supported at the root ?
What's needed is a form of nesting. Something like [compound [...]]
, [optional [...]]
and [first [...]]
, with each ...
being recursively parsed like the current top-level expression. So my initial example would translate to
[year]-[month]-[day][optional [[first [T ]][hour]:[minute][optional [:[second]]]]]
Note that when recursing, whitespace inside brackets is significant, otherwise [first [T ]]
wouldn't work as expected.
This can get a bit verbose, so [compound [...]]
/[optional [...]]
/[first [...]]
could be aliased to [*[...]]
/[?[...]]
/[![...]]
or*[...]
/?[...]
/![...]
or even <...>
/(...)
/{...}
if making the grammar more sigil-heavy is acceptable and if we're confident we won't have more wrapper types :
[year]-[month]-[day][?[[![T ]][hour]:[minute][?[:[second]]]]]
or
[year]-[month]-[day]?[![T ][hour]:[minute]?[:[second]]]
or
[year]-[month]-[day]({T }[hour]:[minute](:[second]))
Actually, looking at that last example I realized that I initially tough optional
took a list of formatitems instead of just one.
- This looks like an assumption that simplifies the grammar nicely, so
(...)
should translate toOptional(&Compound(&[...]))
on the rust side. - It'd be nice to have
Optional(&[...])
supported on the rust side directly but it's not worth a compat break. Maybe a newOptionals
enum variant ? - We still need grammar-level support for
compound
, becausefirst
might want to choose between multiple compounds.
AFAIU
Compound
is only supported at the root?
Compound
can be arbitrarily nested. I checked and Compound
isn't what's actually output by the macro — it's just a slice reference to FormatItem
s.
Note that when recursing, whitespace inside brackets is significant, otherwise
[first [T ]]
wouldn't work as expected.
You're assuming literals will always be one character. This certainly isn't the case. The only way around this, realistically, is to have it be quoted. This isn't ideal because now we're forcing people to either escape the quotes or use raw strings. We'd also have to handle escaping within the literal ourselves.
It'd be nice to have
Optional(&[...])
supported on the rust side directly but it's not worth a compat break. Maybe a newOptionals
enum variant?
Optional(Compound(…))
works currently. The parser could automatically output this as needed.
I checked and Compound isn't what's actually output by the macro — it's just a slice reference to FormatItems.
Yes that's what I meant, Compound
can be nested but only in code, not in grammar.
You're assuming literals will always be one character. This certainly isn't the case. The only way around this, realistically, is to have it be quoted. This isn't ideal because now we're forcing people to either escape the quotes or use raw strings. We'd also have to handle escaping within the literal ourselves.
I did think about multi-character :) The "significant whitespace" rule just avoids some corner-cases and makes recursion easier.
First
is the only problematic wrapper regarding multi-char, my idea was that they would be supported via compound: [first [[compound [h: [hour]]][compound [m: [minute]]]]
. Other ways to handle this could be to accept multiple arguments: [first [h: [hour]] [m: [minute]] ]
or to use adjacency: [first [h: [hour]]][first [m: [minute]]]
(which also works with sigil syntax {h: [hour]}{m: [minute]}
). I think the multi-bracket [first [...] [...] [...]]
version is the cleanest and most intuitive.
The only in-litteral escaping needed is for [
(and (
/{
/<
if we decide to use them), which is already the case of the existing grammar.
I'm not sure how you feel about adding sigils to the grammar ? They enable shorter descriptions and look nicer / more readable, but they make the grammar more complex, aren't as future-proof and don't seem to work well for multi-bracket first
.
Where would a user need to explicitly state that something is Compound
? First
takes a slice, and Optional
is semantically only a single item regardless — a Compound
would be generated from the inner contents as needed without a user needing to specify it.
I think that if you want to have either a T
or a space, [first [T] [ ]]
is reasonable syntax, rather than combining the two possibilities in a single item (as [first [T ]]
).
Anything outside of brackets is a no-go for compatibility reasons. That includes using parenthesis, braces, and angle brackets, as well as prefix or postfix sigils outside of brackets. I also don't think sigils are more readable — the current format description arose because the previous version used too many without clear meaning or purpose.
My initial suggestions needed an explicit [compound [ ]]
but you're right : the latest one doesn't, cool. So to recap:
- Add a
[optional [abc...]]
grammar item that maps toOptional(&Compound(&[a,b,c...]))
- Add a
[first [a] [bc] ...]
grammar item that maps toFirst(&[Compound(&[a]),Compound(&[b,c])...])
- Parser could optimize single-item
Compound(&[a])
into justa
- When recursing, whitespace is significant so that
[ ]
is a literal space
That is my understanding as well. Note that the output from the parser isn't guaranteed, only the behavior, so the optimization noted is not mandatory.
For the recursive part, I still need to think about escaping. The left bracket is simple — double it as is the case for top-level items. The right bracket does not have this capability right now, as the grammar was designed without thinking of nesting. I will need to investigate whether it is possible to permit escaping the right bracket by doubling it in a backwards-compatible manner. My immediate concern is that consecutive right brackets will be quite common, and forcing a space between them isn't ideal.
Somehow I never considered this before now, but in order to output these values from the runtime parser, owned format descriptions (#429) are necessary, as I can't emit a slice to a newly-created object. With that said, though, the implementation isn't too difficult now that I've rewritten things.
Mixed news on this front. I was able to implement option items fully with the aforementioned syntax, such that this test passes:
assert_eq!(
format_description::parse_owned("[optional [:[year]]]"),
Ok(OwnedFormatItem::Compound(Box::new([
OwnedFormatItem::Optional(Box::new(OwnedFormatItem::Compound(Box::new([
OwnedFormatItem::Literal(Box::new(*b":")),
OwnedFormatItem::Component(Component::Year(Default::default()))
]))))
])))
);
Unfortunately, removing the colon results in something unexpected (at first glance). Because [[
is treated as an escape sequence, it gets lexed differently. This affects the "depth" of the lex, which is what determines if something is part of a component or a literal.
I might be able to work around this by carving out an edge case for nested format descriptions specifically ("optional" and "first"), but I will need to be careful in doing so. I still need to think about how to permit a literal ]
without closing the sequence as well.
Edit: I've successfully worked around that issue. The only outstanding issue for parsing optional items is the escaping of ]
.
This has been implemented for the runtime parser on main. The escape character for a right bracket is a backslash. The only remaining step (implementation-wise) is to port it to the macro parser, which should be relatively simple. Then I need to actually document this…
It doesn't matter publicly, but the optimization of Compound(&[a])
to a
was actually very straightforward to implement, so I've done so.
@vincentdephily Please give this a test run! If you run into any bugs, let me know.
It works, thanks. I only tested my specific format and [
/]
escaping, nothing fancy. Some observations:
- Having two different escape chars for
[
and]
feels inconsistent. I'd rather use a backslash everywhere but I understand this might be a breaking change. - Why are
first
/optional
restricted toparse_owned()
? I would have thought thatparse()
'sVec<FormatItem<'_>>
could handle nested FormatItems just as well ? - Needs the compile-time part of the work, as you mentioned.
format_description!()
is described as compile-timeparse()
rather thanparse_owned()
, I hope it doesn't run into the same problem.
- I probably could use
]]
as an escape sequence, but it would be frustrating to deal with due to the otherwise unnecessary whitespace. This is quite evident, even in your test, as you would need to add spaces add the end. This might have semantic meaning too; I didn't think past the unnecessary whitespace issue. With regard to consistency, the only option here within time 0.3 is using the backslash to escape the opening bracket when in a nested format description. But then there's an inconsistency between the top-level description and nested format descriptions. -
FormatItem
stores a reference to a slice/item. Any references that exist would necessarily have to have existed before the call toparse
. This is fine for the static case (including macro-generated), as the references are promoted to local consts by the compiler. - Absolutely! I haven't updated documentation fully for
parse_owned
. The new items will be supported by the macro, emitting the slices you're currently using.
It's not pretty, but we can probably live with those escapes for now. I find [[
and ]]
hard to read and easy to mistype, especially with arbitrary nesting. A backward-compatible way to get \
everywhere would be to create new functions, but it's probably not worth it. Are you interested in opening "0.4 compat-break wishlist" issues ?
I'll give the updated docs and macro a look when they're ready.
The reason [[
was originally chose is because Rust uses {{
to perform similar escaping in format strings. At that time, I did not consider that nested descriptions could ever be a thing (Optional
and First
didn't even exist then).
You did give me a thought, though. As there is parse
and parse_owned
, a new parse_borrowed
could be introduced that uses a backslash escape. parse
could then maintain the current behavior, possible being deprecated. It's not the prettiest solution, but it would work.
It would be ideal if Rust supported default type parameters for functions, as I could then have marker structs to indicate the desired behavior (current, borrowed, or owned). This is not the case, however.
Right now I think the wishlist for 0.4 is very small, so an issue isn't necessary.
Alright, round two! I have pushed a change that introduces backslash for escapes everywhere. You should be able to do \\
, \[
, and \]
, resulting in a literal \
, [
, or ]
respectively. This is the case regardless of if you're in a nested description.
For backwards compatibility, I have made this change by introducing versioning internally. parse
uses version 1, while the new parse_borrowed
and parse_owned
use version 2. For the macro, I'm not sure whether I want to pass the version as an argument or introduce another macro. What are your thoughts?
I have just pushed a commit that permits "optional" and "first" items in the format_description!
macro. To use the backslash escapes, you will need to call it like format_description!(v2, "foo")
. When omitted, the default is v1 for backwards compatibility. Note that optional and first items can be created when using v1, but they will use the older [[
escape. For most users, this likely will not matter.
The commit also fixes a bug where [[
was an escape sequence even in parse_borrowed
and parse_owned
.
The only remaining implementation issue is being able to set the version when using time::serde::format_description!
. As the parsing capability already exists, I'm hoping to have this done relatively quickly. Then it's just documentation and a release! If anyone wants to write documentation, a release will come quicker as a result :slightly_smiling_face:
Alright. I've changed it from v1
to version = 1
and permitted it for the serde::format_description
macro. While not strictly necessary, it avoids the need for significant (though not unbounded) lookahead.
Please give it a test! The only remaining issue is documentation.
This has been released as part of time
0.3.18`. Documentation is available along with the rest of the docs for format descriptions here.
It seems time-macros
need a version bump to bring version = 2
in. Right? @jhpratt
Whoops! I neglected to publish time-macros
. As versions are pinned, I'll need to push another version of time
as well. It will be done tonight.
CI will do its thing, but I have just released it.