dateutils

A formatter to match named timezones or their abbreviations, such as GMT

Status: Open. Earnestly opened this issue 3 years ago • 6 comments

I have a few date strings which contain the named zones GMT and UTC. Would it be possible for %Z to match named timezones or abbreviations as strftime does?

I realise I could repeat the -i flags with both UTC and GMT permuted among them, but that seems a bit awkward.

Earnestly, Feb 15 '22 11:02

Hi, what do you mean by match? Just eat the string, or actually evaluate the string so it acts as --from-zone?

hroptatyr, Feb 15 '22 12:02

In my case I have <pubDate>1 Feb 2022 12:00:00 GMT</pubDate>, where GMT ultimately means UTC, so both GMT and UTC could simply be ignored.

But your suggestion of --from-zone is probably more appropriate, and closer to what people would expect. The matching I'm referring to is via -i '...'; currently I repeat the date formats:

    ...
    -i '<d>%a, %d %b %Y %T %Z</d>' \
    -i '<d>%a, %d %b %Y %T GMT</d>' \
    -i '<d>%a, %d %b %Y %T UTC</d>' \
    ...
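Repeating -i like this amounts to trying each layout in turn until one matches. A minimal Python sketch of that fallback idea, outside dateutils, assuming Python 3.7+ (so that %z accepts both +0000 and +00:00), the default C/English locale for %a and %b, and a hypothetical helper name to_rfc3339_utc. A literal GMT or UTC suffix is treated as UTC+0, and the result is printed in the %FT%TZ shape:

```python
from datetime import datetime, timezone

# Two base layouts (with and without the RFC 822 weekday) times three zone
# spellings, analogous to repeating -i once per permutation.
BASES = ["%a, %d %b %Y %H:%M:%S", "%d %b %Y %H:%M:%S"]
ZONES = ["%z", "GMT", "UTC"]  # %z: numeric offset; GMT/UTC: literal text
FORMATS = [f"{base} {zone}" for base in BASES for zone in ZONES]


def to_rfc3339_utc(text: str) -> str:
    """Try each format in turn; return the instant as %FT%TZ in UTC."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout did not match, try the next one
        if parsed.tzinfo is None:
            # A literal GMT or UTC suffix carries no offset: both mean UTC+0.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"no format matched: {text!r}")


if __name__ == "__main__":
    for sample in ["1 Feb 2022 12:00:00 GMT",
                   "Tue, 01 Feb 2022 12:00:00 UTC",
                   "Tue, 01 Feb 2022 12:00:00 +05:30"]:
        print(sample, "->", to_rfc3339_utc(sample))
```

With this sketch, "1 Feb 2022 12:00:00 GMT" becomes 2022-02-01T12:00:00Z and "Tue, 01 Feb 2022 12:00:00 +05:30" becomes 2022-02-01T06:30:00Z, while an abbreviation such as EST is rejected rather than guessed.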

Earnestly, Feb 15 '22 13:02

I see. I mean you could just not specify it, then (depending on the presence of -S) it will be part of the output again. Are you hoping to rewrite, say, BST to +00:00 or PRC to +08:00?

Using --from-zone you'd only be able to specify one zone for the entire input. What you suggested sounded like a "per-line" --from-zone.

hroptatyr, Feb 15 '22 13:02

In this particular case I'm hoping to convert all timestamps to UTC0 in RFC 3339 form, specifically %FT%TZ.

Many of the formats I deal with loosely follow RFC 822, including zone offsets written as either +0000 or +00:00 (where the zeros are placeholders for the actual offset).
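Numeric offsets like these need no zone database: getting to UTC is plain arithmetic. A small Python sketch that spells the calculation out, with hypothetical helper names parse_offset and local_to_utc (Python's own %z directive already does the same parsing; this just makes the subtraction explicit):

```python
import re
from datetime import datetime, timedelta

# Accepts both "+hhmm" and "+hh:mm"; minutes are limited to 00-59.
OFFSET_RE = re.compile(r"([+-])(\d{2}):?([0-5]\d)")


def parse_offset(token: str) -> timedelta:
    """Turn '+0530' or '+05:30' into a signed timedelta east of UTC."""
    m = OFFSET_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"not a numeric offset: {token!r}")
    sign, hours, minutes = m.groups()
    delta = timedelta(hours=int(hours), minutes=int(minutes))
    return -delta if sign == "-" else delta


def local_to_utc(local: datetime, token: str) -> datetime:
    # UTC = local wall-clock time minus the offset east of Greenwich.
    return local - parse_offset(token)
```

For example, local_to_utc(datetime(2022, 2, 1, 12, 0), "+05:30") gives datetime(2022, 2, 1, 6, 30), and "-0800" moves the same wall-clock time forward to 20:00 UTC.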

Earnestly, Feb 15 '22 13:02

I see, the case with explicit zone offsets is easy (and already supported) because it's just a calculation.

Using zone names isn't too difficult either, but I feel quite uncomfortable with the idea of (potentially) opening a new zone file for every line of input. Or, to make it as comfortable as possible for the user, you'd have to open all zone files, because the daylight-saving names are stored inside them. And you might need more disambiguation measures in place, e.g. for the famous AEST vs EST (Australia) vs EST (North America).
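To make the ambiguity concrete, here is a Python sketch built on a small hand-written abbreviation table. The table, its region labels and the resolve helper are hypothetical illustrations, not anything dateutils provides; a real implementation would have to derive the mapping from the zone files, which is exactly the cost described above. The Australian EST entry reflects older tz database releases; current releases spell it AEST.

```python
from typing import Optional

# Hypothetical, deliberately incomplete table:
# abbreviation -> {region label: UTC offset in hours}
ABBREVIATIONS = {
    "UTC": {"global": 0},
    "GMT": {"global": 0},
    "BST": {"Europe": 1},                     # British Summer Time
    "AEST": {"Australia": 10},
    "EST": {"America": -5, "Australia": 10},  # Australian EST is historical
}


def resolve(abbrev: str, region: Optional[str] = None) -> int:
    """Return the UTC offset in hours for a zone abbreviation.

    Raises KeyError for unknown abbreviations and ValueError when the
    abbreviation is ambiguous and no matching region hint is given.
    """
    candidates = ABBREVIATIONS.get(abbrev.upper())
    if candidates is None:
        raise KeyError(f"unknown zone abbreviation: {abbrev!r}")
    if len(candidates) == 1:
        return next(iter(candidates.values()))
    if region in candidates:
        return candidates[region]
    raise ValueError(f"{abbrev} is ambiguous between {sorted(candidates)}")
```

Even this toy table shows the problem: resolve("GMT") is simply 0, but a bare resolve("EST") cannot be answered without some extra hint about where the input came from.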

hroptatyr, Feb 15 '22 14:02

That certainly does sound fairly onerous (the zone files might be amenable to a hash table, but the database is pretty large: 5M on my system).

I might just stick with my repeated -i in that case because it's not too terrible. Both UTC and GMT are ultimately UTC0 anyway, so when converting to UTC they can simply be removed.

Earnestly, Feb 15 '22 16:02