Multiple fall back input formats for dateconv

Open • Earnestly opened this issue 3 years ago • 3 comments

dateconv -S is particularly useful as a filter over a large amount of input. When the input contains dates in a few known formats, it would be helpful for dateconv to try each format in turn until one succeeds.

The alternative would be to execute dateconv (and perhaps strptime) for each line of input.

(The ultimate solution would be for dateconv to detect the format itself, as Date.parse in JavaScript or dateutil in Python do.)

Earnestly, Feb 11 '22 23:02

Hi, thanks for the report. That's what -i|--input-format is for.

hroptatyr, Feb 14 '22 07:02

Oh I'm silly, I did not read properly that -i could be given multiple times. I'll have to give this a try.

Earnestly, Feb 14 '22 10:02

It doesn't appear to operate in a fallback manner: it attempts to apply every input format to every line instead of stopping after the first success.

I.e. given this input:

Sun, 26 Sep 2021 00:00:00 +1000 http://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html The Speed of Time

Currently dateconv also applies the %F input format to the URL, which is fair enough as -S matches anything in the line.

% dateconv -Sf %FT%TZ -i %FT%T%Z -i '%a, %d %b %Y %T %Z' -i %FT%TZ -i '%d %b %Y %T %Z' -i %F
2021-09-25T14:00:00Z http://www.brendangregg.com/blog/2021-09-26T00:00:00Z/the-speed-of-time.html The Speed of Time

Ideally I would hope for something like this, where it breaks after the first success.

% dateconv -Sf %FT%TZ -i %FT%T%Z -i '%a, %d %b %Y %T %Z' -i %FT%TZ -i '%d %b %Y %T %Z' -i %F
2021-09-25T14:00:00Z http://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html The Speed of Time
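
To make the "break after the first success" behaviour concrete, here is a minimal sketch in Python rather than dateconv. It is not how dateconv works internally; the names FORMATS, parse_first and to_utc_string are made up for illustration, the format strings use Python's strptime directives (which differ from dateconv's), and treating a date without an offset as UTC is an assumption.

```python
from datetime import datetime, timezone

# Hypothetical format list, loosely mirroring the -i options above.
# These are Python strptime directives, not dateconv format specs.
FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",
    "%a, %d %b %Y %H:%M:%S %z",
    "%d %b %Y %H:%M:%S %z",
    "%Y-%m-%d",
]


def parse_first(text, formats=FORMATS):
    """Try each format in order and stop at the first one that matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, fall back to the next
    return None  # no format matched


def to_utc_string(dt):
    """Render a datetime as UTC in the shape of dateconv's %FT%TZ output."""
    if dt.tzinfo is None:
        # Assumption: a date without an offset is taken to be UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(to_utc_string(parse_first("Sun, 26 Sep 2021 00:00:00 +1000")))
```

Note that this only handles a string that is a date in its entirety; finding where the date ends inside a longer line is the separate problem discussed next.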

But this is all heuristic, and it seems the only proper solution would be to support fields, as sort -k (and sort -t) do. Another option might be to add "anchors" to the "general specs": alongside %n for newline, perhaps %a+ and %a- representing the regex anchors ^ and $.

To work around this I've devised a scheme that ensures titles cannot contain tabs and inserts a tab between the date and the rest of the line. Then -i can include this tab via %t in the match and -f can re-insert the space. This seems to work consistently with my inputs.
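
For anyone who wants the same field restriction outside dateconv, the tab scheme can be sketched in Python as follows. This is only an illustration under assumptions: convert_line and FORMATS are hypothetical names, the formats are Python strptime directives, and the input is assumed to have exactly one tab separating the date from the rest of the line.

```python
from datetime import datetime, timezone

# Hypothetical formats for the date field (Python strptime directives).
FORMATS = ["%a, %d %b %Y %H:%M:%S %z", "%d %b %Y %H:%M:%S %z"]


def convert_line(line, formats=FORMATS):
    """Convert only the field before the first tab; leave the rest untouched."""
    date_field, sep, rest = line.partition("\t")
    for fmt in formats:
        try:
            parsed = datetime.strptime(date_field, fmt)
        except ValueError:
            continue  # try the next format
        stamp = parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        # Re-insert a space where the tab was, as -f does in the scheme above.
        return f"{stamp} {rest}" if sep else stamp
    return line  # no format matched: pass the line through unchanged


if __name__ == "__main__":
    sample = (
        "Sun, 26 Sep 2021 00:00:00 +1000\t"
        "http://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html"
        " The Speed of Time"
    )
    print(convert_line(sample))
```

Because only the text before the tab is ever handed to the parser, the date embedded in the URL cannot be rewritten, whichever format happens to match.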

Earnestly, Feb 14 '22 10:02