
Parsing a day of month not zero padded

Open ajfisher opened this issue 1 year ago • 5 comments

I'm trying to parse a date and reformat it into something a little bit more workable. Unfortunately I can't change the source system this gets exported from.

Example dates:

1/07/2022
22/10/2022

Day of month is the first value.

I'm using this strptime to parse the date: t = strptime($created_date, "%d/%m/%Y")

For 22/10/2022 this parses fine, and I can then use t in strftime to output the format I need. Unfortunately 1/07/2022 doesn't parse and results in (error). As I understand it, the library providing strptime expects %d to be zero-padded (as it also does for %m, which happens to be zero-padded in this data).

What I'd like to know is whether there's a way to fix up my date so that strptime can interpret it.

ajfisher avatar Aug 17 '23 04:08 ajfisher

Just adding a note, which won't help you directly: this works in Miller 5.

echo x="1/07/2022" | mlr --ojson put '$y=strptime($x,"%d/%m/%Y")'

aborruso avatar Aug 17 '23 08:08 aborruso

I did think my way around the problem in the end, but it required a bit of a lateral approach:

put 'd = leftpad($created_date, 10, "0");
 t = strptime(d, "%d/%m/%Y");
 $ftm = strftime(t, "%F");'

This relies on the fact that my dodgy dates are one character shorter than the rest, so leftpad adds a leading 0 only when it's needed.

Not elegant but definitely does the job for the use case I outlined.
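
A rough sketch of where the width-10 padding trick stops working, in case it helps later readers. This is plain shell that emulates a left-pad with "0" to width 10; `leftpad10` is a hypothetical helper for illustration, not Miller's own function. It assumes the dates contain no spaces.

```shell
# Hypothetical helper emulating a width-10 left-pad with "0" (not Miller itself).
# printf right-justifies to 10 columns with spaces; tr turns those spaces into zeros.
leftpad10() {
  printf '%10s\n' "$1" | tr ' ' '0'
}

leftpad10 "1/07/2022"    # prints 01/07/2022 (single-digit day, padded month)
leftpad10 "22/10/2022"   # prints 22/10/2022 (already 10 characters, unchanged)
leftpad10 "1/5/2023"     # prints 001/5/2023 (both zeros land in front of the day)
```

So the approach fits data where only the day can lose its leading zero; if the month can be a single digit too, the padding ends up in the wrong place.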

ajfisher avatar Aug 17 '23 08:08 ajfisher

Dear @johnkerl is this a bug?

If I run this in Miller 6

echo x="1/07/2022" | mlr --ojson put '$y=strptime($x,"%d/%m/%Y")'

I have

[
{
  "x": "1/07/2022",
  "y": (error)
}
]

aborruso avatar Aug 20 '23 19:08 aborruso

@aborruso strptime in Go is brittle, and is based on Go's date-parsing, which I also find brittle. I'll look into how I can hack on strptime (which I have done before).

johnkerl avatar Aug 27 '23 04:08 johnkerl

I just hit this issue where both month and day could be single digits, and worked around it with gsub. Posting here in case this is useful to anyone else reading this thread.

# Find any single digit standing alone between word boundaries and prefix it with a 0
echo x="1/5/2023" | mlr --ojson put '$y=strptime(gsub($x, "\b(\d)\b", "0\1"),"%d/%m/%Y")'

gives

[
{
  "x": "1/5/2023",
  "y": 1682899200
}
]
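
For anyone who would rather normalise the dates before they reach Miller, the same zero-padding can be sketched field by field with awk. This is only an illustration under assumptions: `pad_dmy` is a hypothetical helper name, and the input is assumed to be a bare D/M/YYYY string with `/` separators.

```shell
# Hypothetical helper: zero-pad the day and month of a D/M/YYYY date.
# awk splits on "/" and reprints the first two fields as two-digit numbers;
# the year is passed through unchanged as a string.
pad_dmy() {
  printf '%s\n' "$1" | awk -F/ '{ printf "%02d/%02d/%s\n", $1, $2, $3 }'
}

pad_dmy "1/5/2023"     # prints 01/05/2023
pad_dmy "1/07/2022"    # prints 01/07/2022
pad_dmy "22/10/2022"   # prints 22/10/2022
```

The padded result is the shape that "%d/%m/%Y" expects in the strptime calls above.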

ajesler-hatch avatar Apr 10 '24 23:04 ajesler-hatch