glaredb
glaredb copied to clipboard
Add postgres `to_date` function
Given the test data userdata1.parquet, I'd like to convert the birthdate' field to a Date` field.
> select birthdate from 'testdata/parquet/userdata1.parquet';
┌────────────┐
│ birthdate │
│ ── │
│ Utf8 │
╞════════════╡
│ 3/8/1971 │
│ 1/16/1968 │
│ 2/1/1960 │
│ 4/8/1997 │
│ │
│ 2/25/1983 │
│ 12/18/1987 │
│ 3/1/1962 │
│ 3/27/1992 │
│ 1/28/1997 │
│ … │
│ 6/8/1970 │
│ 7/18/2000 │
│ 2/16/1960 │
│ 1/5/1988 │
│ 7/30/1987 │
│ 7/8/1965 │
│ 4/22/1975 │
│ 5/1/1979 │
│ 10/9/1991 │
│ │
└────────────┘
1000 rows (20 shown)
I've tried
select birthdate::Date from 'testdata/parquet/userdata1.parquet';
select cast(birthdate as DATE) from 'testdata/parquet/userdata1.parquet';
The datafusion docs don't seem to have any builtin functions for parsing strings to dates
My understanding is that this ends up being pretty hard, and it looks like arrow/datafusion's date+timestamp formats play a little fast and loose with timezones/leapseconds/calendars/etc. (and it seems like you'd need to use/link chrono for this? it seems like datafusion uses this to create some timestamps internally but not for manipulating or parsing user data.
it seems reasonable that we'd need to write/provide function to perform this cast on our end, and probably ensure that users also pass a format (or a nice name for well known formats) and also use a library to handle the timezone things.)
we should be able use chrono's strptime to handle this. I believe polars str.to_date and str.to_datetime both use this under the hood.
strpttime seems good (though the semantic differences between this and posix date might be weird to document? also We maybe should also have a couple of constants/named helpers for RFC3339 and various well defined timestamp formats)
I ran into this. A work around is casting to date, which oddly enough did not work as expected in the dashboard, but did in the CLI.
We are also missing to_char, as a workaround.
See also: https://www.postgresql.org/docs/current/functions-formatting.html
Seems likely that this will be implemented upstream https://github.com/apache/arrow-datafusion/issues/9147