glaredb icon indicating copy to clipboard operation
glaredb copied to clipboard

Add postgres `to_date` function

Open universalmind303 opened this issue 2 years ago • 5 comments

Given the test data userdata1.parquet, I'd like to convert the birthdate' field to a Date` field.

> select birthdate from 'testdata/parquet/userdata1.parquet';
┌────────────┐
│ birthdate  │
│ ──         │
│ Utf8       │
╞════════════╡
│ 3/8/1971   │
│ 1/16/1968  │
│ 2/1/1960   │
│ 4/8/1997   │
│            │
│ 2/25/1983  │
│ 12/18/1987 │
│ 3/1/1962   │
│ 3/27/1992  │
│ 1/28/1997  │
│ …          │
│ 6/8/1970   │
│ 7/18/2000  │
│ 2/16/1960  │
│ 1/5/1988   │
│ 7/30/1987  │
│ 7/8/1965   │
│ 4/22/1975  │
│ 5/1/1979   │
│ 10/9/1991  │
│            │
└────────────┘
 1000 rows (20 shown)

I've tried

select birthdate::Date from  'testdata/parquet/userdata1.parquet';
select cast(birthdate as DATE) from 'testdata/parquet/userdata1.parquet';

The datafusion docs don't seem to have any builtin functions for parsing strings to dates

universalmind303 avatar Sep 21 '23 15:09 universalmind303

My understanding is that this ends up being pretty hard, and it looks like arrow/datafusion's date+timestamp formats play a little fast and loose with timezones/leapseconds/calendars/etc. (and it seems like you'd need to use/link chrono for this? it seems like datafusion uses this to create some timestamps internally but not for manipulating or parsing user data.

it seems reasonable that we'd need to write/provide function to perform this cast on our end, and probably ensure that users also pass a format (or a nice name for well known formats) and also use a library to handle the timezone things.)

tychoish avatar Sep 21 '23 19:09 tychoish

we should be able use chrono's strptime to handle this. I believe polars str.to_date and str.to_datetime both use this under the hood.

universalmind303 avatar Sep 21 '23 19:09 universalmind303

strpttime seems good (though the semantic differences between this and posix date might be weird to document? also We maybe should also have a couple of constants/named helpers for RFC3339 and various well defined timestamp formats)

tychoish avatar Sep 25 '23 13:09 tychoish

I ran into this. A work around is casting to date, which oddly enough did not work as expected in the dashboard, but did in the CLI.

We are also missing to_char, as a workaround.

See also: https://www.postgresql.org/docs/current/functions-formatting.html

greyscaled avatar Dec 11 '23 20:12 greyscaled

Seems likely that this will be implemented upstream https://github.com/apache/arrow-datafusion/issues/9147

universalmind303 avatar Feb 07 '24 21:02 universalmind303