ibis
feat: add support for ibis.date(y, m, d) using deferred
ibis.date raises an error if you refer to parent-table columns using the deferred _ object.
Expected: passing _['year'], _['month'], _['day'] to ibis.date returns a date expression, just as passing t['year'], t['month'], t['day'] does.
Actual: TypeError
MWE:
import ibis
from ibis import _
import pandas as pd
cols = ['date_id', 'date_year', 'date_month', 'date_day']
vals = [[1, 2021, 8, 4], [2, 2021, 8, 26], [3, 2022, 8, 3], [4, 2022, 8, 25]]
df = pd.DataFrame(vals, columns=cols)
conn = ibis.pandas.connect({'dates': df})
dates_base = conn.table("dates")
# Works
dates_works = (
dates_base
.mutate(date_value=ibis.date(dates_base['date_year'], dates_base['date_month'], dates_base['date_day']))
)
# Does not work
dates_notworks = (
dates_base
.mutate(date_value=ibis.date(_['date_year'], _['date_month'], _['date_day']))
)
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [1], line 22
14 dates_works = (
15 dates_base
16 .mutate(date_value=ibis.date(dates_base['date_year'], dates_base['date_month'], dates_base['date_day']))
17 )
19 # Does not works
20 dates_notworks = (
21 dates_base
---> 22 .mutate(date_value=ibis.date(_['date_year'], _['date_month'], _['date_day']))
23 )
File /usr/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
885 if not args:
886 raise TypeError(f'{funcname} requires at least '
887 '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)
TypeError: _date_from_deferred() takes 1 positional argument but 3 were given
Ah, so this is only implemented for the single-argument case, ibis.date(thing) -> thing.date(), which extracts the date from a non-date column. We haven't implemented the variadic ibis.date(year, month, day) case; supporting it would be a new feature.
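To see why the call fails, here is a minimal stdlib sketch of the dispatch behavior in the traceback. functools.singledispatch picks an implementation from the type of the first argument only, so a call whose first argument is deferred is routed to the Deferred implementation, which accepts a single argument. Only the name _date_from_deferred comes from the traceback; the Deferred class, date_current, date_variadic and the tuples they return are hypothetical placeholders, not ibis's API.

```python
from functools import singledispatch


class Deferred:
    """Hypothetical stand-in for a deferred column reference such as _['date_year']."""

    def __init__(self, name):
        self.name = name


@singledispatch
def date_current(value, *args):
    # Generic fallback for non-Deferred first arguments (not exercised here).
    raise NotImplementedError(type(value).__name__)


@date_current.register
def _date_from_deferred(value: Deferred):
    # Mirrors the reported behavior: only the one-argument form is registered.
    return ("cast_to_date", value.name)


@singledispatch
def date_variadic(value, *args):
    raise NotImplementedError(type(value).__name__)


@date_variadic.register
def _date_from_deferred_variadic(value: Deferred, *rest):
    # Hypothetical fix: accept either date(x) or date(year, month, day).
    if not rest:
        return ("cast_to_date", value.name)
    if len(rest) != 2:
        raise TypeError("expected date(x) or date(year, month, day)")
    return ("date_from_ymd", value.name, rest[0].name, rest[1].name)


y, m, d = Deferred("date_year"), Deferred("date_month"), Deferred("date_day")

error_message = None
try:
    # Dispatches on type(y) and calls _date_from_deferred(y, m, d) -> TypeError.
    date_current(y, m, d)
except TypeError as exc:
    error_message = str(exc)

built = date_variadic(y, m, d)
```

Registering the Deferred case with *rest is one possible shape for the variadic form. It only fixes the dispatch; building a real expression would still require the deferred arguments to be resolved against a table first.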
This is a tricky but possibly fun issue to address. Deferred instances are not supported as inputs to ops.Node subclasses, because ops.Node constructors eagerly check their inputs' types, and a Deferred instance's type is not known until its .resolve method is called.
It's not clear to me exactly how to make these two things (Deferred and ops.Node behavior) work well together.
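One way to reconcile the two behaviors, sketched here under loose assumptions, is to defer the whole call: if any argument is deferred, return a deferred call object that resolves every argument against the table first and only then invokes the eager, type-checking constructor. Everything below (Deferred, Column, DeferredCall, make_dates, the dict-of-lists table) is a hypothetical plain-Python model, not how ibis implements Deferred or ops.Node; the data values are taken from the MWE.

```python
import datetime


class Deferred:
    """Hypothetical minimal deferred expression (not ibis's class)."""

    def resolve(self, table):
        raise NotImplementedError


class Column(Deferred):
    def __init__(self, name):
        self.name = name

    def resolve(self, table):
        return table[self.name]


class Underscore:
    """Builds Column deferreds: _['col'] -> Column('col')."""

    def __getitem__(self, name):
        return Column(name)


_ = Underscore()


class DeferredCall(Deferred):
    """Postpones func(*args) until every Deferred argument is resolved."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def resolve(self, table):
        concrete = [a.resolve(table) if isinstance(a, Deferred) else a for a in self.args]
        return self.func(*concrete)


def make_dates(years, months, days):
    # Eager "node" constructor: like an ops.Node, it type-checks its inputs immediately.
    for arg in (years, months, days):
        if not isinstance(arg, list):
            raise TypeError(f"expected a column, got {type(arg).__name__}")
    return [datetime.date(yr, mo, dy) for yr, mo, dy in zip(years, months, days)]


def date(*args):
    # If any argument is deferred, defer the whole call instead of type-checking now.
    if any(isinstance(a, Deferred) for a in args):
        return DeferredCall(make_dates, *args)
    return make_dates(*args)


table = {
    "date_year": [2021, 2021, 2022, 2022],
    "date_month": [8, 8, 8, 8],
    "date_day": [4, 26, 3, 25],
}

expr = date(_["date_year"], _["date_month"], _["date_day"])
result = expr.resolve(table)
```

Because make_dates only ever sees resolved lists, its eager type check stays intact; the trade-off is that errors in the arguments surface at resolve time rather than at call time.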
@cpcloud Postgres works in some cases. Using the same data uploaded to Postgres:
dates_base = conn.table("dates")
dates = (
dates_base
.mutate(date_value=ibis.date(dates_base['date_year'], dates_base['date_month'], dates_base['date_day']))
)
yields a date_value column with the correct dates.
xref #4382
Closing in favor of #4382.