bug: error when handling empty / undefined struct
What happened?
We have a user on Zulip who was trying to make a memtable out of something like the following (representative of an upstream data source):
ibis.memtable(
    [
        {
            'year': '2024',
            'period': 'M01',
            'periodName': 'January',
            'latest': 'true',
            'value': '3.7',
            'footnotes': [{}]
        }
    ]
)
footnotes is getting parsed as an empty / undefined struct and is causing an error when we attempt to retrieve the value of the field.
File ~/github.com/ibis-project/ibis/ibis/expr/types/core.py:424, in Expr.to_pyarrow(self, params, limit, **kwargs)
396 @experimental
397 def to_pyarrow(
398 self,
(...)
402 **kwargs: Any,
403 ) -> pa.Table:
404 """Execute expression and return results in as a pyarrow table.
405
406 This method is eager and will execute the associated expression
(...)
422 A pyarrow table holding the results of the executed expression.
423 """
--> 424 return self._find_backend(use_default=True).to_pyarrow(
425 self, params=params, limit=limit, **kwargs
426 )
File ~/github.com/ibis-project/ibis/ibis/backends/duckdb/__init__.py:1243, in Backend.to_pyarrow(self, expr, params, limit, **_)
1240 sql = self.compile(table, limit=limit, params=params)
1242 with self._safe_raw_sql(sql) as cur:
-> 1243 table = cur.fetch_arrow_table()
1245 return expr.__pyarrow_result__(table)
InternalException: INTERNAL Error: Attempted to access index 0 within vector of size 0
Could we try to adjust the schema in this case to something that won't barf when it's empty?
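Until the schema handling is sorted out, it may help to find out which columns are affected before building the memtable. The following is a minimal sketch in plain Python that scans the rows for empty dicts at any nesting depth. Both `contains_empty_struct` and `find_empty_struct_columns` are hypothetical helpers written for this illustration, not part of ibis or PyArrow.

```python
from typing import Any


def contains_empty_struct(value: Any) -> bool:
    """Return True if value is, or contains at any depth, an empty dict."""
    if isinstance(value, dict):
        # An empty dict is the problematic case; otherwise check its values.
        return not value or any(contains_empty_struct(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_empty_struct(v) for v in value)
    return False


def find_empty_struct_columns(rows: list[dict[str, Any]]) -> list[str]:
    """List column names, in first-seen order, whose values contain an empty dict."""
    found: dict[str, None] = {}
    for row in rows:
        for name, value in row.items():
            if contains_empty_struct(value):
                found.setdefault(name, None)
    return list(found)


rows = [
    {
        'year': '2024',
        'period': 'M01',
        'periodName': 'January',
        'latest': 'true',
        'value': '3.7',
        'footnotes': [{}],
    }
]

print(find_empty_struct_columns(rows))  # prints ['footnotes']
```

This only reports the affected columns; what to do with them (drop, replace, or re-type) is still an open question here.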
What version of ibis are you using?
main
What backend(s) are you using, if any?
DuckDB
Relevant log output
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Just a minor correction, the raw data has the curly braces enclosed in square brackets:
ibis.memtable(
    [
        {
            'year': '2024',
            'period': 'M01',
            'periodName': 'January',
            'latest': 'true',
            'value': '3.7',
            'footnotes': [{}]
        }
    ]
)
As for how to handle this scenario: for now, I've just dropped the footnotes column. But what should ibis do? It seems tricky. Maybe one approach would be to convert the column to a string whenever an empty struct is detected? I'm not familiar with the underlying PyArrow handling, so I'm just throwing out ideas.
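On the user side, the "convert to string" idea could look something like the sketch below, which JSON-encodes every nested value before the rows are handed to ibis. It assumes the nested values are JSON-serializable. `stringify_nested` is a hypothetical helper, not an ibis API, and this is a workaround rather than a proposal for how ibis itself should infer the type.

```python
import json
from typing import Any


def stringify_nested(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the rows with dict/list values encoded as JSON strings."""
    return [
        {
            name: json.dumps(value) if isinstance(value, (dict, list)) else value
            for name, value in row.items()
        }
        for row in rows
    ]


raw_rows = [
    {
        'year': '2024',
        'period': 'M01',
        'periodName': 'January',
        'latest': 'true',
        'value': '3.7',
        'footnotes': [{}],
    }
]

cleaned = stringify_nested(raw_rows)
print(cleaned[0]['footnotes'])  # prints [{}]
```

The cleaned rows contain only strings, so passing them to `ibis.memtable(cleaned)` should sidestep the empty-struct inference, though that has not been verified against the failing DuckDB path. The trade-off is that footnotes becomes an opaque string column and would need to be parsed again to get at any fields.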