
bug: error when handling empty / undefined struct

Open • gforsyth opened this issue 1 year ago • 1 comment

What happened?

We have a user on Zulip who was trying to make a memtable out of something like the following (representative of an upstream data source):

import ibis

# footnotes holds a list containing a single empty dict, as in the upstream data
t = ibis.memtable(
    [
        {
            'year': '2024',
            'period': 'M01',
            'periodName': 'January',
            'latest': 'true',
            'value': '3.7',
            'footnotes': [{}],
        }
    ]
)

The footnotes value gets parsed as an empty / undefined struct (a struct type with no fields), which causes an error when we try to retrieve the field's value:

File ~/github.com/ibis-project/ibis/ibis/expr/types/core.py:424, in Expr.to_pyarrow(self, params, limit, **kwargs)
    396 @experimental
    397 def to_pyarrow(
    398     self,
   (...)
    402     **kwargs: Any,
    403 ) -> pa.Table:
    404     """Execute expression and return results in as a pyarrow table.
    405 
    406     This method is eager and will execute the associated expression
   (...)
    422         A pyarrow table holding the results of the executed expression.
    423     """
--> 424     return self._find_backend(use_default=True).to_pyarrow(
    425         self, params=params, limit=limit, **kwargs
    426     )

File ~/github.com/ibis-project/ibis/ibis/backends/duckdb/__init__.py:1243, in Backend.to_pyarrow(self, expr, params, limit, **_)
   1240 sql = self.compile(table, limit=limit, params=params)
   1242 with self._safe_raw_sql(sql) as cur:
-> 1243     table = cur.fetch_arrow_table()
   1245 return expr.__pyarrow_result__(table)

InternalException: INTERNAL Error: Attempted to access index 0 within vector of size 0

Could we adjust the inferred schema in this case to something that won't barf when the struct is empty?

What version of ibis are you using?

main

What backend(s) are you using, if any?

DuckDB

Relevant log output

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

gforsyth • Feb 26 '24 19:02

One minor correction: in the raw data, the curly braces are enclosed in square brackets:

ibis.memtable(
    [
        {
            'year': '2024',
            'period': 'M01',
            'periodName': 'January',
            'latest': 'true',
            'value': '3.7',
            'footnotes': [{}]
        }
    ]
)

As for how to handle this scenario: for now, I've just dropped the footnotes column. But what should ibis do? It seems tricky. Maybe one approach would be to convert an empty struct to a string whenever one is detected? I don't know much about the underlying PyArrow handling, so I'm just throwing ideas out there.

pybokeh • Feb 28 '24 17:02
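The convert-to-string idea from the comment above can be tried on the user side, before the records ever reach ibis.memtable. This is a minimal sketch, not something ibis does: the helper names contains_empty_dict and stringify_empty_struct_columns are hypothetical, and JSON is an assumed encoding, chosen because it is readable and can be parsed back later. Encoding is decided per column rather than per value, so a column that mixes empty and non-empty footnotes still ends up with a single type.

```python
import json
from typing import Any


def contains_empty_dict(value: Any) -> bool:
    """True if value is an empty dict, or a list/dict containing one at any depth."""
    if isinstance(value, dict):
        return not value or any(contains_empty_dict(v) for v in value.values())
    if isinstance(value, list):
        return any(contains_empty_dict(v) for v in value)
    return False


def stringify_empty_struct_columns(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return new records where every key holding an empty dict anywhere is JSON-encoded."""
    # First pass: collect keys whose value contains an empty dict in any record.
    keys_to_encode = {
        key
        for record in records
        for key, value in record.items()
        if contains_empty_dict(value)
    }
    # Second pass: encode every non-null value under those keys so the column has one type.
    return [
        {
            key: json.dumps(value) if key in keys_to_encode and value is not None else value
            for key, value in record.items()
        }
        for record in records
    ]


rows = [{"year": "2024", "value": "3.7", "footnotes": [{}]}]
print(stringify_empty_struct_columns(rows))
# prints [{'year': '2024', 'value': '3.7', 'footnotes': '[{}]'}]
```

The cleaned records would then be passed to ibis.memtable, where footnotes should be an ordinary string column. The trade-off is that the footnote contents become opaque text that has to be parsed again if anyone needs the individual fields.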