ibis
ibis copied to clipboard
fix: disallow empty struct
Description of changes
Disallow empty struct in type inference. Pyarrow allows the construction of empty structs but duckdb doesn't.
I think what the user in #8460 was attempting to create was an empty value in a struct column that has defined fields. You can accomplish this with the following:
>>> import ibis
>>>
>>> ibis.options.interactive = True
>>>
>>> t = ibis.memtable(
... [
... {
... "year": "2024",
... "period": "M01",
... "periodName": "January",
... "latest": "true",
... "value": "3.7",
... "footnotes": None,
... }
... ],
... schema={"year": "string", "period": "string", "periodName": "string", "latest": "string", "value": "string", "footnotes": "struct<key: string>"}
... )
>>> t
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━┓
┃ year ┃ period ┃ periodName ┃ latest ┃ value ┃ footnotes ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━┩
│ string │ string │ string │ string │ string │ struct<k… │
├────────┼────────┼────────────┼────────┼────────┼───────────┤
│ 2024 │ M01 │ January │ true │ 3.7 │ NULL │
└────────┴────────┴────────────┴────────┴────────┴───────────┘
I do want to note here that ibis interprets {} as a struct whose keys point to empty values rather than a struct that is empty:
>>> t = ibis.memtable(
... [
... {
... "year": "2024",
... "period": "M01",
... "periodName": "January",
... "latest": "true",
... "value": "3.7",
... "footnotes": {},
... }
... ],
... schema={"year": "string", "period": "string", "periodName": "string", "latest": "string", "value": "string", "footnotes": "struct<key: string>"}
... )
>>> t
┏━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ year ┃ period ┃ periodName ┃ latest ┃ value ┃ footnotes ┃
┡━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ string │ string │ string │ string │ string │ struct<key: string> │
├────────┼────────┼────────────┼────────┼────────┼──────────────────────┤
│ 2024 │ M01 │ January │ true │ 3.7 │ {'key': None} │
└────────┴────────┴────────────┴────────┴────────┴──────────────────────┘
Issues closed
#8460
It seems to me like {} does not properly follow the schema of "struct<key: string>", and so ibis.literal({}, type="struct<key: string>") should just fail. The user's input is malformed, and they will need to deal with it external to ibis. Or, perhaps they could use a map<string, string> type instead, which properly represents a datatype with a variable number of keys.
IDK, maybe we could make ibis lenient, but it feels like a slippery slope. If we allow this, then it feels like we should make it so that ibis.struct({"a": "foo"}).cast("struct<a: string, b:int>") to succeed, and result in {"a": "foo", "b": None} (ie if you cast something to a struct, anytime a field isn't present, fill it in with NULL). This feels OK to me, but I don't think backends have good builtin support for this, we would have to add workarounds all over the place.