ibis
ibis copied to clipboard
fix: manually casting the type of pos to Int16 for table.info()
Description of changes
when I was working on a data science project, it has more than 128 columns in the table, when I run table.info()
it throws the error
RelationError Traceback (most recent call last)
Cell In[95], line 1
----> 1 static[cols].info()
File /opt/conda/lib/python3.10/site-packages/ibis/expr/types/relations.py:2829, in Table.info(self)
2817 agg = self.select(
2818 isna=ibis.case().when(col.isnull(), 1).else_(0).end()
2819 ).agg(
(...)
2826 pos=lit(pos),
2827 )
2828 aggs.append(agg)
-> 2829 return ibis.union(*aggs).order_by(ibis.asc("pos"))
File /opt/conda/lib/python3.10/site-packages/ibis/expr/api.py:1948, in union(table, distinct, *rest)
1882 def union(table: ir.Table, *rest: ir.Table, distinct: bool = False) -> ir.Table:
1883 """Compute the set union of multiple table expressions.
1884
1885 The input tables must have identical schemas.
(...)
1946
1947 """
-> 1948 return table.union(*rest, distinct=distinct) if rest else table
File /opt/conda/lib/python3.10/site-packages/ibis/expr/types/relations.py:1751, in Table.union(self, table, distinct, *rest)
1749 node = ops.Union(self, table, distinct=distinct)
1750 for table in rest:
-> 1751 node = ops.Union(node, table, distinct=distinct)
1752 return node.to_expr().select(self.columns)
File /opt/conda/lib/python3.10/site-packages/ibis/common/bases.py:72, in AbstractMeta.__call__(cls, *args, **kwargs)
52 def __call__(cls, *args, **kwargs):
53 """Create a new instance of the class.
54
55 The subclass may override the `__create__` classmethod to change the
(...)
70
71 """
---> 72 return cls.__create__(*args, **kwargs)
File /opt/conda/lib/python3.10/site-packages/ibis/common/grounds.py:120, in Annotable.__create__(cls, *args, **kwargs)
116 @classmethod
117 def __create__(cls, *args: Any, **kwargs: Any) -> Self:
118 # construct the instance by passing only validated keyword arguments
119 kwargs = cls.__signature__.validate(cls, args, kwargs)
--> 120 return super().__create__(**kwargs)
File /opt/conda/lib/python3.10/site-packages/ibis/expr/operations/relations.py:293, in Set.__init__(self, left, right, **kwargs)
290 def __init__(self, left, right, **kwargs):
291 # convert to dictionary first, to get key-unordered comparison semantics
292 if dict(left.schema) != dict(right.schema):
--> 293 raise RelationError("Table schemas must be equal for set operations")
294 elif left.schema.names != right.schema.names:
295 # rewrite so that both sides have the columns in the same order making it
296 # easier for the backends to implement set operations
297 cols = {name: Field(right, name) for name in left.schema.names}
RelationError: Table schemas must be equal for set operations
Here is the code could be used to reproduce the error,
import ibis
...: ibis.options.interactive = True
...: df = {
...: f"col_{i}": [1, 0] for i in range(129)
...: }
...: print(len(df.keys()))
...: d = ibis.memtable(df)
...: d.info()
The schemas when union
are different, The type of pos
is int8 for the first 128 columns, it will be int16
after that. I cast all into int16
which is large enough for lots of columns.
dict(left.schema) = {
'name': String(nullable=True),
'type': String(nullable=True),
'nullable': Boolean(nullable=True),
'nulls': Int64(nullable=True),
'non_nulls': Int64(nullable=True),
'null_frac': Float64(nullable=True),
'pos': Int8(nullable=True)
}
dict(right.schema) = {
'name': String(nullable=True),
'type': String(nullable=True),
'nullable': Boolean(nullable=True),
'nulls': Int64(nullable=True),
'non_nulls': Int64(nullable=True),
'null_frac': Float64(nullable=True),
###### int16
'pos': Int16(nullable=True)
}
Issues closed
ACTION NEEDED
Ibis follows the Conventional Commits specification for release automation.
The PR title and description are used as the merge commit message.
Please update your PR title and description to match the specification.
This PR needs a test. Something like:
- Create a table with > 127 columns
- Call the
info()
method- Make some kind of assertion
tried to add test like this, the code could work when num_cols is small.
@pytest.mark.notyet(
["druid"],
raises=PyDruidProgrammingError,
reason="Druid only supports trivial unions",
)
def test_table_info_wider(con, backend):
num_cols = 129
col_names = [f"col_{i}"for i in range(num_cols)]
t = ibis.memtable(
{col: [0, 1] for col in col_names}
)
expr = t.info()
result = expr.execute()
assert list(result.name) == col_names
but got the RecursionError:
when I have 129 columns, not sure if I create the table correctly, I did tried con.create_table()
and ibis.memtable()
Do you have an idea on this?
Here is the log
expr = t.info()
> result = expr.execute()
ibis/backends/tests/test_generic.py:633:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
ibis/expr/types/core.py:393: in execute
return self._find_backend(use_default=True).execute(
ibis/backends/duckdb/__init__.py:1355: in execute
table = self._to_duckdb_relation(expr, params=params, limit=limit).arrow()
ibis/backends/duckdb/__init__.py:1288: in _to_duckdb_relation
sql = self.compile(table_expr, limit=limit, params=params)
ibis/backends/sql/__init__.py:178: in compile
sql = query.sql(dialect=self.dialect, pretty=pretty, copy=False)
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/expressions.py:561: in sql
return Dialect.get_or_raise(dialect).generate(self, **opts)
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/dialects/dialect.py:498: in generate
return self.generator(**opts).generate(expression, copy=copy)
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:583: in generate
sql = self.sql(expression).strip()
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:739: in sql
sql = getattr(self, exp_handler_name)(expression)
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:2230: in select_sql
self.sql(expression, "from", comment=False),
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:739: in sql
sql = getattr(self, exp_handler_name)(expression)
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:2316: in union_sql
return self.set_operations(expression)
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:2309: in set_operations
sqls.append(self.sql(node))
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:739: in sql
sql = getattr(self, exp_handler_name)(expression)
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:2230: in select_sql
self.sql(expression, "from", comment=False),
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:728: in sql
return self.sql(value)
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:739: in sql
sql = getattr(self, exp_handler_name)(expression)
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:1800: in from_sql
return f"{self.seg('FROM')} {self.sql(expression, 'this')}"
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:728: in sql
return self.sql(value)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <sqlglot.dialects.duckdb.DuckDB.Generator object at 0x154de9910>
expression = Subquery(
this=Union(
this=Select(
expressions=[
Star()],
from=From(
this=Subquery(
... alias=Identifier(this=t203, quoted=True)))),
distinct=False),
alias=Identifier(this=t313, quoted=True))
key = None, comment = True
def sql(
self,
expression: t.Optional[str | exp.Expression],
key: t.Optional[str] = None,
comment: bool = True,
) -> str:
if not expression:
return ""
if isinstance(expression, str):
return expression
if key:
value = expression.args.get(key)
if value:
return self.sql(value)
return ""
transform = self.TRANSFORMS.get(expression.__class__)
> if callable(transform):
E RecursionError: maximum recursion depth exceeded while calling a Python object
/Users/claypot/miniconda3/envs/ibis-dev-arm64/lib/python3.11/site-packages/sqlglot/generator.py:733: RecursionError
!!! Recursion error detected, but an error occurred locating the origin of recursion.
The following exception happened when comparing locals in the stack frame:
RecursionError: maximum recursion depth exceeded
Displaying first and last 10 stack frames out of 964.
========================================================================================================== short test summary info ===========================================================================================================
FAILED ibis/backends/tests/test_generic.py::test_table_info_wider[duckdb] - RecursionError: maximum recursion depth exceeded while calling a Python object
Seems like flink can't handle the new test. Can you mark it as notyet
or broken
?
Tests are still failing for Flink.
Could we re run the test? The logs:
Caused by: java.io.IOException: Insufficient number of network buffers: required 512, but only 348 available. The total number of network buffers is currently set to 20316 of 32768 bytes each. You can increase this number by setting the configuration keys 'taskmanager.memory.network.fraction', 'taskmanager.memory.network.min', and 'taskmanager.memory.network.max'.
Not really, because it's a repeatable failure.
It's almost certainly related to the "large" number of columns (not actually that large).
Can you instead xfail the test for now?