ibis
feat: expose `highest_precedence(*dtypes)`
Is your feature request related to a problem?
I want to be able to unify the schemas of multiple tables.
Currently I have something like this:
from typing import Any, Iterable, Literal, Mapping

import ibis
from ibis.expr.datatypes import highest_precedence


def unify_schemas(
    schemas: Iterable[ibis.Schema | Mapping[str, Any]],
    *,
    how: Literal["error", "union", "intersection"] = "error",
    on_conflict: Literal["upcast", "error"] = "upcast",
) -> ibis.Schema:
    """Unify multiple schemas into one.

    Parameters
    ----------
    schemas
        The schemas to unify.
    how
        How to handle columns that are present in some schemas but not others.
        - "error": raise a ValueError
        - "union": keep all columns
        - "intersection": only keep columns that are in all schemas
    on_conflict
        What to do when schemas have a column with the same name, but different types.
        Options are:
        - "upcast": upcast the column to the most general type
        - "error": raise a ValueError
    """
    schemas = [ibis.schema(schema) for schema in schemas]
    column_sets = [set(schema) for schema in schemas]
    union = set().union(*column_sets)
    if how == "error":
        # Every schema must contain every column.
        for schema in schemas:
            missing = union - set(schema)
            if missing:
                raise ValueError(
                    f"missing columns {missing} from schema {schema}", missing
                )
        out_columns = union
    elif how == "union":
        out_columns = union
    elif how == "intersection":
        out_columns = union.intersection(*column_sets)
    else:
        raise ValueError(f"unknown how: {how}")

    out_schema = {}
    errors = []
    for col in out_columns:
        # All distinct types this column has across the schemas that contain it.
        types = {schema[col] for schema in schemas if col in schema}
        if on_conflict == "error":
            if len(types) > 1:
                # Collect every conflict and report them together below.
                errors.append((col, types))
                continue
            typ = next(iter(types))
        elif on_conflict == "upcast":
            typ = highest_precedence(types)
        else:
            raise ValueError(f"unknown on_conflict: {on_conflict}")
        out_schema[col] = typ
    if errors:
        raise ValueError(f"conflicting types: {errors}")
    return ibis.schema(out_schema)
Note that I have to import highest_precedence() from ibis.expr.datatypes, which is not in the docs.
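For anyone who wants to check the `how` semantics without installing ibis, here is a minimal stdlib-only sketch of just the column-selection step. `select_columns` is a hypothetical helper name, and plain dicts stand in for schemas; it is not part of ibis.

```python
from typing import Iterable, Literal, Mapping


def select_columns(
    schemas: Iterable[Mapping[str, object]],
    how: Literal["error", "union", "intersection"] = "error",
) -> set[str]:
    """Return the output column names for the given `how` mode."""
    column_sets = [set(schema) for schema in schemas]
    union = set().union(*column_sets)
    if how == "intersection":
        # Keep only the columns that appear in every schema.
        return union.intersection(*column_sets)
    if how == "error":
        # Every schema must contain every column.
        for columns in column_sets:
            missing = union - columns
            if missing:
                raise ValueError(f"missing columns {missing}")
    elif how != "union":
        raise ValueError(f"unknown how: {how}")
    return union


# Plain dicts standing in for schemas (column name -> type name).
a = {"id": "int64", "name": "string"}
b = {"id": "int32", "score": "float64"}

union_cols = select_columns([a, b], how="union")
common_cols = select_columns([a, b], how="intersection")
```

With these two inputs, `how="error"` raises because each schema lacks a column the other has.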
What is the motivation behind your request?
No response
Describe the solution you'd like
Maybe DataType.highest_precedence(*others: DataType)? A top-level API like ibis.highest_dtype() would also be reasonable, but this seems like a rare enough need that I don't really want to pollute the top-level namespace with it.
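To make the variadic shape concrete, here is a toy sketch that uses type names as strings. Everything in it is an assumption for illustration: `_RANK`, `higher`, and `highest_precedence_variadic` are hypothetical names, and the lookup table is a stand-in for whatever rules ibis actually applies when comparing dtypes.

```python
from functools import reduce

# Hypothetical ranking, for illustration only. This is NOT how ibis
# decides precedence; it only gives the sketch something to compare.
_RANK = {"int8": 0, "int16": 1, "int32": 2, "int64": 3}


def higher(left: str, right: str) -> str:
    """Return whichever of the two type names ranks higher."""
    return left if _RANK[left] >= _RANK[right] else right


def highest_precedence_variadic(first: str, *others: str) -> str:
    """Variadic form: fold the pairwise comparison over all arguments."""
    return reduce(higher, others, first)


widest = highest_precedence_variadic("int8", "int64", "int32")
```

Requiring at least one positional argument sidesteps the question of what an empty call should return, which a signature taking a single iterable has to answer some other way.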
What version of ibis are you running?
main
What backend(s) are you using, if any?
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for the issue!
Can you clarify what you're asking for here? Is it just to "officialize" the API?
yup, just include it in the docs so that we know it is a stable(ish) API. No functional changes needed.