ibis
feat: expose `Table._ensure_expr()`
Is your feature request related to a problem?
I am writing a framework for record linkage on top of Ibis. As part of that, I have an API that takes as arguments
- A Table
- something that references a column within that table
So the basic examples for point 2 are
- a string eg "my_col"
- a deferred eg _.my_col.upper()[:3]
- a lambda eg lambda table: table.my_col.upper().cast(int)
It would be nice if there were a universal API on Tables that allowed me to convert all of these to a Column. I can't use __getitem__, because that can return a Table:
import ibis
from ibis import _  # the deferred placeholder used below
t = ibis.memtable({"island": [1, 2, 3, 4, 5]})
print(type(t["island"])) # Column
print(type(t[_.island])) # Table
print(type(t[lambda t: t.island])) # Table
Currently I am using Table._ensure_expr, but that feels icky since it is private.
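For illustration, the kind of normalization I am after can be sketched with public, duck-typed calls only. This is a rough sketch, not how Ibis does it internally: `resolve_column` is a hypothetical name, and it assumes that a deferred exposes a `resolve(table)` method and that `table[name]` returns a Column for a string name.

```python
def resolve_column(table, ref):
    """Turn a column reference into a column expression (hypothetical helper).

    Assumptions: ``table[name]`` returns a column for a string name, and a
    deferred object exposes ``resolve(table)``.
    """
    if isinstance(ref, str):
        # Plain column name.
        return table[ref]
    resolve = getattr(ref, "resolve", None)
    if resolve is not None:
        # Deferred-like object. Checked before callable(), because calling a
        # deferred may just build another deferred rather than resolve it.
        return resolve(table)
    if callable(ref):
        # Lambda or any other function of the table.
        return ref(table)
    raise TypeError(f"cannot resolve a column from {type(ref).__name__}")
```

The order of the checks is the design choice that matters here: strings first, then deferreds, then generic callables.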
Describe the solution you'd like
maybe Table.column(Any) -> Column?
We should think about whether the new method would shadow the name of a column in the Table, but I hope that people aren't naming their columns "column"...
What version of ibis are you running?
main
What backend(s) are you using, if any?
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for the feature request!
Like many other things, the implementation of the binding process is changing quite a bit in the-epic-split.
Table._ensure_expr is removed in that branch, and replaced with a function called bind that universally handles inputs, and converts them into an iterable of expressions.
bind is more generic than handling just a single column though. It not only handles strings, deferreds and lambdas, but also selectors, mappings and iterables of all those things.
That bind API looks like this:
exprs = bind(table, "island")
exprs = bind(table, _.island)
exprs = bind(table, lambda t: t.island)
exprs = bind(table, s.matches("island"))
exprs = bind(table, table.island)
exprs = bind(table, ["island"])
exprs = bind(table, {"eye-land": "island"})
@kszucs Thoughts on making this API public after we merge the-epic-split?
I think we can expose bind, though not sure whether this should be exposed as a function or a method. Preference?
That looks perfect. Can we make it return a list of Columns, not a mere Iterable of Columns? That will be more usable for people, and there shouldn't be a performance downside.
I vote method, so there is symmetry with
- .select() returns a Table
- .bind() returns a list of Columns
- __getitem__ returns either, depending on inputs.
We've got bind for this now, since 9.0.0.
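For the single-column case from the original request, a thin wrapper over bind may be enough. This is only a sketch: `bind_one` is a hypothetical helper, not part of Ibis, and it assumes that `table.bind(ref)` returns a sequence of expressions as discussed above.

```python
def bind_one(table, ref):
    """Return the single expression that ``ref`` binds to (hypothetical helper).

    Assumes ``table.bind(ref)`` returns a sequence of expressions.
    """
    exprs = list(table.bind(ref))
    if len(exprs) != 1:
        # Selectors, lists and mappings can bind to zero or many expressions.
        raise ValueError(f"expected exactly one column, got {len(exprs)}")
    return exprs[0]
```

With a table `t`, something like `bind_one(t, "island")`, `bind_one(t, _.island)` or `bind_one(t, lambda t: t.island)` should each give back one Column, while a reference that expands to several columns raises instead of silently picking one.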