feat: deprecate `ibis.memtable` in favor of expanding `ibis.table`
Ibis currently has an `ibis.memtable` function that takes in-memory data and returns an `ibis.Table` object representing that data. This is a useful feature with (IMO) a non-ideal name.
`memtable` is a noun-like name, which suggests that "a memtable" is a distinct concept. In practice, though, the output of `memtable` is just an `ibis.Table`, the same as the output of `ibis.table`, `ibis.read_parquet`, or `ibis.read_csv`. Only the input to `ibis.memtable` is distinctive.
A more verb-like name would (IMO) make it clearer that "memtable" isn't a distinct concept but rather a different way of creating `ibis.Table` objects. In #8622 I proposed splitting `memtable` into new, more verb-like `from_*` methods.
A different approach would be to merge `ibis.memtable` into `ibis.table` (deprecating `ibis.memtable`). The end goal would be for `ibis.table` to have the following signature:
```python
def table(
    data: Any = None,
    *,
    schema: SchemaLike | None = None,
    name: str | None = None,
    catalog: str | None = None,
    database: str | None = None,
) -> Table:
    ...
```
This would let users do the following:
```python
t = ibis.table({"x": [1, 2, 3], "y": [4, 5, 6]})  # create a table from in-memory data, similar to pandas
t = ibis.table(pandas_df)  # create a table from an existing pandas dataframe
t = ibis.table(pyarrow_table, name="foo")  # create a table from an existing pyarrow table, with an explicit name
t = ibis.table(schema={"x": "int", "y": "float"}, name="bar")  # create an unbound table named `bar`
```
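A minimal sketch of how the merged `table` might dispatch on its arguments. `_in_memory_table` and `_unbound_table` are hypothetical placeholders for what `ibis.memtable` and today's `ibis.table` do; they return tuples so the sketch runs without ibis installed. Two rules here are assumptions, not part of the proposal: an unbound table requires a `schema`, and `catalog`/`database` are rejected when in-memory `data` is given.

```python
from __future__ import annotations

from typing import Any


def _in_memory_table(data: Any, schema: Any, name: str | None) -> tuple:
    # Hypothetical placeholder for what ibis.memtable does today.
    return ("in-memory", data, schema, name)


def _unbound_table(
    schema: Any, name: str | None, catalog: str | None, database: str | None
) -> tuple:
    # Hypothetical placeholder for what ibis.table(schema, ...) does today.
    return ("unbound", schema, name, catalog, database)


def table(
    data: Any = None,
    *,
    schema: Any = None,
    name: str | None = None,
    catalog: str | None = None,
    database: str | None = None,
) -> tuple:
    if data is not None:
        # Assumption: catalog/database only describe where an unbound table lives.
        if catalog is not None or database is not None:
            raise ValueError("`catalog`/`database` cannot be combined with in-memory `data`")
        # `schema`, if given, is used instead of inferring one from `data`.
        return _in_memory_table(data, schema, name)
    # Assumption: without data, a schema is required to describe the table.
    if schema is None:
        raise ValueError("either `data` or `schema` must be provided")
    return _unbound_table(schema, name, catalog, database)
```

Because `data` is the only positional parameter, `table(pandas_df)` and `table(schema=..., name=...)` can never be confused, which is why the deprecation of positional `schema` has to come first.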
However, since `schema` can currently be passed as a positional argument (to create an unbound table), we'd need to deprecate passing `schema` positionally in one release, then add `data` as an optional positional argument in a follow-up release.
If people like this proposal, I'd suggest:
- We deprecate passing `schema` as a positional argument to `ibis.table` in 10.0.
- We add `data` as an argument to `ibis.table` and deprecate `ibis.memtable` in 11.0.
I am a huge fan of this.
I'm curious whether we could make this compatible with https://github.com/ibis-project/ibis/issues/9324. For example, for `data` I would want to be able to pass an arbitrary mix of ibis expressions and non-ibis data, e.g.:
- an iterable of named `ir.Value`s
- a `Mapping[str, ScalarLike | ColumnLike]`
`ScalarLike` is either an `ir.Scalar` or a Python scalar.
`ColumnLike` is trickier: either all the passed `ColumnLike`s are non-ibis data, so they can be aligned by position, or they are all `ir.Column`s that come from the same `ir.Relation`. If it receives a mix of ibis and non-ibis columns, there is no way to know how to align them, so it should raise an error.
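The `ColumnLike` alignment rule can be sketched as a small validation function. `Relation`, `Column`, and `Scalar` are hypothetical stand-ins for `ir.Relation`, `ir.Column`, and `ir.Scalar` so the example runs without ibis; treating any non-string `Sequence` as in-memory column data, and requiring in-memory columns to have equal lengths, are assumptions of this sketch.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for ir.Relation."""
    name: str


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for ir.Column; it records which relation it came from."""
    relation: Relation
    name: str


@dataclass(frozen=True)
class Scalar:
    """Hypothetical stand-in for ir.Scalar."""
    value: Any


def _is_column_like(value: Any) -> bool:
    # ibis columns and non-string sequences (lists, tuples, ...) are ColumnLike;
    # everything else (ibis or Python scalars) is ScalarLike and broadcasts.
    if isinstance(value, Column):
        return True
    return isinstance(value, Sequence) and not isinstance(value, (str, bytes))


def resolve_alignment(data: Mapping[str, Any]) -> Relation | None:
    """Return the shared parent Relation if every ColumnLike is an ibis Column,
    or None if they are all in-memory sequences (aligned by position) or there
    are no ColumnLikes at all. Raise ValueError when alignment is ambiguous."""
    columns = [v for v in data.values() if _is_column_like(v)]
    ibis_columns = [c for c in columns if isinstance(c, Column)]
    if not ibis_columns:
        # Positional alignment only makes sense if the sequences have equal length.
        if len({len(c) for c in columns}) > 1:
            raise ValueError("in-memory columns have different lengths")
        return None
    if len(ibis_columns) != len(columns):
        raise ValueError("cannot align a mix of ibis columns and in-memory columns")
    relations = {c.relation for c in ibis_columns}
    if len(relations) > 1:
        raise ValueError("ibis columns come from different relations")
    return relations.pop()
```

For example, `{"x": [1, 2, 3], "z": 7}` aligns by position, while `{"a": Column(t1, "a"), "b": [1, 2]}` is rejected because there is no way to line the list up with the rows of `t1`.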
In https://github.com/ibis-project/ibis/issues/9324 there is also the suggestion of making that API accept 0 or 1 `ir.Table`s. I don't know exactly how to make this API work for that use case. My thought is that users should use methods on the table in that situation, e.g. `table.mutate(<new columns>)`, or we could make `Table` implement `Mapping`, so that you could write something like `ibis.table({"new_column": "abc", **table})`.
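The `Mapping` idea works because `**` unpacking in a dict display only needs `keys()` and `__getitem__`, both of which `collections.abc.Mapping` provides once `__getitem__`, `__iter__`, and `__len__` are defined. `ToyTable` below is a hypothetical stand-in for `ir.Table`, with plain strings in place of column expressions.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping
from typing import Any


class ToyTable(Mapping):
    """Hypothetical stand-in for ir.Table implementing the Mapping protocol:
    keys are column names, values are column expressions (strings here)."""

    def __init__(self, columns: dict[str, Any]) -> None:
        self._columns = dict(columns)

    def __getitem__(self, name: str) -> Any:
        return self._columns[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._columns)

    def __len__(self) -> int:
        return len(self._columns)


t = ToyTable({"a": "<column a>", "b": "<column b>"})
# Unpacking a Mapping into a dict literal yields its columns in order.
merged = {"new_column": "abc", **t}
```

With ordinary dict semantics a later key wins, so `{**table, "a": ...}` would replace column `a`, while `{"a": ..., **table}` would keep the table's own column.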
To be clear, I don't think my proposal needs to be part of this exact PR; I just want to design the semantics here so that we can add that functionality in a compatible way in a follow-up PR.