
feat: deprecate `ibis.memtable` in favor of expanding `ibis.table`

Open jcrist opened this issue 1 year ago • 2 comments

Ibis currently has an `ibis.memtable` function that takes in-memory data and returns an `ibis.Table` object representing that data. This is a useful feature with (IMO) a non-ideal name.

`memtable` is a noun-like name, which makes it seem like "a memtable" is a distinct concept. In practice, though, the output of `memtable` is just an `ibis.Table`, the same as the output of `ibis.table`/`ibis.read_parquet`/`ibis.read_csv`. It's really just the input to `ibis.memtable` that is interesting.

A more verb-like name would (IMO) make it clearer that "memtable" isn't a distinct concept, but rather a different way of creating `ibis.Table` objects. In #8622 I proposed splitting `memtable` into new `from_*` methods that are more verb-like.

A different way of handling this would be to merge `ibis.memtable` and `ibis.table` (deprecating `ibis.memtable`). The end goal would be for `ibis.table` to have the following signature:

```python
def table(
    data: Any = None,
    *,
    schema: SchemaLike | None = None,
    name: str | None = None,
    catalog: str | None = None,
    database: str | None = None,
) -> Table:
    ...
```

This would let users do the following:

```python
t = ibis.table({"x": [1, 2, 3], "y": [4, 5, 6]})  # create a table from in-memory data, similar to pandas
t = ibis.table(pandas_df)  # create a table from an existing pandas DataFrame
t = ibis.table(pyarrow_table, name="foo")  # create a table from an existing pyarrow Table, with an explicit name
t = ibis.table(schema={"x": "int", "y": "float"}, name="bar")  # create an unbound table named `bar`
```
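A minimal sketch of how the merged function might dispatch between the two existing code paths. Everything here is hypothetical: `UnboundTable` and `InMemoryTable` are plain stand-ins for what today's `ibis.table` and `ibis.memtable` build (not ibis classes), and the rule that a missing `data` requires a `schema` is an assumption about the intended semantics.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class UnboundTable:
    """Hypothetical stand-in for the unbound table today's `ibis.table(schema, name)` builds."""

    schema: Mapping[str, str]
    name: str | None = None
    catalog: str | None = None
    database: str | None = None


@dataclass
class InMemoryTable:
    """Hypothetical stand-in for the table today's `ibis.memtable(data)` builds."""

    data: Any
    schema: Mapping[str, str] | None = None
    name: str | None = None


def table(
    data: Any = None,
    *,
    schema: Mapping[str, str] | None = None,
    name: str | None = None,
    catalog: str | None = None,
    database: str | None = None,
) -> UnboundTable | InMemoryTable:
    if data is None:
        # No data: the unbound-table path. Assumption: a schema is then required.
        if schema is None:
            raise TypeError("table() requires `data`, `schema`, or both")
        return UnboundTable(schema, name, catalog, database)
    # In-memory data (dict of columns, pandas DataFrame, pyarrow Table, ...):
    # the path ibis.memtable handles today. catalog/database are only used
    # on the unbound path in this sketch; an explicit schema is passed along.
    return InMemoryTable(data, schema, name)
```

Because everything after `data` is keyword-only, `table({"x": "int"})` is treated as in-memory data rather than as a schema, which is why positional `schema` has to be deprecated before `data` can take its place.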

However, since `schema` can currently be passed as a positional argument (to create an unbound table), we'd need to deprecate passing `schema` positionally in one release, then add `data` as an optional positional argument in a follow-up release.
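One common way to implement the first step is a shim that still accepts `schema` positionally but emits a warning. This is a sketch under assumptions, not the actual ibis change: the name `table_v10`, the warning text, and the dict returned in place of a real unbound table are all illustrative, and for brevity only `schema` is handled positionally.

```python
from __future__ import annotations

import warnings
from typing import Any


def table_v10(*args: Any, schema: Any = None, name: str | None = None) -> dict[str, Any]:
    """Hypothetical 10.0-era `table`: positional `schema` still works but warns."""
    if len(args) > 1:
        raise TypeError(f"table_v10() takes at most 1 positional argument ({len(args)} given)")
    if args:
        if schema is not None:
            raise TypeError("table_v10() got multiple values for argument 'schema'")
        warnings.warn(
            "Passing `schema` positionally is deprecated; use `schema=...` instead.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        schema = args[0]
    # Placeholder for building the unbound table exactly as before.
    return {"schema": schema, "name": name}
```

`FutureWarning` is used here because Python shows it to end users by default, whereas `DeprecationWarning` is hidden unless triggered from `__main__` or explicitly enabled.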

If people like this proposal, I'd suggest:

  • We deprecate passing `schema` as a positional arg to `ibis.table` in 10.0
  • We add `data` as an arg to `ibis.table` and deprecate `ibis.memtable` in 11.0

jcrist · Jun 04 '24 17:06

I am a huge fan of this.

I'm curious whether we could make this compatible with https://github.com/ibis-project/ibis/issues/9324. For example, for `data` I would want to be able to pass in an arbitrary mix of ibis expressions and non-ibis data, e.g.:

  • an iterable of named `ir.Value`s
  • a `Mapping[str, ScalarLike | ColumnLike]`

`ScalarLike` is either an `ir.Scalar` or a Python scalar.

`ColumnLike` is trickier: either all passed `ColumnLike`s need to be non-ibis data, so they can be aligned by position, or they all need to be `ir.Column`s that come from the same `ir.Relation`. If we get a mix of ibis and non-ibis columns, we don't know how to align them, so we should raise an error.
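The alignment rule above can be sketched as a validation pass over the mapping form. All names are hypothetical: `Scalar` and `Column` are minimal stand-ins for `ir.Scalar` and `ir.Column`, the `relation` attribute stands in for the `ir.Relation` a column comes from, "non-ibis column" is approximated as any list or tuple, and the equal-length check for positional alignment is an added assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Scalar:
    """Hypothetical stand-in for ir.Scalar."""

    value: Any


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for ir.Column; `relation` names the table it came from."""

    name: str
    relation: str


def check_alignment(data: Mapping[str, Any]) -> str:
    """Return how columns would be aligned: "position", "relation", or "scalars-only"."""
    ibis_cols = [v for v in data.values() if isinstance(v, Column)]
    raw_cols = [v for v in data.values() if isinstance(v, (list, tuple))]
    if ibis_cols and raw_cols:
        # Mixed ibis and non-ibis columns: no way to line the rows up.
        raise ValueError("cannot mix ibis columns with in-memory columns")
    if ibis_cols:
        relations = {c.relation for c in ibis_cols}
        if len(relations) > 1:
            raise ValueError(f"columns come from different relations: {sorted(relations)}")
        return "relation"
    if raw_cols:
        lengths = {len(c) for c in raw_cols}
        if len(lengths) > 1:
            raise ValueError(f"in-memory columns have different lengths: {sorted(lengths)}")
        return "position"
    # Only scalars (ibis or Python): they broadcast, so there is nothing to align.
    return "scalars-only"
```

Scalars of either kind are accepted alongside both column styles, since they carry no row order of their own.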

In https://github.com/ibis-project/ibis/issues/9324 there is also the suggestion of making that API accept 0 or 1 `ir.Table`s. IDK exactly how to make this API work for that use case. My thought is that in that situation users should use methods on the table, e.g. `table.mutate(<new columns>)`, or we could make `Table` implement `Mapping`, so that you could do something like `ibis.table({"new_column": "abc", **table})`.
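For the `**table` idea, a class only needs the `collections.abc.Mapping` protocol (`__getitem__`, `__iter__`, `__len__`); dict-display unpacking then goes through `keys()` and `__getitem__`. A minimal sketch with a hypothetical `FakeTable` whose values are plain strings standing in for column expressions; it does not model ibis's real `Table` class.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping


class FakeTable(Mapping):
    """Hypothetical table whose columns are exposed through the Mapping protocol."""

    def __init__(self, columns: dict[str, str]) -> None:
        # Column name -> column expression (plain strings in this sketch).
        self._columns = dict(columns)

    def __getitem__(self, name: str) -> str:
        return self._columns[name]

    def __iter__(self) -> Iterator[str]:
        # Iterating a Mapping yields its keys, i.e. the column names in order.
        return iter(self._columns)

    def __len__(self) -> int:
        return len(self._columns)


t = FakeTable({"a": "t.a", "b": "t.b"})
merged = {"new_column": "abc", **t}
print(merged)  # {'new_column': 'abc', 'a': 't.a', 'b': 't.b'}
```

The merged dict mixes a Python scalar with the table's columns, which is exactly the `Mapping[str, ScalarLike | ColumnLike]` shape proposed above.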

NickCrews · Jun 07 '24 19:06

To be clear, I don't think my proposal needs to be part of this exact PR; I just want to design the semantics here so that we can add that functionality in a follow-up PR in a compatible way.

NickCrews · Jun 10 '24 19:06