feat: unbind expression

Open gforsyth opened this issue 3 years ago • 2 comments

I've run into this situation a few times: I have an existing Ibis expression that I pass to some function, and it would work except that the expression is built on bound tables rather than unbound ones, so the operation fails.

This just happened to me with @kszucs's decompiler PR, but it also happens when using ibis-substrait.

I also think that, outside of those possibly uncommon use-cases, we often have a schema already defined in a backend somewhere and we want the equivalent unbound table. This isn't terribly hard to do, but it's annoying if you don't notice until you are several steps deep.

So, how about an unbind() for Expressions that returns the same expression but with UnboundTable in place of DatabaseTable?

Then con.execute(expr.unbind()) == expr.execute() should hold.
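To make the proposed invariant concrete, here is a minimal toy sketch in plain Python. It does not use Ibis at all: ToyBackend, BoundTable, FreeTable, Filter, run and this unbind are hypothetical stand-ins invented for illustration, assuming that a "bound" table is simply one that carries a reference to the backend holding its data.

```python
from dataclasses import dataclass


class ToyBackend:
    """Hypothetical backend: maps table names to lists of row dicts."""

    def __init__(self, tables):
        self.tables = tables

    def execute(self, expr):
        # Evaluate an expression, resolving unbound tables against this backend.
        return run(expr, con=self)


@dataclass(frozen=True)
class BoundTable:
    name: str
    backend: ToyBackend  # a bound table remembers where its data lives


@dataclass(frozen=True)
class FreeTable:
    name: str  # an unbound table only knows its name


@dataclass(frozen=True)
class Filter:
    parent: object
    column: str
    minimum: int


def unbind(expr):
    """Return the same expression with every BoundTable swapped for a FreeTable."""
    if isinstance(expr, BoundTable):
        return FreeTable(expr.name)
    if isinstance(expr, Filter):
        return Filter(unbind(expr.parent), expr.column, expr.minimum)
    return expr


def run(expr, con=None):
    """Evaluate an expression; unbound tables need an explicit backend."""
    if isinstance(expr, BoundTable):
        return list(expr.backend.tables[expr.name])
    if isinstance(expr, FreeTable):
        if con is None:
            raise ValueError("an unbound table needs a backend to execute against")
        return list(con.tables[expr.name])
    if isinstance(expr, Filter):
        rows = run(expr.parent, con)
        return [row for row in rows if row[expr.column] >= expr.minimum]
    raise TypeError(f"unsupported node: {expr!r}")


con = ToyBackend({"t": [{"a": 1}, {"a": 5}, {"a": 9}]})
expr = Filter(BoundTable("t", con), "a", 5)
unbound = unbind(expr)

# The invariant from the proposal, in toy form.
assert con.execute(unbound) == run(expr)
```

In this toy model the unbound expression cannot be executed on its own; it only produces rows once a backend is supplied, which is the property that makes it portable to tools that expect unbound tables.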

Some other thoughts / ideas / edge-cases: What to do with memtable? Should it return the unbound expression AND the data underlying the memtable?

Should all of the backends return either data or sufficient metadata to reload the table in question? (That should probably be behind a feature flag, I think, or the execute(expr.unbind()) thing falls down.)

cc @saulpw @cpcloud

gforsyth, Sep 19 '22 19:09

For those backends that have parse_type defined (a bunch of them...?) this should be walking the expression graph and subbing in unbound tables a la:

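import ibis

def unbound_from_bound(table):
    # parse_type is the backend's type parser; the name must be the table's
    # name as a string (Table.alias is a method, not the name).
    return ibis.table(
        list(zip(table.columns, map(parse_type, table.dtypes))), name=table.get_name()
    )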

gforsyth, Sep 19 '22 19:09

I like the idea; it's also kind of a prerequisite for properly serializing a bound expression.

Lowering a DatabaseTable should be straightforward since we already know its name and its Ibis schema. Regarding memtables, I don't think we can do anything other than leave them as memtables, since they are not bound to any backend.

Something like the following could work (haven't tried it):

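import ibis.expr.analysis as an
import ibis.expr.operations as ops

def unbind(op):
    # Swap each bound table node for an unbound one with the same name and schema.
    if isinstance(op, ops.DatabaseTable):
        return ops.UnboundTable(name=op.name, schema=op.schema)
    else:
        return op

# bound_node is the operation node of an existing bound expression
unbound_node = an.substitute(unbind, bound_node)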
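For readers who want to see the shape of that graph rewrite without an Ibis install, the following is a rough standalone sketch of the same idea. Every name in it (DbTable, FreeTable, MemTable, Join, and this substitute) is a hypothetical stand-in built on Python dataclasses, not the Ibis API, and the real an.substitute may well behave differently in its details.

```python
from dataclasses import dataclass, fields, is_dataclass, replace


@dataclass(frozen=True)
class DbTable:
    """Stand-in for a table bound to a backend."""
    name: str
    schema: tuple  # e.g. (("id", "int64"),)
    source: str    # placeholder for a backend handle


@dataclass(frozen=True)
class FreeTable:
    """Stand-in for an unbound table: name and schema only."""
    name: str
    schema: tuple


@dataclass(frozen=True)
class MemTable:
    """Stand-in for an in-memory table that carries its own rows."""
    name: str
    schema: tuple
    rows: tuple


@dataclass(frozen=True)
class Join:
    left: object
    right: object
    key: str


def substitute(fn, node):
    """Rebuild `node` bottom-up, applying `fn` to every rebuilt dataclass node."""
    if not is_dataclass(node):
        return node  # plain values (strings, tuples) are treated as leaves
    changes = {f.name: substitute(fn, getattr(node, f.name)) for f in fields(node)}
    return fn(replace(node, **changes))


def unbind(node):
    # Only backend-bound tables are swapped; in-memory tables pass through.
    if isinstance(node, DbTable):
        return FreeTable(name=node.name, schema=node.schema)
    return node


schema = (("id", "int64"),)
bound = Join(
    left=DbTable("orders", schema, source="some-backend"),
    right=MemTable("lookup", schema, rows=((1,), (2,))),
    key="id",
)
unbound = substitute(unbind, bound)
```

Here the in-memory table comes out unchanged, matching the suggestion above to leave memtables alone, while the backend-bound table loses its backend handle but keeps its name and schema. Note that this toy walker does not descend into tuples or lists of nodes, which a real implementation would need to handle.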

kszucs, Sep 19 '22 19:09