Improve testing with Ibis
This PR adds classes that reduce the amount of boilerplate needed to test our results against Ibis.
Tester Interface
IbisDuckDBTester is an interface that defines the skeleton of the tester.
To derive from IbisDuckDBTester, you only need to implement one method:
def generate_relation(self, expr) -> duckdb.DuckDBPyRelation:
This method is used to transform the Ibis expression to an equivalent DuckDBPyRelation.
API
To set up the tester, you provide a list of SQL queries, which are run sequentially at startup.
To interact with the tester, a single test method is exposed:
def test(self, expression_producer, *args):
Its arguments are:
- expression_producer: A function that produces an Ibis expression, given an Ibis connection, for example:
def extract_component(ibis_db, named_component):
    tbl = ibis_db.table('tbl')
    expr = tbl[getattr(tbl.d.time(), named_component)().cast('int64')]
    return expr
- *args: Any arguments required by the expression_producer; the tester forwards them to it.
Implementations
- SQLIbisDuckDBTester: This tester uses ibis.to_sql(expr) to convert the expression to SQL, and then uses duckdb.sql to create a relation out of it.
- SubstraitIbisDuckDBTester: This tester uses SubstraitCompiler to generate the Substrait plan from an expression, converts the plan to JSON, and finally uses duckdb.from_substrait_json to create a relation from it.
Coverage
For ease of use and to maximize coverage, this PR also adds a CombinedIbisDuckDBTester, which bundles the existing implementations; its test method forwards each call to the internal testers.
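The forwarding behaviour can be pictured with a small sketch. It assumes the combined tester simply holds a list of testers and calls each one in order; `CombinedSketchTester`, `RecordingTester`, and `produce` are hypothetical names used only for illustration and do not come from the PR.

```python
class CombinedSketchTester:
    """Hypothetical stand-in for CombinedIbisDuckDBTester."""

    def __init__(self, testers):
        self.testers = list(testers)

    def test(self, expression_producer, *args):
        # Forward the same call to every internal tester, in order.
        for tester in self.testers:
            tester.test(expression_producer, *args)


class RecordingTester:
    """Toy tester that only records the calls it receives."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def test(self, expression_producer, *args):
        self.log.append((self.name, expression_producer.__name__, args))


def produce(ibis_db, named_component):
    # Placeholder producer; a real one would return an Ibis expression.
    return None


log = []
combined = CombinedSketchTester(
    [RecordingTester("sql", log), RecordingTester("substrait", log)]
)
combined.test(produce, "hour")
```

A single call to `combined.test` then exercises every bundled code path with the same producer and arguments, which is where the extra coverage comes from.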
Technical details
The internal implementation (summarized) is as follows:
At setup:
- create a persistent DuckDB database and populate it using the initialization queries
- create a connection to the DuckDB database with Ibis
Every time test is invoked:
- create an expression on the Ibis connection (using the expression_producer)
- create a DuckDB relation from the Ibis expression (using the virtual generate_relation method)
- execute both the relation and the expression, outputting to pyarrow
- assert that they produce the same result
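The flow above can be sketched end to end with the standard library only. This is an illustration under loud assumptions: `sqlite3` stands in for DuckDB, a plain SQL string stands in for an Ibis expression, an in-memory database replaces the persistent one, and the names `FlowSketchTester`, `DirectSqlTester`, and `select_column` are hypothetical. The real tester compares pyarrow outputs; the sketch compares fetched rows.

```python
import sqlite3
from abc import ABC, abstractmethod


class FlowSketchTester(ABC):
    """Hypothetical illustration of the flow; not the PR's IbisDuckDBTester."""

    def __init__(self, initialization_queries):
        # Populate the database by running the setup queries in order.
        # (In-memory here; the PR describes a persistent DuckDB database.)
        self.con = sqlite3.connect(":memory:")
        for query in initialization_queries:
            self.con.execute(query)

    @abstractmethod
    def generate_relation(self, expr):
        """Backend-specific: turn the expression into something executable."""

    def test(self, expression_producer, *args):
        # 1. Build the expression, forwarding any extra arguments.
        expr = expression_producer(self.con, *args)
        # 2. Convert it through the subclass hook.
        relation = self.generate_relation(expr)
        # 3. Execute both sides and compare the results.
        expected = self.con.execute(expr).fetchall()
        actual = relation.fetchall()
        assert actual == expected, f"{actual!r} != {expected!r}"
        return actual


class DirectSqlTester(FlowSketchTester):
    def generate_relation(self, expr):
        # Toy conversion: run the same SQL on the same connection.
        return self.con.execute(expr)


def select_column(con, column_name):
    # Stand-in producer: returns SQL text instead of an Ibis expression.
    return f"SELECT {column_name} FROM tbl ORDER BY {column_name}"


tester = DirectSqlTester([
    "CREATE TABLE tbl (a INTEGER, b TEXT)",
    "INSERT INTO tbl VALUES (2, 'x'), (1, 'y')",
])
rows = tester.test(select_column, "a")
```

Keeping the comparison inside `test` means each subclass only decides how the relation is produced, while setup, execution, and the equality check stay in one place.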