
Improve testing with Ibis

Open Tishj opened this issue 1 year ago • 6 comments

This PR adds classes that reduce the amount of boilerplate needed to test our results against Ibis.

Tester Interface

IbisDuckDBTester is an interface that defines the skeleton of the tester. A class deriving from IbisDuckDBTester only needs to implement one method:

        def generate_relation(self, expr) -> duckdb.DuckDBPyRelation:

This method is used to transform the Ibis expression to an equivalent DuckDBPyRelation.
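As a rough illustration of that contract, here is a minimal sketch that assumes the interface is expressed as a Python abstract base class (the PR text does not say how it is enforced). IncompleteTester is a hypothetical subclass added only to show what happens when the method is missing.

```python
from abc import ABC, abstractmethod


class IbisDuckDBTester(ABC):
    """Skeleton only; the real class also owns the connections and the test logic."""

    @abstractmethod
    def generate_relation(self, expr):
        """Turn an Ibis expression into an equivalent duckdb.DuckDBPyRelation."""


class IncompleteTester(IbisDuckDBTester):
    # Hypothetical subclass that forgets to implement generate_relation.
    pass


try:
    IncompleteTester()
except TypeError as exc:
    # Abstract classes with unimplemented methods cannot be instantiated.
    print(f"cannot instantiate: {exc}")
```

With this shape, a derived tester that omits generate_relation fails at construction time rather than in the middle of a test run.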

API

To set up the tester, you provide a list of SQL queries, which are run sequentially at startup.

To interact with the tester, a single test method is exposed:

	def test(self, expression_producer, *args):

Its arguments are:

  • expression_producer: A function that produces an Ibis expression, given an Ibis connection, for example:

        def extract_component(ibis_db, named_component):
            tbl = ibis_db.table('tbl')
            expr = tbl[getattr(tbl.d.time(), named_component)().cast('int64')]
            return expr

  • *args: Any arguments required by the expression_producer; the tester forwards them to it.

Implementations

  • SQLIbisDuckDBTester: This tester uses ibis.to_sql(expr) to convert the expression to SQL, and then uses duckdb.sql to create a relation out of it.

  • SubstraitIbisDuckDBTester: This tester uses SubstraitCompiler to generate a Substrait plan from the expression, converts the plan to JSON, and then uses duckdb.from_substrait_json to create a relation from it.
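The two conversion paths could look roughly like the sketch below. It is not the PR's code: the function names are hypothetical, `con` is assumed to be a DuckDB connection (with the substrait extension loaded for the second path), and the sketch calls `sql` and `from_substrait_json` on that connection rather than on the duckdb module. The third-party imports are done lazily so the snippet loads even when those packages are absent.

```python
def relation_via_sql(con, expr):
    """SQL path: Ibis expression -> SQL string -> DuckDB relation."""
    import ibis  # lazy import: only needed when the function is called

    sql = ibis.to_sql(expr)
    return con.sql(str(sql))


def relation_via_substrait(con, expr):
    """Substrait path: Ibis expression -> Substrait plan -> JSON -> DuckDB relation."""
    from google.protobuf import json_format
    from ibis_substrait.compiler.core import SubstraitCompiler

    # Compile to a Substrait protobuf message, then serialize it to JSON.
    plan = SubstraitCompiler().compile(expr)
    plan_json = json_format.MessageToJson(plan)
    return con.from_substrait_json(plan_json)
```

Either function would serve as the body of generate_relation in the corresponding tester.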

Coverage

For ease of use and to maximize coverage, this PR also adds a CombinedIbisDuckDBTester, which bundles the existing implementations; its test method forwards each call to the internal testers.
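The forwarding can be pictured with a small stand-in. Everything in this sketch is hypothetical (CombinedTesterSketch, RecordingTester, and the placeholder producer are not from the PR); it only shows one call fanning out to every wrapped tester.

```python
class CombinedTesterSketch:
    """Hypothetical stand-in for CombinedIbisDuckDBTester."""

    def __init__(self, testers):
        self.testers = list(testers)

    def test(self, expression_producer, *args):
        # Run the same producer and arguments through every wrapped tester.
        for tester in self.testers:
            tester.test(expression_producer, *args)


class RecordingTester:
    """Placeholder tester that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def test(self, expression_producer, *args):
        self.calls.append((expression_producer, args))


def producer(ibis_db, named_component):
    # Placeholder; a real producer would return an Ibis expression.
    return None


sql_like, substrait_like = RecordingTester(), RecordingTester()
combined = CombinedTesterSketch([sql_like, substrait_like])
combined.test(producer, "hour")
```

A test written once against the combined tester therefore exercises both the SQL and the Substrait path.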

Technical details

The internal implementation (summarized) is as follows:

  • We create a persistent DuckDB database.
  • We populate it using the initialization queries.
  • We create a connection to the DuckDB database with Ibis.

Every time test is invoked, we:

  • create an expression on that connection (using the expression_producer)
  • create a DuckDB relation from the Ibis expression (using the virtual generate_relation method)
  • execute both the relation and the expression, outputting to pyarrow, and assert that they produce the same result
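The per-invocation steps can be sketched as a single function. This is an approximation under stated assumptions: compare_once is a hypothetical name, and the tester is assumed to expose its Ibis connection as an attribute called ibis_db. The calls to_pyarrow (Ibis expression), arrow (DuckDB relation), and equals (pyarrow table) are the existing library methods the sketch relies on.

```python
def compare_once(tester, expression_producer, *args):
    """One invocation of test(); `tester.ibis_db` is an assumed attribute name."""
    # 1. Build the Ibis expression on the tester's Ibis connection.
    expr = expression_producer(tester.ibis_db, *args)
    # 2. Convert it to a DuckDB relation via the implementation-specific hook.
    relation = tester.generate_relation(expr)
    # 3. Execute both sides to pyarrow tables and compare them.
    expected = expr.to_pyarrow()
    actual = relation.arrow()
    assert actual.equals(expected), "DuckDB relation and Ibis expression disagree"
```

Because both sides are materialized as pyarrow tables, the comparison covers the schema as well as the values.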

Tishj • Mar 02 '23 09:03