coral icon indicating copy to clipboard operation
coral copied to clipboard

Add support for Integration test for Coral

Open sumedhsakdeo opened this issue 2 months ago • 4 comments

What changes are proposed in this pull request, and why are they necessary?

This PR introduces a comprehensive integration testing infra for the Coral project that validates SQL translation capabilities across multiple query engines (Spark and Trino) with support for both Iceberg tables, and Hive tables / views.

It also adds sample test that can serve as a starter reference to add more tests to the repo and overall making the coral's interoperability story even more robust..

How was this patch tested?

The patch only adds tests to the coral repo.

sumedhsakdeo avatar Oct 22 '25 22:10 sumedhsakdeo

Thanks Sumedh for the PR! Both coral-trino and coral-spark contain integration tests of this nature and also structure how to provide the input and assert output in a consistent way. Could you reuse them to add the specific tests or cases you want to add?

wmoustafa avatar Oct 27 '25 06:10 wmoustafa

Thanks Sumedh for the PR! Both coral-trino and coral-spark contain integration tests of this nature and also structure how to provide the input and assert output in a consistent way. Could you reuse them to add the specific tests or cases you want to add?

Thanks Walaa. I did look at the tests in coral-trino and coral-spark, however, I was trying to create a test module where interoperability can be tested thoroughly. Hence the need for separate module. Also, the way things are setup, we need more flexibility in how we instantiate HMS for Hive (built-in is ok) v/s Iceberg (built-in is not ok).

I think there is value in this kind of "uber" integration tests for interop testing, hence the addition.

sumedhsakdeo avatar Oct 27 '25 22:10 sumedhsakdeo

Can we unify things somehow? It is not clear to me when to use either side. Also there is a bunch of repeated code.

wmoustafa avatar Oct 28 '25 00:10 wmoustafa

Can we unify things somehow? It is not clear to me when to use either side. Also there is a bunch of repeated code.

Ack, I understand the concern about repeated code related to infra setup (Spark, HMS, Iceberg, etc.) and checking why standalone module for interoperability testing is key for a project like Coral. Here are some thoughts.

We could position the tests as follows: coral-trino, coral-spark: Unit/module-level tests validating translation logic in isolation coral-integration-test: End-to-end tests validating that translated queries actually execute correctly across engines and table formats. This provides a place to test scenarios that span multiple engines and formats, instead of duplicating them in each of the modules.

It already has started to be effective for some of the test scenarios, https://github.com/sumedhsakdeo/coral/pull/1/files Main thing I would like to avoid with this module is developers preparing SNAPSHOT jars and then running in spark-shell / coral-tools manually, while be able to put a debugger in the IntelliJ and learn what is going on.

My worry is if I do bring setup module into coral-trino and coral-spark, I will likely be replicating a lot more of boiler plate code for Iceberg, HMS setup in both the modules. Also, I would rather lean on Mocking in individual modules, and full blown integration tests in this newly introduced module.

Wdyt?

sumedhsakdeo avatar Oct 28 '25 04:10 sumedhsakdeo