superset
superset copied to clipboard
fix: Ensure Presto database engine spec correctly handles Trino
SUMMARY
This PR remedies a few issues with Trino—which extends the Presto database engine spec. Specifically it adds a few more safeguards when fetching extra metadata as part of the SQL Lab schema/table workflow.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
CI.
ADDITIONAL INFORMATION
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
Codecov Report
Merging #20729 (d5c5da0) into master (f89ba0c) will decrease coverage by
11.48%
. The diff coverage is45.83%
.
:exclamation: Current head d5c5da0 differs from pull request most recent head 0ea8123. Consider uploading reports for the commit 0ea8123 to get more accurate results
@@ Coverage Diff @@
## master #20729 +/- ##
===========================================
- Coverage 66.33% 54.85% -11.49%
===========================================
Files 1767 1767
Lines 67295 67305 +10
Branches 7144 7144
===========================================
- Hits 44643 36919 -7724
- Misses 20824 28558 +7734
Partials 1828 1828
Flag | Coverage Δ | |
---|---|---|
hive | 53.14% <45.83%> (+<0.01%) |
:arrow_up: |
mysql | ? |
|
postgres | ? |
|
presto | 53.04% <37.50%> (+<0.01%) |
:arrow_up: |
python | 57.77% <45.83%> (-23.70%) |
:arrow_down: |
sqlite | ? |
|
unit | 50.47% <37.50%> (+<0.01%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
superset/db_engine_specs/presto.py | 40.87% <38.88%> (-47.11%) |
:arrow_down: |
superset/db_engine_specs/trino.py | 39.39% <66.66%> (-34.02%) |
:arrow_down: |
superset/utils/dashboard_import_export.py | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
superset/key_value/commands/update.py | 0.00% <0.00%> (-88.89%) |
:arrow_down: |
superset/key_value/commands/delete.py | 0.00% <0.00%> (-85.30%) |
:arrow_down: |
superset/key_value/commands/delete_expired.py | 0.00% <0.00%> (-80.77%) |
:arrow_down: |
superset/dashboards/commands/importers/v0.py | 15.62% <0.00%> (-76.25%) |
:arrow_down: |
superset/datasets/commands/update.py | 25.00% <0.00%> (-69.05%) |
:arrow_down: |
superset/datasets/commands/create.py | 29.41% <0.00%> (-68.63%) |
:arrow_down: |
superset/datasets/commands/importers/v0.py | 24.03% <0.00%> (-67.45%) |
:arrow_down: |
... and 279 more |
:mega: Codecov can now indicate which changes are the most critical in Pull Requests. Learn more
@john-bodley Trino using official driver, not PyHive as Presto.
This mean that a lot of functions like get_table_names
, get_view_names
, handle_cursor
, etc... are very different between 2 drivers. Further more, Trino and Presto are now 2 different projects, and going to different direction. So their features will be more and more different.
Tightly couple though 2 engine specs will make it hard to extends and introduce more bugs. Do u think we should revert #20152 to avoid inherit messy code from PrestoEngineSpec
.
@dungdm93 I think in the future when Presto and Trino further diverge it makes sense, however at the moment there is a significant amount of the code that is the same and I sense adhering to the DRY principle (where possible makes sense).
@john-bodley 2 projects are now different enough to have separated EngineSpec (at least in driver side).
You just save a few line of code by let TrinoEngineSpec
inherit messy code from PrestoEngineSpec
(and introducing bugs).
at the moment there is a significant amount of the code that is the same
The significant amount of the code you said is used to patch PyHive driver. Trino in the other hand strictly follow DB-API 2, so default implementations from BaseEngineSpec
are good enough.
Other utility functions like estimate_statement_cost
, query_cost_formatter
, etc... may the same. However, those func can be reused by move to an utils.py
if u wanna do DRY.
@ktmud thanks for the review. I've actually had to update the logic since your approval per https://github.com/apache/superset/pull/20729#discussion_r939048444.
@john-bodley an other way to reusing code is the mixing pattern.
@john-bodley I have to agree with @dungdm93 here - Over the years the Presto spec has caused me considerable grief due to the code being IMO messy and difficult to maintain. For that reason I've been very happy to see the work that @dungdm93 has done on the Trino spec, which IMO is of a great quality and easy follow and extend. So rather than double up on using the old Presto spec code, I'd rather see us gradually phasing it out.
WRT to compatibility of the drivers, I think they're fairly far away from each other right now. For instance, currently clicking on a table in SQL Lab with the recommended trino
driver gives off a metadata fetching error (causes a 500 in the backend):
This is due to the latest_partition
method in the Presto spec being incompatible with the trino
driver. There are other similar issues; I'm opening a PR soon to fix query cancellation, which also works slightly differently, but there appear to be other differences, too.
If we really want to share code between the Presto and Trino specs I would recommend doing something similar to what we're doing for Postgres and Postgres-like databases, where we have the uncontroversial shared pieces in an abstract PostgresBaseEngineSpec
which the actual implementations, like PostgresEngineSpec
, HanaEngineSpec
, RedshiftEngineSpec
etc use. In this case, we could have a PrestoBaseEngineSpec
into which we move the high quality shared code, and then leave the messy bits in the old PrestoEngineSpec
, while gradually refactoring them into the shared PrestoBaseEngineSpec
, all while adding proper and relevant tests for them.
@villebro perhaps latest_partition
issue is fixed in trinodb/trino-python-client#219. Waiting for next release.
Closing in favor of https://github.com/apache/superset/pull/21066.