superset icon indicating copy to clipboard operation
superset copied to clipboard

fix: Ensure Presto database engine spec correctly handles Trino

Open john-bodley opened this issue 2 years ago • 6 comments

SUMMARY

This PR remedies a few issues with Trino—which extends the Presto database engine spec. Specifically it adds a few more safeguards when fetching extra metadata as part of the SQL Lab schema/table workflow.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

  • [ ] Has associated issue:
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

john-bodley avatar Jul 16 '22 03:07 john-bodley

Codecov Report

Merging #20729 (d5c5da0) into master (f89ba0c) will decrease coverage by 11.48%. The diff coverage is 45.83%.

:exclamation: Current head d5c5da0 differs from pull request most recent head 0ea8123. Consider uploading reports for the commit 0ea8123 to get more accurate results

@@             Coverage Diff             @@
##           master   #20729       +/-   ##
===========================================
- Coverage   66.33%   54.85%   -11.49%     
===========================================
  Files        1767     1767               
  Lines       67295    67305       +10     
  Branches     7144     7144               
===========================================
- Hits        44643    36919     -7724     
- Misses      20824    28558     +7734     
  Partials     1828     1828               
Flag Coverage Δ
hive 53.14% <45.83%> (+<0.01%) :arrow_up:
mysql ?
postgres ?
presto 53.04% <37.50%> (+<0.01%) :arrow_up:
python 57.77% <45.83%> (-23.70%) :arrow_down:
sqlite ?
unit 50.47% <37.50%> (+<0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/db_engine_specs/presto.py 40.87% <38.88%> (-47.11%) :arrow_down:
superset/db_engine_specs/trino.py 39.39% <66.66%> (-34.02%) :arrow_down:
superset/utils/dashboard_import_export.py 0.00% <0.00%> (-100.00%) :arrow_down:
superset/key_value/commands/update.py 0.00% <0.00%> (-88.89%) :arrow_down:
superset/key_value/commands/delete.py 0.00% <0.00%> (-85.30%) :arrow_down:
superset/key_value/commands/delete_expired.py 0.00% <0.00%> (-80.77%) :arrow_down:
superset/dashboards/commands/importers/v0.py 15.62% <0.00%> (-76.25%) :arrow_down:
superset/datasets/commands/update.py 25.00% <0.00%> (-69.05%) :arrow_down:
superset/datasets/commands/create.py 29.41% <0.00%> (-68.63%) :arrow_down:
superset/datasets/commands/importers/v0.py 24.03% <0.00%> (-67.45%) :arrow_down:
... and 279 more

:mega: Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

codecov[bot] avatar Jul 16 '22 03:07 codecov[bot]

@john-bodley Trino using official driver, not PyHive as Presto. This mean that a lot of functions like get_table_names, get_view_names, handle_cursor, etc... are very different between 2 drivers. Further more, Trino and Presto are now 2 different projects, and going to different direction. So their features will be more and more different.

Tightly couple though 2 engine specs will make it hard to extends and introduce more bugs. Do u think we should revert #20152 to avoid inherit messy code from PrestoEngineSpec.

dungdm93 avatar Jul 22 '22 16:07 dungdm93

@dungdm93 I think in the future when Presto and Trino further diverge it makes sense, however at the moment there is a significant amount of the code that is the same and I sense adhering to the DRY principle (where possible makes sense).

john-bodley avatar Aug 05 '22 00:08 john-bodley

@john-bodley 2 projects are now different enough to have separated EngineSpec (at least in driver side). You just save a few line of code by let TrinoEngineSpec inherit messy code from PrestoEngineSpec (and introducing bugs).

at the moment there is a significant amount of the code that is the same

The significant amount of the code you said is used to patch PyHive driver. Trino in the other hand strictly follow DB-API 2, so default implementations from BaseEngineSpec are good enough. Other utility functions like estimate_statement_cost, query_cost_formatter, etc... may the same. However, those func can be reused by move to an utils.py if u wanna do DRY.

dungdm93 avatar Aug 05 '22 08:08 dungdm93

@ktmud thanks for the review. I've actually had to update the logic since your approval per https://github.com/apache/superset/pull/20729#discussion_r939048444.

john-bodley avatar Aug 05 '22 18:08 john-bodley

@john-bodley an other way to reusing code is the mixing pattern.

dungdm93 avatar Aug 06 '22 02:08 dungdm93

@john-bodley I have to agree with @dungdm93 here - Over the years the Presto spec has caused me considerable grief due to the code being IMO messy and difficult to maintain. For that reason I've been very happy to see the work that @dungdm93 has done on the Trino spec, which IMO is of a great quality and easy follow and extend. So rather than double up on using the old Presto spec code, I'd rather see us gradually phasing it out.

WRT to compatibility of the drivers, I think they're fairly far away from each other right now. For instance, currently clicking on a table in SQL Lab with the recommended trino driver gives off a metadata fetching error (causes a 500 in the backend): image

This is due to the latest_partition method in the Presto spec being incompatible with the trino driver. There are other similar issues; I'm opening a PR soon to fix query cancellation, which also works slightly differently, but there appear to be other differences, too.

If we really want to share code between the Presto and Trino specs I would recommend doing something similar to what we're doing for Postgres and Postgres-like databases, where we have the uncontroversial shared pieces in an abstract PostgresBaseEngineSpec which the actual implementations, like PostgresEngineSpec, HanaEngineSpec, RedshiftEngineSpec etc use. In this case, we could have a PrestoBaseEngineSpec into which we move the high quality shared code, and then leave the messy bits in the old PrestoEngineSpec, while gradually refactoring them into the shared PrestoBaseEngineSpec, all while adding proper and relevant tests for them.

villebro avatar Aug 09 '22 12:08 villebro

@villebro perhaps latest_partition issue is fixed in trinodb/trino-python-client#219. Waiting for next release.

dungdm93 avatar Aug 15 '22 14:08 dungdm93

Closing in favor of https://github.com/apache/superset/pull/21066.

john-bodley avatar Sep 02 '22 03:09 john-bodley