superset icon indicating copy to clipboard operation
superset copied to clipboard

fix: Handling of column types for Presto, Trino, et al.

Open john-bodley opened this issue 1 year ago • 3 comments

SUMMARY

This is a rehash of @brouberol's https://github.com/apache/superset/pull/26782 which was later reverted in https://github.com/apache/superset/pull/28613 as it was problematic with Trino, given both the PrestoEngineSpec and TrinoEngineSpec derive from the PrestoBaseEngineSpec.

The TL;DR is typically most SQLAlchemy dialects (including Trino) use native SQLAlchemy types for encoding column types, whereas PyHive is unconventional and uses str. I asked a similar question years ago and you can see here how these types are materialized.

I hypothesize that part of the confusion when @brouberol's was authoring their PR was that the ResultSetColumnType TypedDict was incorrect, alluding to the fact that the type field was Optional[str] as opposed to Optional[Union[str, TypeEngine, type[TypeEngine]]. This PR simply ensures that in the PrestoBaseEngineSpec class—which handles Hive, Presto, and Trino—that we only try to coerce str types to a SQLAlchemy type (if available).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Added unit tests.

ADDITIONAL INFORMATION

  • [x] Has associated issue:
    • Fixes https://github.com/apache/superset/issues/25636
    • Fixes https://github.com/apache/superset/issues/25962
    • Fixes https://github.com/apache/superset/issues/26740
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

john-bodley avatar May 23 '24 21:05 john-bodley

Codecov Report

Attention: Patch coverage is 87.50000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 83.50%. Comparing base (76d897e) to head (6a35459). Report is 158 commits behind head on master.

Files Patch % Lines
superset/db_engine_specs/presto.py 80.00% 1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #28653       +/-   ##
===========================================
+ Coverage   60.48%   83.50%   +23.01%     
===========================================
  Files        1931      522     -1409     
  Lines       76236    37497    -38739     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    31313    -14801     
+ Misses      28017     6184    -21833     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive 49.12% <37.50%> (-0.05%) :arrow_down:
javascript ?
mysql 77.16% <62.50%> (?)
postgres 77.30% <62.50%> (?)
presto 53.67% <37.50%> (-0.13%) :arrow_down:
python 83.50% <87.50%> (+20.02%) :arrow_up:
sqlite 76.75% <62.50%> (?)
unit 58.97% <75.00%> (+1.35%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar May 23 '24 21:05 codecov[bot]

@gooroodev review

admsev avatar May 25 '24 15:05 admsev

1. Summary of Changes

The pull request addresses the handling of column types for Presto and Trino engines in the Superset project. The key changes include:

  • Presto Engine:
    • Handling of column types in the get_children and where_latest_partition methods.
    • Added logic to convert string column types to SQLAlchemy types dynamically.
  • Trino Engine:
    • Added a similar handling of column types in the where_latest_partition method.
    • Added unit tests for the where_latest_partition method.
  • Typing Changes:
    • Introduced a new type alias SQLType in superset_typing.py to represent SQLAlchemy column types.
    • Updated ResultSetColumnType to use SQLType.

2. Issues, Bugs, or Typos

Issue 1:

  • File: superset/db_engine_specs/presto.py
  • Code:
    match = pattern.match(cast(str, column["type"]))
    
    Problem: The use of cast(str, column["type"]) is unnecessary if column["type"] is already a string. Improvement:
    match = pattern.match(column["type"])
    

Issue 2:

  • File: superset/db_engine_specs/presto.py
  • Code:
    if isinstance(col_type, str):
        if col_type_class := getattr(types, col_type, None):
            col_type = col_type_class()
        else:
            col_type = None
    
    Problem: The assignment of None to col_type if col_type_class is not found can be simplified. Improvement:
    if isinstance(col_type, str):
        col_type_class = getattr(types, col_type, None)
        col_type = col_type_class() if col_type_class else None
    

3. General Review of Code Quality and Style

  • Code Quality: The code changes are well-structured and address the problem of handling different column types effectively. The use of type hinting and type checking improves code readability and maintainability.
  • Code Style: The code adheres to PEP 8 standards, making it clean and easy to read. The use of modern Python features like type hinting and := operator (walrus operator) is commendable.
  • Testing: The addition of unit tests for the where_latest_partition method in both Presto and Trino engines ensures that the changes are tested and validated.

Additional Suggestions

  • Documentation: Consider adding docstrings to the modified methods to explain the purpose and functionality of the changes.
  • Error Handling: The use of broad exception handling (raise Exception) could be improved by raising more specific exceptions to provide better context in case of errors.

Overall, the pull request is well-implemented and addresses the problem effectively. The suggested improvements are minor and aim to further enhance the code quality.

Yours, Gooroo.dev. To receive reviews automatically, install Github App

gooroodev avatar May 25 '24 15:05 gooroodev