superset
superset copied to clipboard
fix: Handling of column types for Presto, Trino, et al.
SUMMARY
This is a rehash of @brouberol's https://github.com/apache/superset/pull/26782 which was later reverted in https://github.com/apache/superset/pull/28613 as it was problematic with Trino, given both the PrestoEngineSpec and TrinoEngineSpec derive from the PrestoBaseEngineSpec.
The TL;DR is typically most SQLAlchemy dialects (including Trino) use native SQLAlchemy types for encoding column types, whereas PyHive is unconventional and uses str. I asked a similar question years ago and you can see here how these types are materialized.
I hypothesize that part of the confusion when @brouberol's was authoring their PR was that the ResultSetColumnType TypedDict was incorrect, alluding to the fact that the type field was Optional[str] as opposed to Optional[Union[str, TypeEngine, type[TypeEngine]]. This PR simply ensures that in the PrestoBaseEngineSpec class—which handles Hive, Presto, and Trino—that we only try to coerce str types to a SQLAlchemy type (if available).
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Added unit tests.
ADDITIONAL INFORMATION
- [x] Has associated issue:
- Fixes https://github.com/apache/superset/issues/25636
- Fixes https://github.com/apache/superset/issues/25962
- Fixes https://github.com/apache/superset/issues/26740
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
Codecov Report
Attention: Patch coverage is 87.50000% with 1 lines in your changes are missing coverage. Please review.
Project coverage is 83.50%. Comparing base (
76d897e) to head (6a35459). Report is 158 commits behind head on master.
| Files | Patch % | Lines |
|---|---|---|
| superset/db_engine_specs/presto.py | 80.00% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #28653 +/- ##
===========================================
+ Coverage 60.48% 83.50% +23.01%
===========================================
Files 1931 522 -1409
Lines 76236 37497 -38739
Branches 8568 0 -8568
===========================================
- Hits 46114 31313 -14801
+ Misses 28017 6184 -21833
+ Partials 2105 0 -2105
| Flag | Coverage Δ | |
|---|---|---|
| hive | 49.12% <37.50%> (-0.05%) |
:arrow_down: |
| javascript | ? |
|
| mysql | 77.16% <62.50%> (?) |
|
| postgres | 77.30% <62.50%> (?) |
|
| presto | 53.67% <37.50%> (-0.13%) |
:arrow_down: |
| python | 83.50% <87.50%> (+20.02%) |
:arrow_up: |
| sqlite | 76.75% <62.50%> (?) |
|
| unit | 58.97% <75.00%> (+1.35%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@gooroodev review
1. Summary of Changes
The pull request addresses the handling of column types for Presto and Trino engines in the Superset project. The key changes include:
- Presto Engine:
- Handling of column types in the
get_childrenandwhere_latest_partitionmethods. - Added logic to convert string column types to SQLAlchemy types dynamically.
- Handling of column types in the
- Trino Engine:
- Added a similar handling of column types in the
where_latest_partitionmethod. - Added unit tests for the
where_latest_partitionmethod.
- Added a similar handling of column types in the
- Typing Changes:
- Introduced a new type alias
SQLTypeinsuperset_typing.pyto represent SQLAlchemy column types. - Updated
ResultSetColumnTypeto useSQLType.
- Introduced a new type alias
2. Issues, Bugs, or Typos
Issue 1:
- File:
superset/db_engine_specs/presto.py - Code:
Problem: The use ofmatch = pattern.match(cast(str, column["type"]))cast(str, column["type"])is unnecessary ifcolumn["type"]is already a string. Improvement:match = pattern.match(column["type"])
Issue 2:
- File:
superset/db_engine_specs/presto.py - Code:
Problem: The assignment ofif isinstance(col_type, str): if col_type_class := getattr(types, col_type, None): col_type = col_type_class() else: col_type = NoneNonetocol_typeifcol_type_classis not found can be simplified. Improvement:if isinstance(col_type, str): col_type_class = getattr(types, col_type, None) col_type = col_type_class() if col_type_class else None
3. General Review of Code Quality and Style
- Code Quality: The code changes are well-structured and address the problem of handling different column types effectively. The use of type hinting and type checking improves code readability and maintainability.
- Code Style: The code adheres to PEP 8 standards, making it clean and easy to read. The use of modern Python features like type hinting and
:=operator (walrus operator) is commendable. - Testing: The addition of unit tests for the
where_latest_partitionmethod in both Presto and Trino engines ensures that the changes are tested and validated.
Additional Suggestions
- Documentation: Consider adding docstrings to the modified methods to explain the purpose and functionality of the changes.
- Error Handling: The use of broad exception handling (
raise Exception) could be improved by raising more specific exceptions to provide better context in case of errors.
Overall, the pull request is well-implemented and addresses the problem effectively. The suggested improvements are minor and aim to further enhance the code quality.
Yours, Gooroo.dev. To receive reviews automatically, install Github App