superset icon indicating copy to clipboard operation
superset copied to clipboard

chore(Databricks): New Databricks driver

Open Vitor-Avila opened this issue 9 months ago • 4 comments

SUMMARY

The sqlalchemy-databricks package (reference) was deprecated in favor of the updated version of databricks-sql-python (reference). This new version allows users to connect to Databricks using:

databricks://token:dapi***@***.cloud.databricks.com?http_path=/sql/***&catalog=**&schema=**

As a result, some more recent features are only supported with this driver. For example, if you create a table in Databricks with a CLUSTER BY configuration, it's not possible to create a dataset using databricks+connector:// as it doesn't successfully handle this metadata.

This PR introduces a new engine spec for the updated driver version, exposing a dynamic form. I believe long-term would be beneficial to deprecate the other engine specs (migrating existing connections to the new one) but for now this PR only implements support for the driver to validate with users on compatibility.

The dynamic form also includes fields for catalog and schema, so this could also help with catalog support.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

New dynamic form:

image

TESTING INSTRUCTIONS

  1. Connect to Databricks using the Databricks Python Connector option in the database dropdown.

ADDITIONAL INFORMATION

  • [ ] Has associated issue:
  • [ ] Required feature flags:
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

Vitor-Avila avatar May 08 '24 22:05 Vitor-Avila

Going to implement tests in a future commit

Vitor-Avila avatar May 08 '24 22:05 Vitor-Avila

@Vitor-Avila you need to rebase to fix the failing tests, @dpgaspar fixed them in master today.

betodealmeida avatar May 09 '24 12:05 betodealmeida

Codecov Report

Attention: Patch coverage is 73.46939% with 26 lines in your changes are missing coverage. Please review.

Project coverage is 70.04%. Comparing base (2e5f3ed) to head (23c07a8). Report is 59 commits behind head on master.

Files Patch % Lines
superset/db_engine_specs/databricks.py 76.40% 21 Missing :warning:
...eModal/DatabaseConnectionForm/CommonParameters.tsx 42.85% 4 Missing :warning:
...end/src/features/databases/DatabaseModal/index.tsx 50.00% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #28393      +/-   ##
==========================================
+ Coverage   60.49%   70.04%   +9.54%     
==========================================
  Files        1931     1938       +7     
  Lines       76241    77075     +834     
  Branches     8566     8641      +75     
==========================================
+ Hits        46122    53984    +7862     
+ Misses      28015    20976    -7039     
- Partials     2104     2115      +11     
Flag Coverage Δ
hive 49.10% <67.41%> (-0.06%) :arrow_down:
javascript 57.67% <44.44%> (-0.06%) :arrow_down:
mysql 77.21% <67.41%> (?)
postgres 77.32% <67.41%> (?)
presto 53.69% <67.41%> (-0.12%) :arrow_down:
python 83.22% <76.40%> (+19.74%) :arrow_up:
sqlite 76.77% <67.41%> (?)
unit 58.37% <76.40%> (+0.74%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar May 09 '24 14:05 codecov-commenter

thank you @betodealmeida 🙏 I think long term would be interesting to at least deprecate the connector driver and migrate existing connections to this new driver. I'm not too familiar with Databricks to tell if we still need to keep hive and odbc or if those could also be migrated.

Feel free to merge this PR if you want to continue working on catalogs -- I'll create a follow up PR to implement unit tests for this new driver.

Vitor-Avila avatar May 09 '24 16:05 Vitor-Avila