superset
superset copied to clipboard
chore(Databricks): New Databricks driver
SUMMARY
The sqlalchemy-databricks
package (reference) was deprecated in favor of the updated version of databricks-sql-python
(reference). This new version allows users to connect to Databricks using:
databricks://token:dapi***@***.cloud.databricks.com?http_path=/sql/***&catalog=**&schema=**
As a result, some more recent features are only supported with this driver. For example, if you create a table in Databricks with a CLUSTER BY
configuration, it's not possible to create a dataset using databricks+connector://
as it doesn't successfully handle this metadata.
This PR introduces a new engine spec for the updated driver version, exposing a dynamic form. I believe long-term would be beneficial to deprecate the other engine specs (migrating existing connections to the new one) but for now this PR only implements support for the driver to validate with users on compatibility.
The dynamic form also includes fields for catalog
and schema
, so this could also help with catalog support.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
New dynamic form:
TESTING INSTRUCTIONS
- Connect to Databricks using the Databricks Python Connector option in the database dropdown.
ADDITIONAL INFORMATION
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
Going to implement tests in a future commit
@Vitor-Avila you need to rebase to fix the failing tests, @dpgaspar fixed them in master
today.
Codecov Report
Attention: Patch coverage is 73.46939%
with 26 lines
in your changes are missing coverage. Please review.
Project coverage is 70.04%. Comparing base (
2e5f3ed
) to head (23c07a8
). Report is 59 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #28393 +/- ##
==========================================
+ Coverage 60.49% 70.04% +9.54%
==========================================
Files 1931 1938 +7
Lines 76241 77075 +834
Branches 8566 8641 +75
==========================================
+ Hits 46122 53984 +7862
+ Misses 28015 20976 -7039
- Partials 2104 2115 +11
Flag | Coverage Δ | |
---|---|---|
hive | 49.10% <67.41%> (-0.06%) |
:arrow_down: |
javascript | 57.67% <44.44%> (-0.06%) |
:arrow_down: |
mysql | 77.21% <67.41%> (?) |
|
postgres | 77.32% <67.41%> (?) |
|
presto | 53.69% <67.41%> (-0.12%) |
:arrow_down: |
python | 83.22% <76.40%> (+19.74%) |
:arrow_up: |
sqlite | 76.77% <67.41%> (?) |
|
unit | 58.37% <76.40%> (+0.74%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
thank you @betodealmeida 🙏 I think long term would be interesting to at least deprecate the connector
driver and migrate existing connections to this new driver. I'm not too familiar with Databricks to tell if we still need to keep hive
and odbc
or if those could also be migrated.
Feel free to merge this PR if you want to continue working on catalogs -- I'll create a follow up PR to implement unit tests for this new driver.