Destination databricks: switch to oss jdbc driver

Open edgao opened this issue 1 year ago • 2 comments

closes https://github.com/airbytehq/airbyte-internal-issues/issues/9120

also it seems like we don't need the databricks sdk at all?

The new driver has a slightly different interface: you can't supply a URL directly, and instead have to supply individual fields/properties. I tried to port over our existing config, but dropped the transportMode=http and EnableArrow=0 settings to see whether they're still needed.

The Databricks documentation doesn't describe how to do OAuth with this driver at all; it only covers PAT (https://docs.gcp.databricks.com/en/integrations/jdbc/oss.html#authenticate-the-driver). I naively copied our old OAuth settings over to the new interface, but that doesn't work. It fails with:

  • DatabricksSQLException: Communication link failure. Failed to connect to server. :https://dbc-6aebf761-f8d6.cloud.databricks.com:443accessToken must be defined
  • DatabricksSQLException: Communication link failure. Failed to connect to server. :https://dbc-6aebf761-f8d6.cloud.databricks.com:443Cannot invoke "com.databricks.sdk.core.oauth.OpenIDConnectEndpoints.getTokenEndpoint()" because "jsonResponse" is null).

notable changes in the oss driver:

  • timestamps with timezone now have a timezone directly from the driver
  • timestamps without timezone now come back with millisecond precision (a trailing .000)
  • inline results larger than 26214400 bytes (25 MiB) now fail with: "Inline byte limit exceeded. Statements executed with disposition=INLINE can have a result size of at most 26214400 bytes. Please execute the statement with disposition=EXTERNAL_LINKS if you want to download the full result"

which means:

  • the destinationhandler can parse directly to an Instant, instead of needing to go through LocalDateTime
  • tons of changes in the expected records
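
To illustrate the first point, here is a minimal sketch of the two parsing paths. It assumes the values arrive as ISO-8601 strings, which this PR description doesn't confirm, and that zone-less values were previously treated as UTC. The class and method names are hypothetical and are not taken from the connector code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical helper, for illustration only.
final class TimestampParsing {
    private TimestampParsing() {}

    // Old-style path: the value has no zone information, so it is parsed as a
    // LocalDateTime and then pinned to UTC (the UTC choice is an assumption).
    static Instant viaLocalDateTime(String raw) {
        return LocalDateTime.parse(raw).toInstant(ZoneOffset.UTC);
    }

    // New-style path: the value already carries an offset, so it maps
    // straight to an Instant with no intermediate local type.
    static Instant direct(String raw) {
        return OffsetDateTime.parse(raw).toInstant();
    }
}
```

Under these assumptions, `TimestampParsing.direct("2024-08-14T17:08:00.000+02:00")` and `TimestampParsing.viaLocalDateTime("2024-08-14T15:08:00.000")` resolve to the same Instant, but only the second one depends on guessing the zone.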

edgao, Aug 14 '24 15:08