Connection test: [ERROR] - dbt-databricks behind proxy
Describe the bug
Running dbt debug gives the error: Connection test: [ERROR]
1 check failed: dbt was unable to connect to the specified database. The database returned the following error:
Runtime Error Database Error failed to connect
The HTTP_PROXY and HTTPS_PROXY environment variables are set.
It does not seem that the proxy environment variables are being used. A curl request to host/http_path works.
Steps To Reproduce
dbt debug
Expected behavior
The connection test succeeds.
System information
The output of dbt --version:
Core:
- installed: 1.1.0
- latest: 1.1.0 - Up to date!
Plugins:
- databricks: 1.1.0 - Up to date!
- spark: 1.1.0 - Up to date!
The operating system you're using:
ubuntu
The output of python --version:
Python 3.8.10
@thuanvan thanks for filing this. Investigating whether our Python connector supports HTTP proxies. Will get back to you!
I'm getting this in a curl test: curl --netrc -v https://adb-REDACTED.azuredatabricks.net:443/sql/protocolv1/o/REDACTED/REDACTED
Error 500 javax.servlet.ServletException: org.apache.thrift.transport.TTransportException
@thuanvan I verified that we don't support proxy yet. We'll get this prioritized on our roadmap.
Thanks for confirming, and thank you for prioritizing it. We'll look into how to get a proxy bypass.
Odd, since we have working instances where we go through a proxy. Can you elaborate?
@thuanvan I'm waiting for the engineer to come back from vacation. He'll look into it.
I did some analysis previously: the dbt-databricks adapter is based on the Thrift protocol, which is RPC, not REST, and I could not find an easy way to make it support a proxy.
Our team's workaround is to use Databricks IP whitelisting to protect the Databricks workspace. Private Link for Databricks is still a beta feature, and it might take a while to become GA.
@xg1990 @thuanvan we are going to add proxy support to the Python connector first (https://github.com/databricks/databricks-sql-python/issues/22). Then we will add support to dbt-databricks.
When https://github.com/databricks/databricks-sql-python/issues/22 is fixed, will this issue be fixed as well?
@thuanvan we're still waiting for https://github.com/databricks/databricks-sql-python/issues/22 to land.
Hey folks, just letting you know that we have a fix for this under review in databricks-sql-connector. The fix will be included in connector version 2.3.1. If you want to test it in the interim, we have a dev version that you can install with pip install databricks-sql-connector==2.3.1.dev1.
@susodapop I have just encountered this problem, but unfortunately your suggestion did not fix it. I just get a lot of "Hey I was called!" messages before ending up with "failed to connect".
@alexdiem thanks for the report! The hey I was called! message won't be present in the final release 😄
There's not enough information to reproduce your issue in your message. What values did you use for your proxy environment variables? Of course redact any sensitive information.
It is the exact same problem as Thuan has (we are colleagues in the same office), and several others have it as well. I have set:
export HTTP_PROXY="http://test:test@REDACTED:8080"
export HTTPS_PROXY="http://test:test@REDACTED:8080"
It is very odd because it used to work, and I did not make any changes to the proxy settings.
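For anyone wanting to check how Python interprets a proxy URL of this shape, here is a minimal sketch using only the standard library. The helper name describe_proxy and the host proxy.example.com are made up for illustration; the loop at the end prints whatever proxy settings urllib picks up on the current machine, without showing passwords.

```python
import urllib.parse
import urllib.request


def describe_proxy(proxy_url):
    """Split a proxy URL into its parts (hypothetical helper, not part of dbt)."""
    parsed = urllib.parse.urlparse(proxy_url)
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,
        "port": parsed.port,
        "username": parsed.username,
        # Report only whether a password is present, never its value.
        "has_password": parsed.password is not None,
    }


# Proxy settings as seen by Python's standard library on this machine.
for name, url in urllib.request.getproxies().items():
    print(name, describe_proxy(url))

# Example with a made-up host in the same shape as the values above.
print(describe_proxy("http://test:test@proxy.example.com:8080"))
```

If the username and host come back as expected, the environment variables themselves are probably fine and the failure is further down the stack.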
Looks like the problem is in thrift.transport.THttpClient:
Problem code:
    @staticmethod
    def basic_proxy_auth_header(proxy):
        if proxy is None or not proxy.username:
            return None
        ap = "%s:%s" % (urllib.parse.unquote(proxy.username),
                        urllib.parse.unquote(proxy.password))
        cr = base64.b64encode(ap).strip()
        return "Basic " + cr
In my test, the HTTP(S)_PROXY environment variable values are captured correctly, but ap is a regular str rather than a bytes object, so the base64.b64encode() call fails. Fix:
    @staticmethod
    def basic_proxy_auth_header(proxy):
        if proxy is None or not proxy.username:
            return None
        ap = "%s:%s" % (urllib.parse.unquote(proxy.username),
                        urllib.parse.unquote(proxy.password))
        cr = base64.b64encode(ap.encode()).decode().strip()
        return "Basic " + cr
However, since the problem is in the thrift package, we can't simply fix it in this project...
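The str versus bytes failure described above can be reproduced outside of thrift. The following is a standalone sketch, not the thrift method itself: the function takes a proxy URL string rather than a parsed object, and the fallback to an empty password is an assumption added here so the example does not fail on URLs without one.

```python
import base64
import urllib.parse


def basic_proxy_auth_header(proxy_url):
    """Standalone sketch of the corrected logic; takes a URL string (assumption)."""
    proxy = urllib.parse.urlparse(proxy_url)
    if not proxy.username:
        return None
    ap = "%s:%s" % (
        urllib.parse.unquote(proxy.username),
        # Empty-password fallback is an addition for this sketch.
        urllib.parse.unquote(proxy.password or ""),
    )
    # b64encode needs bytes in and returns bytes, so encode first, decode after.
    return "Basic " + base64.b64encode(ap.encode()).decode()


# On Python 3, passing a str directly raises TypeError.
try:
    base64.b64encode("test:test")
except TypeError as exc:
    print("str input fails:", exc)

print(basic_proxy_auth_header("http://test:test@proxy.example.com:8080"))
# prints "Basic dGVzdDp0ZXN0"
```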
@msdotnetclr we've actually fixed this in databricks-sql-connector without needing to update the upstream thrift dependency. It needs to merge and be deployed to PyPI, then we'll update the dbt-databricks dependency and proxies will work.
Here's the PR that fixes it in databricks-sql-connector: https://github.com/databricks/databricks-sql-python/pull/81
Ah, nice. I only started to play around with the connector this morning and did not get to look into other linked issues. Good to know there is a better fix already!
The fix has merged into databricks-sql-connector and is part of release v2.5.0.
I'll open a PR here that bumps the dependency so we pick up the proxy fix.
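To check whether a given environment already has a connector release with the fix, something like the sketch below may help. It assumes v2.5.0 is the first release containing the fix, as stated above. The function names are hypothetical, and the comparison is deliberately simple: it looks only at the leading numeric part of the version and ignores suffixes such as .dev1. The packaging library's version parsing would be more robust.

```python
import re
from importlib import metadata

# First release named in this thread as containing the proxy fix.
FIXED_IN = (2, 5, 0)


def release_tuple(version):
    """Return the leading numeric components, padded to three (simplified parsing)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version: %r" % version)
    parts = tuple(int(p) for p in match.group().split("."))
    return parts + (0,) * (3 - len(parts))


def has_proxy_fix(version):
    """True if the version is at or above FIXED_IN (hypothetical helper)."""
    return release_tuple(version) >= FIXED_IN


try:
    installed = metadata.version("databricks-sql-connector")
    print(installed, "has proxy fix:", has_proxy_fix(installed))
except metadata.PackageNotFoundError:
    print("databricks-sql-connector is not installed in this environment")
```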