
Connection test: [ERROR] - dbt-databricks behind proxy

Open thuanvan opened this issue 3 years ago • 9 comments

Describe the bug

Running dbt debug gives the error: Connection test: [ERROR]

1 check failed: dbt was unable to connect to the specified database. The database returned the following error:

Runtime Error Database Error failed to connect

The HTTP_PROXY and HTTPS_PROXY environment variables are set.

It does not seem that the proxy environment variables are being used. A curl request to host/http_path works.
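One quick way to check which proxy settings Python itself picks up from the environment is the standard library's urllib.request.getproxies(). This is only a minimal sketch: the helper name visible_proxies is made up for illustration, and a populated result only shows that the variables are visible to the Python process, not that the Thrift-based transport used by the connector honors them.

```python
import urllib.request


def visible_proxies():
    """Return the scheme -> proxy URL mapping detected by the standard library.

    Hypothetical helper for illustration; it is not part of dbt or the connector.
    """
    return urllib.request.getproxies()


if __name__ == "__main__":
    proxies = visible_proxies()
    for scheme in ("http", "https"):
        # Missing entries mean Python does not see a proxy for that scheme.
        print(scheme, "->", proxies.get(scheme, "<not set>"))
```

If both schemes print a proxy URL here while dbt debug still fails, the variables are set correctly and the problem lies further down the connection stack.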

Steps To Reproduce

dbt debug

Expected behavior

The connection test reports OK.

System information

The output of dbt --version:

Core:

  • installed: 1.1.0
  • latest: 1.1.0 - Up to date!

Plugins:

  • databricks: 1.1.0 - Up to date!
  • spark: 1.1.0 - Up to date!

The operating system you're using: Ubuntu

The output of python --version: Python 3.8.10

thuanvan avatar Jun 10 '22 13:06 thuanvan

@thuanvan thanks for filing this. Investigating whether our Python connector supports HTTP proxies. Will get back to you!

bilalaslamseattle avatar Jun 15 '22 11:06 bilalaslamseattle

I'm getting this in a curl test: curl --netrc -v https://adb-REDACTED.azuredatabricks.net:443/sql/protocolv1/o/REDACTED/REDACTED

Error 500 javax.servlet.ServletException: org.apache.thrift.transport.TTransportException

thuanvan avatar Jun 15 '22 18:06 thuanvan

@thuanvan I verified that we don't support proxy yet. We'll get this prioritized on our roadmap.

bilalaslamseattle avatar Jun 16 '22 06:06 bilalaslamseattle

Thanks for confirming, and thank you for prioritizing it. We'll look into how to get a proxy bypass.

thuanvan avatar Jun 16 '22 07:06 thuanvan

Odd, since we have working instances where we go through a proxy. Can you elaborate?

thuanvan avatar Jun 20 '22 10:06 thuanvan

@thuanvan I'm waiting for the engineer to come back from vacation. He'll look into it.

bilalaslamseattle avatar Jun 20 '22 10:06 bilalaslamseattle

I did some analysis previously: the dbt-databricks adapter is based on the Thrift protocol, which is RPC, not REST. I could not find an easy way for it to support a proxy.

Our team's workaround is to use Databricks IP whitelisting to protect the Databricks workspace. Private Link for Databricks is still a beta feature, so it might take a while to become GA.

xg1990 avatar Jul 14 '22 23:07 xg1990

@xg1990 @thuanvan we are going to add proxy support to the Python connector first (https://github.com/databricks/databricks-sql-python/issues/22). Then we will add support to dbt-databricks.

bilalaslamseattle avatar Jul 26 '22 14:07 bilalaslamseattle

When https://github.com/databricks/databricks-sql-python/issues/22 is fixed, will this issue also be fixed?

thuanvan avatar Aug 16 '22 07:08 thuanvan

@thuanvan we're still waiting for https://github.com/databricks/databricks-sql-python/issues/22 to land.

bilalaslamseattle avatar Nov 11 '22 14:11 bilalaslamseattle

Hey folks, just letting you know that we have a fix for this under review in databricks-sql-connector. The fix will be included in connector version 2.3.1. If you want to test it in the interim, we have a dev version that you can install with pip install databricks-sql-connector==2.3.1.dev1.

susodapop avatar Jan 12 '23 22:01 susodapop

@susodapop I have just encountered this problem, but unfortunately your suggestion did not fix it. I just get a lot of "Hey I was called!" messages before ending up with "failed to connect".

alexdiem avatar Feb 24 '23 15:02 alexdiem

@alexdiem thanks for the report! The hey I was called! message won't be present in the final release 😄

There's not enough information to reproduce your issue in your message. What values did you use for your proxy environment variables? Of course redact any sensitive information.

susodapop avatar Feb 24 '23 19:02 susodapop

It is the exact same problem that Thuan has (we are colleagues in the same office), and several others have it as well. I have set:

    export HTTP_PROXY="http://test:test@REDACTED:8080"
    export HTTPS_PROXY="http://test:test@REDACTED:8080"

It is very odd because it used to work, and I did not make any changes to the proxy settings.

alexdiem avatar Feb 27 '23 07:02 alexdiem

Looks like the problem is in thrift.transport.THttpClient:

Problem code:

    @staticmethod
    def basic_proxy_auth_header(proxy):
        if proxy is None or not proxy.username:
            return None
        ap = "%s:%s" % (urllib.parse.unquote(proxy.username),
                        urllib.parse.unquote(proxy.password))
        cr = base64.b64encode(ap).strip()
        return "Basic " + cr

In my test, the values of the HTTP(S)_PROXY environment variables are captured correctly, but ap is a regular str rather than a bytes object, so the base64.b64encode() call fails (it only accepts bytes-like input). Fix:

    @staticmethod
    def basic_proxy_auth_header(proxy):
        if proxy is None or not proxy.username:
            return None
        ap = "%s:%s" % (urllib.parse.unquote(proxy.username),
                        urllib.parse.unquote(proxy.password))
        cr = base64.b64encode(ap.encode()).decode().strip()
        return "Basic " + cr

However, since the problem is in the thrift package, we can't simply fix it in this project...
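The failure mode described above can be reproduced outside of thrift. The following is a minimal standalone sketch, not the thrift or connector code: it rewrites the helper as a free function that takes a proxy URL string (an assumption made for illustration), and the fallback to an empty password is also an addition of this sketch.

```python
import base64
import urllib.parse


def basic_proxy_auth_header(proxy_url):
    """Build a Proxy-Authorization value from a proxy URL, or None without credentials.

    Standalone illustration only; the signature differs from the thrift method.
    """
    proxy = urllib.parse.urlparse(proxy_url)
    if not proxy.username:
        return None
    credentials = "%s:%s" % (
        urllib.parse.unquote(proxy.username),
        # Assumption of this sketch: treat a missing password as empty.
        urllib.parse.unquote(proxy.password or ""),
    )
    # b64encode() needs bytes in and returns bytes, so encode first, decode after.
    token = base64.b64encode(credentials.encode()).decode()
    return "Basic " + token


if __name__ == "__main__":
    try:
        base64.b64encode("test:test")  # str input, as in the problem code
    except TypeError as exc:
        print("str input fails:", exc)
    print(basic_proxy_auth_header("http://test:test@proxy.example.com:8080"))
```

The host proxy.example.com is a placeholder. Passing a str to base64.b64encode() raises TypeError on Python 3, which matches the behavior reported in this comment.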

msdotnetclr avatar Mar 24 '23 17:03 msdotnetclr

@msdotnetclr we've actually fixed this in databricks-sql-connector without needing to update the upstream thrift dependency. The fix needs to be merged and published to PyPI, then we'll update the dbt-databricks dependency and proxies will work.

susodapop avatar Mar 24 '23 18:03 susodapop

Here's the PR that fixes it in databricks-sql-connector: https://github.com/databricks/databricks-sql-python/pull/81

susodapop avatar Mar 24 '23 18:03 susodapop

Ah, nice. I only started to play around with the connector this morning and did not get to look into other linked issues. Good to know there is a better fix already!

msdotnetclr avatar Mar 24 '23 19:03 msdotnetclr

The fix has merged into databricks-sql-connector and is part of release v2.5.0.

I'll open a PR here that bumps the dependency so we pick up the proxy fix.
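For anyone who wants to check whether their environment already has a connector release with the fix, here is a rough sketch using importlib.metadata (available from Python 3.8). The helper names release_tuple and has_proxy_fix are made up for illustration, and the version parsing is deliberately simplified: it only looks at the leading numeric part, so a pre-release such as 2.5.0.dev1 would be treated like 2.5.0.

```python
import re
from importlib import metadata

# Release mentioned above as containing the proxy fix.
FIXED_IN = (2, 5, 0)


def release_tuple(version):
    """Return the leading numeric components of a version string, padded to three."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    parts = [int(part) for part in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # "2.5" compares like "2.5.0"
    return tuple(parts)


def has_proxy_fix(version):
    """True if the given connector version is at least FIXED_IN (simplified check)."""
    return release_tuple(version) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = metadata.version("databricks-sql-connector")
    except metadata.PackageNotFoundError:
        print("databricks-sql-connector is not installed")
    else:
        print(installed, "has proxy fix:", has_proxy_fix(installed))
```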

susodapop avatar Apr 15 '23 16:04 susodapop