
[Feature] Support HTTP authentication for DbtGitRemote

Open · KarthikRajashekaran opened this issue 1 year ago · 8 comments

I am trying to use a dbt project hosted in a GitLab repository with DbtGitRemoteHook:

    from airflow_dbt_python.operators.dbt import DbtRunOperator

    dbt_run = DbtRunOperator(
        task_id="dbt_run",
        project_dir="https://domain/abc/-/tree/main/dbt/db_metrics?private_token=abcdesf",
        dbt_conn_id="dbt_conn_id",
        target="dev",
        do_xcom_push_artifacts=["run_results.json"],
    )

This fails with:

raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='scm.platform.us-west-2.io', port=443): Max retries exceeded with url: https://scm.platform.us-west-2.io/users/auth/saml (Caused by ResponseError('too many redirects'))

Accessing the same URL with the token over HTTPS in a browser works.

I also tried the following project_dir:

    project_dir="https://$gitlabUser:$gitlabToken@domain/abc.git",

which fails with:

File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 705, in clone
  result = self.fetch(path, target, progress=progress, depth=depth)
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 782, in fetch
  result = self.fetch_pack(
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 2085, in fetch_pack
  refs, server_capabilities, url = self._discover_references(
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 1941, in _discover_references
  resp, read = self._http_request(url, headers)
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 2219, in _http_request
  raise HTTPUnauthorized(resp.headers.get("WWW-Authenticate"), url)
dulwich.client.HTTPUnauthorized: No valid credentials provided

KarthikRajashekaran, Mar 30 '23

Thanks for opening an issue.

Authentication for git remotes was not implemented, hence the error. I'm working on a patch for this and already have it working; I just need to clean up the code and add some tests.

We'll do this in two steps:

  • The first release (v1.0.4) will support HTTPS authentication by specifying a user/password or token in the URL. This should be enough to get you going: locally I'm able to pull a private repo from GitLab using a project_dir of the form https://oauth2:<my-personal-access-token>@gitlab.com/tomasfarias/<my-private-repo>, which is equivalent to your second attempt.

  • The second release (likely v1.0.5, but potentially v1.1.0) will include proper Airflow connection support, so that you can store your credentials in Airflow instead of passing them in the project's URL. This requires some refactoring of the git remote, but is otherwise very doable.
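As a minimal sketch of the first option, the snippet below assembles a project_dir URL of the form https://oauth2:<token>@gitlab.com/<group>/<repo>.git from separate parts, and redacts it for logging. The helpers `build_git_url` and `redact`, and the repository path in the usage line, are hypothetical and not part of airflow-dbt-python; the `oauth2` username follows the GitLab convention used above. Percent-encoding matters when a token contains characters such as `/`, `@` or `:`, which would otherwise corrupt the URL; whether the git client decodes them again is an assumption worth checking against your dulwich version.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_git_url(repo_url: str, username: str, token: str) -> str:
    """Embed HTTP credentials into an HTTPS git URL (hypothetical helper).

    Returns https://<username>:<token>@<host>/<path>, percent-encoding both
    credential parts so reserved characters cannot break URL parsing.
    """
    parts = urlsplit(repo_url)
    if parts.scheme != "https":
        raise ValueError("expected an https:// URL")
    if parts.username or parts.password:
        raise ValueError("URL already contains credentials")
    userinfo = f"{quote(username, safe='')}:{quote(token, safe='')}"
    # With no userinfo present, netloc is just host[:port].
    netloc = f"{userinfo}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def redact(url: str) -> str:
    """Replace the password part of a URL with '***' before logging it."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    url = build_git_url(
        "https://gitlab.com/my-group/my-dbt-project.git", "oauth2", "glpat-abc/123"
    )
    print(redact(url))
```

The resulting string would be passed as project_dir. Because the token still ends up in DAG code or rendered task fields, the Airflow connection support planned for the second release is the safer long-term option.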

tomasfarias, Mar 30 '23

Is there a tentative release date for v1.0.4?

KarthikRajashekaran, Mar 30 '23

v1.0.4 going out later today assuming CI is green.

tomasfarias, Mar 31 '23

I see that v1.0.4 is already available on PyPI. However, doing pip install airflow-dbt-python==1.0.4 seems to install a version of the code that does not have the changes from #113. Also, pyproject.toml still says version = "1.0.3". Is there a problem? Or am I missing something?

alvaromendoza, Apr 03 '23

Thanks for bringing this up @alvaromendoza.

I think I may have tagged the wrong commit, and thus 1.0.4 was deployed without the changes. Unfortunately, PyPI doesn't allow overwriting existing releases, so I will go ahead and do a 1.0.5 release. This will just be what v1.0.4 was intended to be, no other changes. I may yank 1.0.4 afterwards, just so that folks upgrade from 1.0.3 to 1.0.5 directly.

Sorry for the inconvenience. The deployment pipeline is fully automated except for bumping the version and tagging the commit, and that is exactly the part I got wrong.

tomasfarias, Apr 03 '23

Just pushed tag v1.0.5 which does have the latest changes as you can verify looking at the tree: https://github.com/tomasfarias/airflow-dbt-python/tree/v1.0.5/airflow_dbt_python/hooks.

It should be deployed shortly to PyPI.

tomasfarias, Apr 03 '23

It would be preferable to add a boolean option to toggle SSL verification (verify ssl = bool), to avoid this error: SSL: CERTIFICATE_VERIFY_FAILED
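For context on what such a flag would control: a client exposing a verify-SSL boolean typically switches between a fully verifying TLS context and one that skips certificate and hostname checks. The sketch below uses only the standard library `ssl` module; the function `make_ssl_context` and its parameters are hypothetical and are not an existing airflow-dbt-python or dulwich option. For a self-hosted GitLab with an internal CA, trusting that CA bundle via `cafile` is usually safer than disabling verification.

```python
import ssl
from typing import Optional


def make_ssl_context(verify_ssl: bool = True, cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context (hypothetical helper).

    verify_ssl=True: verify the server certificate and hostname, optionally
    trusting an extra CA bundle (e.g. an internal CA) passed as cafile.
    verify_ssl=False: skip all certificate checks (insecure; testing only).
    """
    if verify_ssl:
        # create_default_context sets CERT_REQUIRED and enables hostname checks.
        return ssl.create_default_context(cafile=cafile)
    ctx = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```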

FouadApp, Apr 03 '23

Does v1.0.5 support both cloning from a remote GitLab repository and Airflow connections?

KarthikRajashekaran, Apr 03 '23