airflow-dbt-python
[Feature] Support HTTP authentication for DbtGitRemote
I am trying to use a dbt project hosted in a GitLab repo through DbtGitRemoteHook:
dbt_run = DbtRunOperator(
    task_id="dbt_run",
    project_dir="https://domain/abc/-/tree/main/dbt/db_metrics?private_token=abcdesf",
    dbt_conn_id="dbt_conn_id",
    target="dev",
    do_xcom_push_artifacts=["run_results.json"],
)
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='scm.platform.us-west-2.io', port=443): Max retries exceeded with url: https://scm.platform.us-west-2.io/users/auth/saml (Caused by ResponseError('too many redirects'))
When I opened the same HTTPS URL with the token in a browser, I was able to access it.
I also tried the following project_dir:
project_dir="https://$gitlabUser:$gitlabToken@domain/abc.git",
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 705, in clone
result = self.fetch(path, target, progress=progress, depth=depth)
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 782, in fetch
result = self.fetch_pack(
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 2085, in fetch_pack
refs, server_capabilities, url = self._discover_references(
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 1941, in _discover_references
resp, read = self._http_request(url, headers)
File "/home/airflow/.local/lib/python3.10/site-packages/dulwich/client.py", line 2219, in _http_request
raise HTTPUnauthorized(resp.headers.get("WWW-Authenticate"), url)
dulwich.client.HTTPUnauthorized: No valid credentials provided
Thanks for opening an issue.
Support for authentication in git remotes was not implemented, hence the error. I'm working on a patch for this and already got it working. Just need to do some clean up of the code and add some tests.
We'll do this in two steps:
- First release (v1.0.4) will support HTTPS auth by specifying a user/password or token in the URL. This should be enough to get you going: locally I'm able to pull a private repo from GitLab using a project_dir in the form https://oauth2:<my-personal-access-token>@gitlab.com/tomasfarias/<my-private-repo>, which is equivalent to your second attempt.
- Second release (likely v1.0.5, but potentially 1.1.0) will include proper Airflow connection support so that you can store your credentials in Airflow instead of having to pass them in the project's URL. This requires a bit of refactoring in the git remote, but is very doable otherwise.
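For anyone assembling such a URL by hand, here is a minimal sketch of building a project_dir with credentials embedded, using only the standard library. The helper name build_authenticated_git_url, the repository path, and the token are hypothetical placeholders, not part of airflow-dbt-python. Percent-encoding the credentials is an assumption on my side to keep characters like "@" or "/" from breaking the URL; whether the underlying git client decodes them should be checked before relying on it.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_authenticated_git_url(repo_url: str, user: str, token: str) -> str:
    """Return repo_url with user:token embedded (hypothetical helper)."""
    parts = urlsplit(repo_url)
    if parts.scheme != "https":
        raise ValueError("credentials should only be sent over HTTPS")
    # Rebuild the host part without any credentials already in the URL.
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Percent-encode so that "@" or "/" inside a token cannot be
    # mistaken for URL delimiters.
    netloc = f"{quote(user, safe='')}:{quote(token, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


project_dir = build_authenticated_git_url(
    "https://gitlab.com/my-group/my-private-repo.git",  # placeholder repo
    user="oauth2",
    token="my-personal-access-token",  # placeholder token
)

# The result would then be passed to the operator, for example:
#   from airflow_dbt_python.operators.dbt import DbtRunOperator
#   dbt_run = DbtRunOperator(task_id="dbt_run", project_dir=project_dir, ...)
```

In practice the token would come from an environment variable or a secrets backend rather than being written into the DAG file.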
Is there a tentative release date for v1.0.4?
v1.0.4 going out later today assuming CI is green.
I see that v1.0.4 is already available on PyPI. However, doing pip install airflow-dbt-python==1.0.4 seems to install a version of the code that does not have the changes from #113. Also, pyproject.toml still says version = "1.0.3". Is there a problem? Or am I missing something?
Thanks for bringing this up @alvaromendoza.
I think I may have tagged the wrong commit, and thus 1.0.4 was deployed without the changes. Unfortunately, PyPI doesn't allow overwriting existing releases, so I will go ahead and do a 1.0.5 release. This will just be what v1.0.4 was intended to be, no other changes. I may yank 1.0.4 afterwards, just so that folks upgrade from 1.0.3 to 1.0.5 directly.
Sorry for the inconvenience. The deployment pipeline is all automated except for bumping the version and tagging the commit, and I couldn't get that right :sweat_smile:
Just pushed tag v1.0.5 which does have the latest changes as you can verify looking at the tree: https://github.com/tomasfarias/airflow-dbt-python/tree/v1.0.5/airflow_dbt_python/hooks.
It should be deployed shortly to PyPI.
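Since 1.0.4 on PyPI does not contain the changes, a quick way to confirm that an environment has a release with them is to compare the installed version against 1.0.5. This is only a rough sketch: the helper names are made up for illustration, and the parsing deliberately ignores pre-release suffixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First release that contains the git auth changes, per this thread.
MIN_VERSION = (1, 0, 5)


def parse_version(text: str) -> tuple:
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_git_auth_fix(installed: str) -> bool:
    """True if the given version is at least MIN_VERSION."""
    return parse_version(installed) >= MIN_VERSION


if __name__ == "__main__":
    try:
        installed = version("airflow-dbt-python")
    except PackageNotFoundError:
        print("airflow-dbt-python is not installed")
    else:
        print(installed, "has git auth fix:", has_git_auth_fix(installed))
```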
It would also be good to add a verify ssl = bool option to avoid this error: SSL: CERTIFICATE_VERIFY_FAILED
Does v1.0.5 support both cloning a remote GitLab repo and Airflow connections?