dbt-databricks
Ongoing OAuth Bug Tracking
I'm looking to get a census of users' ability to connect via OAuth
If you use or have tried to use OAuth and see this bug, please reply with the following:
Cloud: <AWS|Azure|GCP>
Mechanism: <M2M|U2M with default app|U2M with custom app>
Success/Failure:
dbt version:
Last date you tried to connect with OAuth:
There are 18 of us on our team, and being able to log in with OAuth U2M via the browser, or using a profile from the ~/.databrickscfg file used by the CLI, would make development so much easier. Especially because we use dbt clone to create instances of our data to dev against, so we pull state from dbfs as part of that (databricks fs cp dbfs://path/to/manifest.json local/path/to/manifest.json).
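For reference, a minimal profiles.yml target for browser-based U2M OAuth might look roughly like this (host, http_path, and schema are placeholders, not values from this thread; field names follow the dbt-databricks adapter docs):

```yaml
my_project:
  target: dev
  outputs:
    dev:
      type: databricks
      host: dbc-XXXX.cloud.databricks.com          # placeholder workspace host
      http_path: /sql/1.0/warehouses/XXXX          # placeholder warehouse path
      catalog: main
      schema: dev_schema
      auth_type: oauth   # no token/secret -> adapter opens a browser for U2M
```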
Cloud
AWS
Mechanism
U2M with default app maybe? I just omit the secrets and it redirects me to the browser.
Failure:
URL
https://dbc-REDACTED.cloud.databricks.com/oidc/v1/authorize?response_type=code&client_id=dbt-databricks&redirect_uri=http%3A%2F%2Flocalhost%3A8020&scope=all-apis+offline_access&state=REDACTED&code_challenge=REDACTED&code_challenge_method=S256
Response from URL:
{
  "error_description": "redirect_uri 'http:localhost:8020' not registered for OAuth application 'dbt-databricks'",
  "error": "invalid_request"
}
dbt version:
% dbt --version
Core:
  - installed: 1.7.11
  - latest:    1.7.11 - Up to date!
Plugins:
  - databricks: 1.7.13 - Up to date!
  - spark:      1.7.1  - Up to date!
Last date you tried to connect with OAuth:
Sat Apr 13 06:46:34 BST 2024
Cloud: Azure
Mechanism: Tried both U2M with default app and U2M with custom app
Success/Failure: Both failed with error message {"error_description":"OAuth application with client_id: 'XXX' not available in Databricks account 'XXX'.","error":"invalid_request"}
dbt version: core 1.7.11 and databricks plugin 1.7.13
Last date you tried to connect with OAuth: 2024-04-16T20:50:12Z
Thanks for the reports. I merged a PR from @stevenayers-bge yesterday that in principle allows configuration of the other necessary piece (now both redirect url and scopes can be configured), unfortunately it does not fix the issue for Azure (where the Databricks SDK has the scopes hardcoded). I'm working with the Databricks SDK team to get this unblocked, so thanks for the reports and patience.
@benc-db, I have used both Azure and AWS now.
Cloud: Azure
Mechanism: U2M with custom app
Success/Failure: Success!
dbt version: 1.7.14
Last date you tried to connect with OAuth: 13/05/2024
Cloud: Azure
Mechanism: U2M with default app
Success/Failure: Failure. Browser pops up but before I can sign in I see this message:
AADSTS700016: Application with identifier 'dbt-databricks' was not found in the directory 'xxxx'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant.
I don't have the access rights to check whether the default application is enabled for our account.
dbt version: 1.7.14
Last date you tried to connect with OAuth: 13/05/2024
Cloud: AWS
Mechanism: U2M with default app
Success/Failure: Failure (partly). With the 1.7.14 release I set oauth_redirect_url to 'http://localhost:8050' and oauth_scopes to 'sql' and 'offline_access'. This works! I get a pop-up, log in, and can run my models after that without any problems. However, when I try to refresh a model created as a 'streaming table' I get this error:
Runtime Error
Error getting info for materialized view/streaming table <<catalog_name>>.<<schema_name>>.<<model_name>>: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 403 Invalid scope</title>
</head>
<body><h2>HTTP ERROR 403</h2>
<p>Problem accessing /api/2.1/unity-catalog/tables/<<catalog_name>>.<<schema_name>>.<<model_name>>. Reason:
<pre> Invalid scope</pre></p>
</body>
</html>
So my guess is that the default scope the dbt-databricks application gets (sql) is enough to create streaming tables but not to refresh them. Once I change my profiles.yml to use my PAT, I can refresh the 'streaming tables'. Also, when I run the models with OAuth and the --full-refresh flag, they are successfully recreated.
dbt version: 1.7.14
Last date you tried to connect with OAuth: 13/05/2024
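For anyone hitting the same thing, the partially working 1.7.14 configuration described above amounts to roughly this profiles.yml fragment (all values other than oauth_redirect_url and the two scopes are placeholders):

```yaml
    dev:
      type: databricks
      host: dbc-XXXX.cloud.databricks.com          # placeholder
      http_path: /sql/1.0/warehouses/XXXX          # placeholder
      schema: dev_schema
      auth_type: oauth
      oauth_redirect_url: http://localhost:8050    # new in 1.7.14
      oauth_scopes:                                # new in 1.7.14
        - sql
        - offline_access
```

Per the report above, the sql scope suffices to create and full-refresh streaming tables but not to call the Unity Catalog tables API used during an incremental refresh.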
We are bumping into the same issue. Previously (around 2024-04-08), when setting auth_type: oauth and specifying the Azure AD client id, we always got an AAD login screen (login.microsoftonline.net...), which redirected back to localhost, and login was successful. The AAD app was set up as described here.
When a new user started using dbt yesterday, she gets redirected to the databricks oidc endpoint (looks like https://adb-xxxxxxxx.azuredatabricks.net/oidc/v1/authorize?response_type=code&client_id=<my_aad_client_id>&redirect_uri=http%3A%2F%2Flocalhost%3A8020&scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2Fuser_impersonation+offline_access&state=xxxxxxx&code_challenge=xxxxxxx&code_challenge_method=S256.
She immediately sees the browser error:
{
"error_description": "OAuth application with client_id: '<my_aad_client_id>' not available in Databricks account '<my_databricks_account>'.",
"error": "invalid_request"
}
When I create an application at https://accounts.azuredatabricks.net/settings/app-integrations (with All APIs permission), and put that Client ID in the profiles.yml, sign-in works perfectly.
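In profile terms, the working custom-app setup described in this comment comes down to pointing client_id at the app integration created in the account console (the value below is a placeholder, not the reporter's actual ID):

```yaml
      auth_type: oauth
      # Client ID of the custom OAuth app created at
      # https://accounts.azuredatabricks.net/settings/app-integrations
      client_id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```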
OS: Windows
Relevant(?) Python packages:
dbt-core==1.7.14
dbt-databricks==1.7.14
At the same time, when I do it (windows, same packages), I get the AAD login screen and can login perfectly.
One day later, I try to do it on the following machine: WSL (Linux), with
dbt-core==1.7.7
dbt-databricks==1.7.3
When supplying my AAD client app, the Databricks oidc endpoint opens up, with the same error message:
{
"error_description": "OAuth application with client_id: '<my_aad_client_id>' not available in Databricks account '<my_databricks_account>'.",
"error": "invalid_request"
}
Who decides (and how) when to open the AAD URL (login.microsoftonline.net), and why is it 'flaky'? I prefer using an AAD app as documented.
Let me know if I can provide extra information.
@benc-db, ever since dbt-databricks 1.7.14 I get a warning in the terminal when the OAuth logic is unable to fetch a refresh token. This happens after an hour, when the access token expires. A new browser tab is opened and a new token is fetched. I can connect just fine after that, but the warning and the hourly browser popup are rather confusing. Before 1.7.14, an error during token refresh would end up in logger.debug instead of logger.warning. This is the warning:
AADSTS9002326: Cross-origin token redemption is permitted only for the 'Single-Page Application' client-type. Request origin: 'http://localhost:8020'.
I want to get rid of the warning and the fact that it logs in every hour so I did a deep dive into what is going on.
Fix on databricks-sdk
We use Azure with a custom app registration. This app registration is set up with the 'Mobile and desktop applications' authentication platform and 'http://localhost:8020' as the redirect URL, as documented here. This works fine for the first call (if there is no token stored yet). But after an hour, when the access_token expires, it tries to use the refresh_token to get a new token (as it should), but there it automatically sets the 'origin' header for OAuth on Azure, assuming the SPA (Single-Page Application) authentication platform. As we do not use this, it throws the error above, causing the refresh to fail and the new tab to open.
I think this is a bug/inconsistency in the SDK. It should first try without setting the header and only set the header as a fallback, just like it does when getting the authorization_code (the first step of this OAuth flow). I am more than happy to create an issue and PR on the databricks-sdk repo and discuss this with them, but then I saw that the requirements.txt of dbt-databricks is pinned to databricks-sdk 0.17.0. So even if they accepted my PR, it still would not solve my problem. So my first question is: can dbt-databricks also use the latest databricks-sdk once released, or is there a reason to pin it to 0.17.0?
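The fallback proposed here can be sketched as follows. This is a hypothetical illustration, not the real databricks-sdk API: the post callable and refresh_with_fallback are invented names standing in for the SDK's token-refresh request.

```python
def refresh_with_fallback(post, token_url, data):
    """Hypothetical sketch of the proposed SDK behavior: try the token
    refresh WITHOUT an Origin header first (mobile/desktop app flow),
    and only retry with the Origin header (SPA flow) if that fails.

    `post(url, data, headers)` is an injected HTTP helper returning
    `(status_code, body)`; it is a stand-in, not a real SDK function.
    """
    # First attempt: plain refresh, matching non-SPA app registrations.
    status, body = post(token_url, data, headers={})
    if status == 200:
        return body

    # Fallback: SPA-style refresh with a cross-origin header, mirroring
    # what the SDK already does for the authorization_code step.
    status, body = post(token_url, data,
                        headers={"Origin": "http://localhost:8020"})
    if status == 200:
        return body
    raise RuntimeError(f"token refresh failed: {body}")
```

With this ordering, a 'Mobile and desktop applications' registration succeeds on the first attempt and never triggers AADSTS9002326, while an SPA registration still works via the fallback.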
Alternative
I also tried to solve this the other way around. Instead of using the 'Mobile and desktop applications' authentication platform for our app registration, I changed it to the SPA (Single-page application) platform. This solved part of the problem, but not all of it. I get a valid token, and after an hour it is also refreshed without any errors or new login tabs, so that is good. Unfortunately, a token for an SPA platform is only valid for 24h, after which you need to log in again. I am fine with the new browser tab once every 24h, but it also throws this error, which is now printed to the terminal:
invalid_grant: AADSTS700084: The refresh token was issued to a single page app (SPA), and therefore has a fixed, limited lifetime of 1.00:00:00, which cannot be extended. It is now expired and a new sign in request must be sent by the SPA to the sign in page.
I would like to get rid of this warning, as it can be confusing for our developers. Perhaps the dbt-databricks code could check for this exact error code (AADSTS700084) here: if that error code is thrown, it could do a logger.debug instead of a logger.warning. I am more than happy to test this and create a PR for it if you want.
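The suggested check could look something like the sketch below. This is not dbt-databricks' actual code; log_refresh_failure and the logger name are illustrative stand-ins for wherever the adapter currently emits the warning.

```python
import logging

logger = logging.getLogger("dbt.adapters.databricks")  # illustrative name

# Error codes that are 'by design' rather than actionable failures.
# AADSTS700084: SPA refresh tokens have a fixed 24h lifetime and a
# re-login is expected, so a warning would only confuse developers.
EXPECTED_REFRESH_ERRORS = ("AADSTS700084",)

def log_refresh_failure(message: str) -> str:
    """Downgrade expected refresh-token expiries to debug; keep a
    warning for everything else. Returns the level used (for testing)."""
    if any(code in message for code in EXPECTED_REFRESH_ERRORS):
        logger.debug("OAuth refresh token expired (expected): %s", message)
        return "debug"
    logger.warning("OAuth token refresh failed: %s", message)
    return "warning"
```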
@benc-db, what do you think?
So my first question is: can dbt-databricks also use the latest databricks-sdk once released or is there a reason to pin it to 0.17.0?
We've been waiting for some needed changes in SDK to be able to update, most notably the fact that Azure scopes passed to the OAuth client were just ignored.
I'm looking to get a census of users ability to connect via OAuth
If you use or have tried to use OAuth and see this bug, please reply with the following:
Cloud: <AWS|Azure|GCP>
Mechanism: <M2M|U2M with default app|U2M with custom app>
Success/Failure:
dbt version:
Last date you tried to connect with OAuth:
Thanks for the info @benc-db. I think you are referring to this line of code in 0.17.0 where the scopes for azure are simply overwritten? In the latest version I don't see that line anymore and the scope can be set on the client.
I took the liberty of creating a PR for both solutions: one on databricks-sdk to solve the issue with non-SPA app registrations, and one on this repo to gracefully ignore the 'by design' error/notification for when the token can no longer be refreshed on an SPA app registration. Hope you can look into that last one.
This PR in databricks-sdk (https://github.com/databricks/databricks-sdk-py/pull/513) fixes a bug with Azure Databricks M2M OAuth, the fix is available since 0.18.0.
So, updating that dependency version is highly anticipated; we at Fivetran have customers complaining about related errors. Also, I can confirm that a custom build (dbt 1.7.3, dbt-databricks 1.7.2, databricks-sdk 0.28.0) works fine for those customers.
@fivetran-andreymogilev you say it fixed Azure M2M in 0.18, but the reason I set the pin at 0.17.0 is that it broke all of our tests, which use Azure M2M. In what way did it fix?
For what it's worth, I have a branch where I'm testing 0.28.0, https://github.com/databricks/dbt-databricks/tree/auth_testing, if anyone wants to test out and report back. I'm also talking to the owner of the SDK to work through the remaining issues.
@fivetran-andreymogilev Ahhh, so, I think what happened here is that the recommended Azure M2M flow changed after implementation. So what I've been working towards is using the client_secret from Azure, but I see the latest instructions say to get an OAuth secret from Databricks.
@fivetran-andreymogilev thank you for sharing! Your comment led to a breakthrough for me :P.
@benc-db The only working flow for M2M OAuth I tested is what is described here: https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html. It uses client_id and client_secret obtained from Azure manually (e.g. during the principal creation). It is important to use Databricks' endpoint (https://accounts.cloud.databricks.com/oidc/accounts/
@benc-db Thank you, 0.28.0 seems a good choice for a dependency!