databricks-sql-python icon indicating copy to clipboard operation
databricks-sql-python copied to clipboard

.databrickscfg authentication support

Open DJSaunders1997 opened this issue 1 year ago • 8 comments

Are there any plans to support authentication through the same .databrickscfg file used by the Databricks CLI, and the Databricks VSCode extension?

It would be useful to have one config file used by all Databricks tools, especially for when tokens expire.

For this project in particular I think it would help users that use multiple profiles / workspaces regularly.

Thanks

DJSaunders1997 avatar Jun 19 '23 15:06 DJSaunders1997

Yes we can do this. We already have primitives for diverse authentication schemes like OAuth, AAD etc within auth.py.

susodapop avatar Jul 11 '23 02:07 susodapop

@susodapop Any update on this request?

narquette avatar Sep 01 '23 16:09 narquette

Adding on my interest in this feature as well, and thank you for considering!

jthomasmock avatar Nov 29 '23 03:11 jthomasmock

@susodapop I was looking at tackling this and I saw your comment here about adding a dependency on the databricks-sdk. If that's an option then I think this shouldn't be too hard to support by adding a databricks-cli auth provider type as a fallback option in the chain here:

https://github.com/databricks/databricks-sql-python/blob/2d2b3c1c442223be5b8ce21742c66bd27b56b51f/src/databricks/sql/auth/auth.py#L45-L62

If that seems reasonable then I'm happy to pick this up. I've got a few apps that use both the SDK and the sql client and I'd really like to be able to use the databricks-cli auth provider for clients when I'm running the apps locally.

dbkegley avatar Aug 01 '24 21:08 dbkegley

Hi @dbkegley thanks for the ping. I haven't been actively working on this repository since January 2024 (although I plan to continue contributing in the future :pray:). I had to look-up the comment your referenced as I wasn't sure of the relevance here.

Although it's up to the maintainers to decide, I would strongly advise against requiring databricks-sdk during installation of databricks-sql-connector. databricks-sql-connector is supposed to be a lightweight driver for accessing databricks. The thrust has been towards removing dependencies; not adding more.

That said, if you can implement this in a way that works if databricks-sdk is installed and behaves normally otherwise, that could be acceptable -- although even this is a bit more magical than befits a low-level driver like this.

I think that long-term the true solution should be to implement first-class support for .databrickscfg files rather than adding a dependency to achieve it. These config files are not that difficult to parse...

susodapop avatar Aug 01 '24 21:08 susodapop

Makes sense, thanks for the context. Sounds like adding a dependency on the SDK probably isn't the best approach here. I'm going to table this for now because I can accomplish what I need already by using the SDK's credential providers w/ the sql-client as described in the example

dbkegley avatar Aug 02 '24 14:08 dbkegley

Following up here because there is a little awkwardness to make the SDK's credentials providers work with the ExternalAuthProvider implementation. The OAuthClient documented in the example from my previous comment works fine, but many of the other types of CredentialProviders require some extra work. I think this is because many of the SDK's credentials_strategy (formerly credentials_provider) accept a Config object as an argument to __call__ but this is dropped by the ExternalAuthProvider:

https://github.com/databricks/databricks-sql-python/blob/b438c385112e6f27983bd771c556c8c7596fef22/src/databricks/sql/auth/authenticators.py#L141-L143

The SessionCredentials used by the OAuthClient does not use the cfg arg which is why it works:

https://github.com/databricks/databricks-sdk-py/blob/7d22b4d3727478f0f5dbeb34b7f6fc17a03e31b7/databricks/sdk/oauth.py#L241-L247

The workaround is to provide a no-arg callable that wraps the credentials_provider constructor:

from databricks import sql
from databricks.sdk.core import Config, databricks_cli

cfg = Config(host=DATABRICKS_HOST)

with sql.connect(
    server_hostname=DATABRICKS_HOST,
    http_path=SQL_HTTP_PATH,
    credentials_provider=lambda: databricks_cli(cfg),
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM samples.nyctaxi.trips LIMIT 10;")
        return cursor.fetchall()

dbkegley avatar Aug 06 '24 15:08 dbkegley

Oophta, that's some awkwardness in the API design across these libraries. But your solution is a sane one. Thanks for sharing it!

susodapop avatar Aug 06 '24 15:08 susodapop