databricks-sql-python

Entra ID Service Principal authentication

Open · kvedes opened this issue 9 months ago · 4 comments

Hi,

According to the docs, authentication with an Azure Entra ID service principal is not supported. However, if I generate a token with the msal library, I can successfully execute queries as my service principal. The only problem is refreshing the token: once it is handed to the library I cannot refresh it, and it seems to expire after roughly 4 hours. Since the current auth flow already accepts tokens generated by msal, adding support for service principal auth looks like a small task. The databricks-sdk package also supports Azure service principal auth, so it would make sense to reuse that code base for auth handling.

Example code:

```python
import os

import msal
from databricks import sql

# Azure Service Principal details
tenant_id = os.environ["AZURE_TENANT_ID"]
client_id = os.environ["AZURE_CLIENT_ID"]
client_secret = os.environ["AZURE_CLIENT_SECRET"]

# Authority URL
authority = f"https://login.microsoftonline.com/{tenant_id}"

# Scope for Azure Databricks
scope = ["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"]

# Create a confidential client application
app = msal.ConfidentialClientApplication(
    client_id,
    authority=authority,
    client_credential=client_secret,
)

# Acquire a token
result = app.acquire_token_for_client(scopes=scope)

if "access_token" in result:
    access_token = result["access_token"]
    # The token is passed as a fixed string, so the connector cannot refresh it.
    connection = sql.connect(
        server_hostname="...",
        http_path="...",
        access_token=access_token,
    )

    cursor = connection.cursor()

    cursor.execute("SELECT * from range(10)")
    print(cursor.fetchall())

    cursor.close()
    connection.close()
else:
    print("Error obtaining token:", result.get("error"))
    print(result.get("error_description"))
    print(result.get("correlation_id"))
```

kvedes · Mar 06 '25 09:03
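The example in the issue passes a fixed string to access_token, so the connector has no way to renew it. A minimal sketch of one workaround, assuming msal 1.23 or later (where acquire_token_for_client first consults the application's in-memory token cache): create one ConfidentialClientApplication, keep it alive, and ask it for a token every time headers are needed. The helper make_header_factory is hypothetical and not part of msal or the connector.

```python
from typing import Callable, Dict, Sequence

# Azure Databricks resource scope, as used in the issue's example.
AZURE_DATABRICKS_SCOPE = ("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",)


def make_header_factory(
    app, scopes: Sequence[str] = AZURE_DATABRICKS_SCOPE
) -> Callable[[], Dict[str, str]]:
    """Build a zero-argument callable that returns auth headers on every call.

    Hypothetical helper. `app` is assumed to be a long-lived
    msal.ConfidentialClientApplication; reusing it keeps msal's token cache
    alive between calls.
    """

    def header_factory() -> Dict[str, str]:
        # Assumes msal >= 1.23: a cached, still-valid token is returned without
        # a network call, and a new one is requested when it is (nearly) expired.
        result = app.acquire_token_for_client(scopes=list(scopes))
        if "access_token" not in result:
            raise RuntimeError(
                f"Error obtaining token: {result.get('error')}: "
                f"{result.get('error_description')}"
            )
        return {"Authorization": f"Bearer {result['access_token']}"}

    return header_factory
```

To use it, pass credentials_provider=lambda: make_header_factory(app) to sql.connect instead of access_token=...; this relies on the same hook as the MicrosoftServicePrincipalTokenSource in the follow-up comment, where the connector calls the returned factory to build request headers.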

I figured out a way to use the ExternalAuthProvider in databricks.sql.auth.authenticators by passing a CredentialsProvider to sql.connect; kudos for making this accessible. The code below works. However, token refresh is not handled, though it should be simple to add in MicrosoftServicePrincipalTokenSource.__call__.

```python
import os

import msal
from databricks import sql
from databricks.sql.auth.authenticators import CredentialsProvider


class MicrosoftServicePrincipalTokenSource:
    """Header factory: returns the Authorization header when called."""

    def __init__(self, client_id, client_secret, tenant_id):
        self.client_id = client_id
        self.client_secret = client_secret
        self.tenant_id = tenant_id
        self._token: str | None = None

    def __call__(self) -> dict[str, str]:
        # The token is fetched once and cached; it is never refreshed.
        if self._token is None:
            self._token = self._get_token()
        return {"Authorization": f"Bearer {self._token}"}

    def _get_token(self) -> str:
        # Authority URL
        authority = f"https://login.microsoftonline.com/{self.tenant_id}"

        # Scope for Azure Databricks
        scope = ["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"]

        # Create a confidential client application
        app = msal.ConfidentialClientApplication(
            client_id=self.client_id,
            authority=authority,
            client_credential=self.client_secret,
        )

        # Acquire a token
        result = app.acquire_token_for_client(scopes=scope)

        if "access_token" in result:
            return result["access_token"]
        raise RuntimeError(
            f"Error obtaining token: {result.get('error')}: "
            f"{result.get('error_description')}"
        )


class MicrosoftServicePrincipalAuthProvider(CredentialsProvider):
    def __init__(self, client_id, client_secret, tenant_id):
        self.client_id = client_id
        self.client_secret = client_secret
        self.tenant_id = tenant_id

    def auth_type(self) -> str:
        return "MSSP"

    def __call__(self, *args, **kwargs) -> MicrosoftServicePrincipalTokenSource:
        # The connector calls this to obtain a header factory (the token source).
        return MicrosoftServicePrincipalTokenSource(
            self.client_id, self.client_secret, self.tenant_id
        )


connection = sql.connect(
    server_hostname="...",
    http_path="...",
    credentials_provider=MicrosoftServicePrincipalAuthProvider(
        client_id=os.environ["DATABRICKS_CLIENT_ID"],
        client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
        tenant_id=os.environ["DATABRICKS_TENANT_ID"],
    ),
)

cursor = connection.cursor()

cursor.execute("SELECT * from range(10)")
print(cursor.fetchall())

cursor.close()
connection.close()
```

kvedes · Mar 06 '25 13:03
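The comment above notes that refresh could be added inside MicrosoftServicePrincipalTokenSource.__call__. A minimal sketch of that idea, tracking expiry explicitly rather than relying on msal's cache: the token source remembers when its token expires and fetches a new one once the remaining lifetime drops below a safety margin. RefreshingTokenSource, the fetch callable and the 300-second default margin are hypothetical choices for illustration.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetch signature: returns (access_token, lifetime_in_seconds).
TokenFetcher = Callable[[], Tuple[str, float]]


class RefreshingTokenSource:
    """Header factory that caches a bearer token and renews it before expiry."""

    def __init__(
        self,
        fetch: TokenFetcher,
        margin_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._margin = margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self._lock = threading.Lock()  # avoid concurrent refreshes

    def __call__(self) -> Dict[str, str]:
        with self._lock:
            now = self._clock()
            # Fetch on first use, or once the token is within the margin of expiry.
            if self._token is None or now >= self._expires_at - self._margin:
                token, lifetime = self._fetch()
                self._token = token
                self._expires_at = now + lifetime
            return {"Authorization": f"Bearer {self._token}"}
```

With msal, fetch could call app.acquire_token_for_client(scopes=...) and return result["access_token"] together with float(result["expires_in"]), the token lifetime in seconds; MicrosoftServicePrincipalAuthProvider.__call__ would then return RefreshingTokenSource(fetch). time.monotonic is the default clock so that wall-clock adjustments do not affect the expiry check.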

Hi @kvedes, thanks for the detailed feedback. We're planning to add support for Azure Entra ID service principal authentication soon, including proper token refresh handling.

shivam2680 · Mar 10 '25 07:03

Hi @shivam2680, thanks for the reply. Do you have an expected timeline for this feature?

kvedes · Mar 11 '25 07:03

We have not yet allocated bandwidth for this, but we aim to deliver this feature as soon as possible.

shivam2680 · Mar 12 '25 04:03

@kvedes There is already an example that explains how to achieve this with databricks-sdk: https://github.com/databricks/databricks-sql-python/blob/main/examples/m2m_oauth.py (cc @deeksha-db)

jprakash-db · Apr 06 '25 06:04
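A minimal sketch modeled on the linked m2m_oauth.py example, assuming databricks-sdk and databricks-sql-connector are installed and that the service principal has a Databricks-managed OAuth secret (the linked example's setup), not only an Entra ID client secret. The credentials provider builds a databricks-sdk Config and returns the header factory from oauth_service_principal, which takes care of token refresh. The environment variable names are assumptions, and the imports sit inside the functions only so the definitions load even where the packages are missing.

```python
import os


def credential_provider():
    """Zero-argument provider passed to sql.connect(credentials_provider=...)."""
    from databricks.sdk.core import Config, oauth_service_principal

    config = Config(
        host=f"https://{os.environ['DATABRICKS_SERVER_HOSTNAME']}",
        client_id=os.environ["DATABRICKS_CLIENT_ID"],  # service principal UUID
        client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],  # Databricks OAuth secret
    )
    # Header factory that renews the OAuth token when it expires.
    return oauth_service_principal(config)


def run_query():
    """Open a connection with the provider above and run the thread's test query."""
    from databricks import sql

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        credentials_provider=credential_provider,
    ) as connection:
        cursor = connection.cursor()
        cursor.execute("SELECT * from range(10)")
        rows = cursor.fetchall()
        cursor.close()
        return rows
```

Call run_query() once the environment variables are set. If only an Entra ID client secret is available, databricks-sdk's Config also has azure_client_id, azure_client_secret and azure_tenant_id fields; whether that path fits the SQL connector is worth checking against the SDK's authentication documentation.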

@jprakash-db Thank you, I hadn't seen that one.

kvedes · Apr 09 '25 10:04