Entra ID Service Principal authentication
Hi,
According to the docs, authentication using an Azure Entra ID service principal is not supported. However, if I generate a token using the msal library, I can successfully execute queries with my service principal. The only issue is refreshing the token: since it is handed off to the library, I cannot actively refresh it, and the token seems to expire after ~4h.
From these observations, implementing support for service principal auth looks like a minor task, since the current auth flow already accepts tokens generated by msal. The databricks-sdk package also supports Azure service principal auth, so it would make sense to leverage that code base for auth handling.
Example code:
import os

import msal
from databricks import sql

# Azure service principal details
tenant_id = os.environ["AZURE_TENANT_ID"]
client_id = os.environ["AZURE_CLIENT_ID"]
client_secret = os.environ["AZURE_CLIENT_SECRET"]

# Authority URL
authority = f"https://login.microsoftonline.com/{tenant_id}"

# Scope for Azure Databricks
scope = ["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"]

# Create a confidential client application
app = msal.ConfidentialClientApplication(
    client_id,
    authority=authority,
    client_credential=client_secret,
)

# Acquire a token
result = app.acquire_token_for_client(scopes=scope)

if "access_token" in result:
    access_token = result["access_token"]
    connection = sql.connect(
        server_hostname="...",
        http_path="...",
        access_token=access_token,
    )
    cursor = connection.cursor()
    cursor.execute("SELECT * from range(10)")
    print(cursor.fetchall())
    cursor.close()
    connection.close()
else:
    print("Error obtaining token:", result.get("error"))
    print(result.get("error_description"))
    print(result.get("correlation_id"))
I figured out a way to use the ExternalAuthProvider in databricks.sql.auth.authenticators; kudos for making this accessible. The code below works. However, token refresh is not handled, though it should be simple to add in the __call__ method of MicrosoftServicePrincipalTokenSource.
import os

import msal
from databricks import sql
from databricks.sql.auth.authenticators import CredentialsProvider


class MicrosoftServicePrincipalTokenSource:
    def __init__(self, client_id, client_secret, tenant_id):
        self.client_id = client_id
        self.client_secret = client_secret
        self.tenant_id = tenant_id
        self._token: str | None = None

    def __call__(self) -> dict[str, str]:
        # The token is fetched once and cached; it is never refreshed.
        if self._token is None:
            self._token = self._get_token()
        return {"Authorization": f"Bearer {self._token}"}

    def _get_token(self) -> str:
        # Authority URL
        authority = f"https://login.microsoftonline.com/{self.tenant_id}"
        # Scope for Azure Databricks
        scope = ["2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"]
        # Create a confidential client application
        app = msal.ConfidentialClientApplication(
            client_id=self.client_id,
            authority=authority,
            client_credential=self.client_secret,
        )
        # Acquire a token
        result = app.acquire_token_for_client(scopes=scope)
        if "access_token" in result:
            return result["access_token"]
        raise RuntimeError(f"Error obtaining token: {result.get('error')}")


class MicrosoftServicePrincipalAuthProvider(CredentialsProvider):
    def __init__(self, client_id, client_secret, tenant_id):
        self.client_id = client_id
        self.client_secret = client_secret
        self.tenant_id = tenant_id

    def auth_type(self):
        return "MSSP"

    def __call__(self, *args, **kwargs):
        # Return a callable that produces the request headers.
        return MicrosoftServicePrincipalTokenSource(
            self.client_id, self.client_secret, self.tenant_id
        )


connection = sql.connect(
    server_hostname="...",
    http_path="...",
    credentials_provider=MicrosoftServicePrincipalAuthProvider(
        client_id=os.environ["DATABRICKS_CLIENT_ID"],
        client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
        tenant_id=os.environ["DATABRICKS_TENANT_ID"],
    ),
)
cursor = connection.cursor()
cursor.execute("SELECT * from range(10)")
print(cursor.fetchall())
cursor.close()
connection.close()
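For the missing refresh, one possible approach is to remember when the token expires and fetch a new one shortly before that, inside `__call__`. The sketch below is only an illustration and not part of databricks-sql-python: `RefreshingTokenSource`, `fetch_token`, `refresh_margin_s` and `clock` are hypothetical names. It assumes `fetch_token` returns an MSAL-style result dict containing `access_token` and `expires_in` (seconds), which is what `app.acquire_token_for_client(scopes=scope)` returns on success.

```python
import threading
import time
from typing import Callable, Optional


class RefreshingTokenSource:
    """Caches a bearer token and re-fetches it shortly before it expires.

    Hypothetical helper: `fetch_token` is any zero-argument callable that
    returns an MSAL-style dict with "access_token" and "expires_in".
    """

    def __init__(
        self,
        fetch_token: Callable[[], dict],
        refresh_margin_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_token = fetch_token
        self._refresh_margin_s = refresh_margin_s
        self._clock = clock
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def __call__(self) -> dict:
        with self._lock:
            # Refresh when there is no token yet, or when the cached one
            # is within `refresh_margin_s` seconds of expiring.
            if (
                self._token is None
                or self._clock() >= self._expires_at - self._refresh_margin_s
            ):
                result = self._fetch_token()
                if "access_token" not in result:
                    raise RuntimeError(
                        f"Error obtaining token: {result.get('error')}"
                    )
                self._token = result["access_token"]
                # A missing "expires_in" is treated as 0, which forces a
                # fresh fetch on the next call.
                self._expires_at = self._clock() + float(
                    result.get("expires_in", 0)
                )
            return {"Authorization": f"Bearer {self._token}"}
```

To use it with the code above, `MicrosoftServicePrincipalAuthProvider.__call__` could return `RefreshingTokenSource(lambda: app.acquire_token_for_client(scopes=scope))`, with `app` and `scope` built as in `_get_token`. This has not been tested against a live workspace.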
Hi @kvedes, thanks for the detailed feedback. We're planning to add support for Azure Entra ID service principal authentication soon, including proper token refresh handling.
Hi @shivam2680, thanks for the reply. Do you have an expected timeline for this feature?
We have yet to allocate bandwidth for this. We aim to deliver this feature ASAP.
@kvedes There is already an example that explains how to achieve this with databricks-sdk: https://github.com/databricks/databricks-sql-python/blob/main/examples/m2m_oauth.py cc @deeksha-db
@jprakash-db Thank you, I hadn't seen that one.