databricks-sdk-go
Unable to use Workload Identity Federation from Azure DevOps pipeline
Describe the issue
I want to deploy a Databricks Asset Bundle from an Azure Pipeline using the Databricks CLI. While authentication for the CLI itself seems to work, the actual deployment does not. It appears that the underlying Terraform provider is not able to authenticate.
The issue in particular appears to arise from our DevOps service connection. The service connection is configured for Workload Identity Federation. When I try an old service connection that authenticates using client credentials, the deployment succeeds.
I suspect the bug may be fixed by simply upgrading the version of Terraform that the Databricks CLI uses under the hood. It currently uses Terraform 1.5.5, while newer versions of Terraform seem to support the Workload Identity Federation flow. See https://developer.hashicorp.com/terraform/language/settings/backends/azurerm, and note that the 1.5.x version of that same page makes no mention of Workload Identity Federation.
Relevant documentation:
- https://devblogs.microsoft.com/devops/introduction-to-azure-devops-workload-identity-federation-oidc-with-terraform/
- https://devblogs.microsoft.com/devops/public-preview-of-workload-identity-federation-for-azure-pipelines/
- https://developer.hashicorp.com/terraform/language/v1.5.x/settings/backends/azurerm
Configuration
# azure-pipelines.yml
variables:
  databricksHost: "https://adb-XXX.azuredatabricks.net"
pool:
  vmImage: "ubuntu-latest"
jobs:
  - job: databricks_asset_bundle
    displayName: "Deploy Databricks Asset Bundle"
    steps:
      - bash: |
          # Install Databricks CLI - see https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
          # Verify installation
          databricks --version
          # Create databricks config file
          file="~/.databrickscfg"
          if [ -f "$file" ] ; then
            rm "$file"
          fi
          echo "[DEFAULT]" >> ~/.databrickscfg
          echo "host = $databricksHost" >> ~/.databrickscfg
        displayName: Setup Databricks CLI
      - task: AzureCLI@2
        displayName: Deploy Asset Bundle
        inputs:
          azureSubscription: "my-workload-identity-federation-service-connection"
          addSpnToEnvironment: true
          scriptType: "bash"
          scriptLocation: "inlineScript"
          inlineScript: |
            # As described in https://devblogs.microsoft.com/devops/public-preview-of-workload-identity-federation-for-azure-pipelines/
            export ARM_CLIENT_ID=$servicePrincipalId
            export ARM_OIDC_TOKEN=$idToken
            export ARM_TENANT_ID=$tenantId
            export ARM_SUBSCRIPTION_ID=$(az account show --query id -o tsv)
            export ARM_USE_OIDC=true
            # Databricks authentication itself works fine
            echo ------------- List experiments -------------
            databricks experiments list-experiments
            # But bundle deployment does not
            echo ------------- Deploy bundle -------------
            databricks bundle deploy --log-level=debug --target dev
I have tried various combinations of the ARM_ environment variables above, but I couldn't find a working combination.
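When trying combinations like this, it can help to log which of the variables are actually set inside the step, without printing their values. The following is only a minimal sketch for that purpose: the helper name `report_arm_vars` is hypothetical, and the list of variable names is simply the set used in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any CLI): report which ARM_* variables
# are exported in the current step, without leaking their values.
report_arm_vars() {
  for name in ARM_CLIENT_ID ARM_TENANT_ID ARM_SUBSCRIPTION_ID \
              ARM_OIDC_TOKEN ARM_USE_OIDC ARM_CLIENT_SECRET; do
    # printenv only sees exported variables, which is what a child
    # process such as Terraform would inherit.
    if [ -n "$(printenv "$name")" ]; then
      echo "$name=set"
    else
      echo "$name=unset"
    fi
  done
}
```

Calling `report_arm_vars` right before `databricks bundle deploy` in the inline script would show what the deploy command inherits from the step.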
What did work was using a service principal service connection, in combination with:
addSpnToEnvironment: true
inlineScript: |
  export ARM_CLIENT_ID=$servicePrincipalId
  export ARM_TENANT_ID=$tenantId
  export ARM_SUBSCRIPTION_ID=$(az account show --query id -o tsv)
  export ARM_CLIENT_SECRET=$servicePrincipalKey
# databricks.yml
bundle:
  name: my_project
variables:
  service_principle:
    description: Service principle used by the DevOps agent
    default: my-service-principle-id
run_as:
  service_principal_name: ${var.service_principle}
# Example resources to deploy
resources:
  experiments:
    my_experiment:
      name: "/Workspace/Users/${var.service_principle}/my_experiment"
targets:
  dev:
    mode: production
    default: true
    workspace:
      host: https://adb-XXX.azuredatabricks.net
Steps to reproduce the behavior
- Create a DevOps service connection with Workload Identity Federation
- Create an Azure Pipeline with the above YAML (replace placeholders), using the service connection from the previous step
- Create a Databricks Asset Bundle with the above YAML (replace placeholders)
- Trigger the pipeline
- Observe the error
Expected Behavior
The deployment of the asset bundle should succeed.
Actual Behavior
The following error appears in the pipeline's log:
------------- Deploy bundle -------------
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/index.json
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/terraform_1.5.5_SHA256SUMS.72D7468F.sig
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/terraform_1.5.5_SHA256SUMS
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/terraform_1.5.5_linux_amd64.zip
Uploading bundle files to /Users/***/.bundle/my_project/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
Error: terraform apply: exit status 1
Error: cannot create mlflow experiment: failed during request visitor: default auth: azure-cli: cannot get access token: ERROR: Please run 'az login' to setup account.
. Config: host=https://adb-XXX.azuredatabricks.net,/ azure_client_id=***, azure_tenant_id=XXX. Env: DATABRICKS_HOST, ARM_CLIENT_ID, ARM_TENANT_ID
with databricks_mlflow_experiment.main,
on bundle.tf.json line 17, in resource.databricks_mlflow_experiment.main:
17: }
Note that the listing of experiments works fine:
------------- List experiments -------------
[
(expected list of experiments, redacted)
{
...
},
...
]
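Because the failing call reports the `azure-cli` auth method, one possible way to narrow this down is to check, in the same AzureCLI@2 step, whether `az` itself can still issue a token for Azure Databricks. This is a sketch under assumptions, not a confirmed diagnosis: the helper name `check_az_token` and the `AZ_BIN` override are hypothetical, and `2ff814a6-3304-4ab8-85cb-cd0e6f879c1d` is the well-known Azure Databricks resource ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check whether the Azure CLI session in this step
# can obtain an access token for the Azure Databricks resource.
# AZ_BIN is an assumed override so the binary can be swapped out.
check_az_token() {
  if "${AZ_BIN:-az}" account get-access-token \
       --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
       --query expiresOn -o tsv >/dev/null 2>&1; then
    echo "az token: ok"
  else
    echo "az token: FAILED"
  fi
}
```

If this prints `az token: ok` in the step while `terraform apply` still reports "Please run 'az login'", that would suggest the Terraform subprocess does not see the same Azure CLI session as the step's shell.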
OS and CLI version
Output by the Azure pipeline:
azure-cli 2.63.0
core 2.63.0
telemetry 1.1.0
Extensions:
azure-devops 1.0.1
Dependencies:
msal 1.30.0
azure-mgmt-resource 23.1.1
Databricks CLI: v0.227.0
OS: Ubuntu (Microsoft-hosted agent, latest version)
Is this a regression?
I don't know, I'm new to Databricks.
Debug Logs
See attachment. debug_logs.txt