Allow identity-based authentication on Azure Batch
This PR adds a credential-less authentication mechanism for Azure Batch and related services.
@pditommaso , I'd appreciate your input on design decisions here.
I have been studying the authentication mechanisms we can provide as an alternative to the use of storage/batch account keys, and I zeroed in on the service-principal-based setup, which essentially:
- Requires the creation of a service principal
- Relies on authentication creds (client-id, client-secret, tenant-id)
- Can easily be integrated into the current codebase
- Requires minimal changes to an NF-generated pool
However, the downside is that the authentication creds would still need to be available within the task environment for azcopy to log in as a service principal via `azcopy login --service-principal --application-id $CLIENT_ID --tenant-id $TENANT_ID`, with `AZCOPY_SPA_CLIENT_SECRET` available in the env.

Therefore, the effective change via this mechanism is that we would authenticate the per-node/task azcopy CLI for uploads without storage keys/SAS tokens, but we'd still rely upon the authentication creds (client-id, client-secret, tenant-id) being made available for every azcopy invocation.
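For reference, a minimal sketch of the per-task flow under this setup (the destination URL and variable names are illustrative):

```bash
# azcopy picks up the client secret from this well-known env variable
export AZCOPY_SPA_CLIENT_SECRET="$CLIENT_SECRET"

# authenticate azcopy as the service principal
azcopy login --service-principal --application-id "$CLIENT_ID" --tenant-id "$TENANT_ID"

# subsequent transfers need neither storage keys nor SAS tokens
azcopy copy .command.out "https://<account>.blob.core.windows.net/<container>/work/..."
```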
However, I think the central piece that needs to be addressed is authenticating the azcopy instance running per node/task in the Batch service, and there are three modes of authentication we can explore:
- Service Principal
- User-assigned Identity
- Managed Identity
A user-assigned identity requires certain metadata to be made available to the Azure resource, such as a client ID, object ID, or resource ID, but beyond this it is effectively the same as a managed identity ("managed" indicating managed by the Azure platform).
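Concretely, azcopy accepts that metadata at login time (a sketch; the env variable name is illustrative):

```bash
# user-assigned identity: point azcopy at a specific identity via its client ID
azcopy login --identity --identity-client-id "$MSI_CLIENT_ID"

# system-assigned (managed) identity: no extra metadata needed
azcopy login --identity
```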
I'm currently working on the managed identity option and evaluating any corner cases. It:
- Would only require us to run `azcopy login --identity` within `.command.run`
- Requires certain properties to be enabled in the Batch account (such as `Identity = SystemAssigned` and `Storage account authentication = "BatchAccountManagedIdentity"`)
- Requires the managed identity to have sufficient permissions
Managed identity can be used with all major services that make sense in the Nextflow (and Tower) context, such as:
- Monitoring
- Logging
- Managed Disk
- Azure ARC Kubernetes
- Key vault
Despite the extra expectations on the Batch account setup, this direction via user-assigned/managed identity seems security-friendly - what do you think?
> However, the downside is that the authentication creds would still need to be available within the task environment for azcopy to log in as a service principal via `azcopy login --service-principal --application-id $CLIENT_ID --tenant-id $TENANT_ID`, with `AZCOPY_SPA_CLIENT_SECRET` available in the env.
Ouch, this is ugly. The problem is also the credentials for the Blob storage on the Nextflow side, which currently requires the account key or SAS token:
https://github.com/nextflow-io/nextflow/blob/6f087fecade0293978af096c465dce9ea1e75f84/plugins/nf-azure/src/main/nextflow/cloud/azure/nio/AzFileSystemProvider.groovy#L202-L203
I think the best way to go for this would be to generate a SAS token when the pipeline is launched. This should also solve the problem for the azcopy tool. Have you taken this possibility into consideration?
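For illustration, generating such a token at launch with the storage SDK could look roughly like this (a sketch only; container name, permissions, and expiry are illustrative, and note it still needs a key-based client):

```groovy
import java.time.OffsetDateTime
import com.azure.storage.blob.BlobContainerClientBuilder
import com.azure.storage.blob.sas.BlobContainerSasPermission
import com.azure.storage.blob.sas.BlobServiceSasSignatureValues
import com.azure.storage.common.StorageSharedKeyCredential

def accountName = System.getenv('AZURE_STORAGE_ACCOUNT_NAME')
def accountKey  = System.getenv('AZURE_STORAGE_ACCOUNT_KEY')

// build a container client from the account name/key
def client = new BlobContainerClientBuilder()
        .endpoint("https://${accountName}.blob.core.windows.net")
        .credential(new StorageSharedKeyCredential(accountName, accountKey))
        .containerName('rnaseq-nf')
        .buildClient()

// grant read/write/list, valid well beyond the expected pipeline duration
def permissions = new BlobContainerSasPermission()
        .setReadPermission(true)
        .setWritePermission(true)
        .setListPermission(true)

def sasToken = client.generateSas(
        new BlobServiceSasSignatureValues(OffsetDateTime.now().plusHours(48), permissions))
```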
> I think the best way to go for this would be to generate a SAS token when the pipeline is launched. This should also solve the problem for the azcopy tool. Have you taken this possibility into consideration?
Paolo, this is another possibility, yes. The tradeoff is that we would get into the business of managing SAS tokens during a pipeline run cycle, and the mechanism would not be suitable in case SAS tokens are disabled (which is a recommended best practice), so we might inevitably run into the need for credential-less authentication anyway.
I've had good success with the managed identity (credential-less) solution, which relies on the fact that any node VM in the Batch pool is an Azure resource and can natively authenticate with Azure AD (if configured as such) - I feel I'm close 🤞 😊
Bingo!
```
abhinav@0239xxxxxxx2684bb1xxxxxxxx000:/mnt/batch/tasks/shared/bin$ ./azcopy login --identity
INFO: azcopy: A newer version 10.16.0 is available to download
INFO: Login with identity succeeded.
INFO: azcopy: A newer version 10.16.0 is available to download
```
And `azcopy login --identity` is all we'd need to do in the `.command.run` script to allow any interaction with the Blob storage 🤝
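In other words, the task wrapper would just gain a short prelude along these lines (paths and destination are illustrative):

```bash
# .command.run prelude on a managed-identity pool: authenticate once,
# then stage task files without keys or SAS tokens
azcopy login --identity
azcopy copy .command.out "https://<account>.blob.core.windows.net/<container>/<task-dir>/"
```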
What about the Blob storage client run by Nextflow itself?
> What about the Blob storage client run by Nextflow itself?
Do you mean the head job? If so, then we need to accommodate this library https://github.com/AzureAD/microsoft-authentication-library-for-java within the codebase to allow the head job to source the authentication from the VM it's running on. I'm evaluating this now for our use cases.
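For illustration, the wiring could look roughly like this, using the azure-identity library (which wraps MSAL); the builder chain is a sketch, not the final implementation:

```groovy
import com.azure.identity.DefaultAzureCredentialBuilder
import com.azure.storage.blob.BlobServiceClientBuilder

// DefaultAzureCredential resolves, in order: env-var based service principal,
// managed identity (when running on an Azure VM), Azure CLI login, etc.
def credential = new DefaultAzureCredentialBuilder().build()

def accountName = System.getenv('AZURE_STORAGE_ACCOUNT_NAME')

// a Blob client built from the ambient identity instead of account keys
def blobService = new BlobServiceClientBuilder()
        .endpoint("https://${accountName}.blob.core.windows.net")
        .credential(credential)
        .buildClient()
```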
Yes, but it looks to me that it requires too much customization that's hard to test.
Above all, once Batch is authenticated, Nextflow is able to automatically create a SAS token for the Blob storage.
https://github.com/nextflow-io/nextflow/blob/27636f9d27c56d286e1e6173e66d97c971c5d6d3/plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzBatchExecutor.groovy#L97-L98
Therefore keep the Blob storage auth on the SAS token and focus on finalising the Batch authentication.
Ouch, that was not accurate: the SAS token is created by using the storage client
https://github.com/nextflow-io/nextflow/blob/27636f9d27c56d286e1e6173e66d97c971c5d6d3/plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzHelper.groovy#L64-L64
that's created with the storage keys :/
https://github.com/nextflow-io/nextflow/blob/27636f9d27c56d286e1e6173e66d97c971c5d6d3/plugins/nf-azure/src/main/nextflow/cloud/azure/nio/AzFileSystemProvider.groovy#L205-L208
@pditommaso , yes, that's why I'm trying to find a user-friendly and credential-less auth way to allow users to run Nextflow:
- Case-1: Entirely within the Azure infra (by relying upon identity) - this path is straightforward now.
- Case-2: Hybrid mode, with the head job not running on Azure infra - this is where I'm iterating a bit.
UPDATE:
@pditommaso , how about the following scenarios regarding Case-2:
- The service principal is used for authentication and for initiating the local <-> Azure access
- If the Batch pool is configured for managed identity, then rely upon the managed identity (and we don't rely upon any credentials there)
- Otherwise rely upon the service principal credentials for any authentication (see the sketch below)
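In rough pseudocode, the selection order would be (the config property names are illustrative, not the final implementation):

```groovy
// illustrative decision logic for authenticating the per-task azcopy
String azcopyLogin(Map config) {
    if( config.poolHasManagedIdentity )     // pool VMs authenticate natively, no creds shipped
        return 'azcopy login --identity'
    if( config.activeDirectory )            // fall back to the service principal creds
        return 'azcopy login --service-principal ' +
               "--application-id ${config.activeDirectory.servicePrincipalId} " +
               "--tenant-id ${config.activeDirectory.tenantId}"
    throw new IllegalStateException('No credential-less authentication available')
}
```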
Managing and rotating the service principal and associated creds is much more scalable than relying upon the Batch/storage keys. Also, it is likely that a user who has to interact with the Blob service (via azcopy) would already have a service principal, due to the design of the tool for data transfers.
What do you think?
@pditommaso @jorgeaguileraseqera , this PR is ready to be reviewed from my side - I'd appreciate any feedback please 🙏
- Service principal should have the following permissions:
  - Contributor
  - Storage Blob Data Contributor
  - Storage Blob Data Owner
- Config-level changes:
  - New `azure.activeDirectory` config scope - the storage and batch `accountKey` settings are no longer required
  - Example config:
```groovy
workDir = "az://rnaseq-nf/rnaseq-nf-workdir"

process {
    executor = 'azurebatch'
    container = 'quay.io/nextflow/rnaseq-nf:latest'
}

params {
    outdir = "az://rnaseq-nf/rnaseq-nf-publishdir"
}

process.executor = 'azurebatch'
process.queue = "$AZURE_BATCH_POOL_NAME"

azure {
    activeDirectory {
        servicePrincipalId = "bXXXbXXX-6XXc-X2xx-8xx4-f02e6xxxxxc"       // OR "$AZURE_CLIENT_ID"
        servicePrincipalSecret = "BXXQ~UuxxxO-kXXXXX1yjG5agXXXXXLweXXXX" // OR "$AZURE_CLIENT_SECRET"
        tenantId = "XX51xxxx-XXXX-4XX8-XXXX-1XXXXXXeb"                   // OR "$AZURE_TENANT_ID"
    }

    storage {
        accountName = "$AZURE_STORAGE_ACCOUNT_NAME"
    }

    batch {
        location = "westeurope"
        accountName = "$AZURE_BATCH_ACCOUNT_NAME"
        deleteJobsOnCompletion = false
        pools {
            "$AZURE_BATCH_POOL_NAME" {
                vmType = 'STANDARD_D4_V3'
                privileged = false
            }
        }
    }
}
```
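With that config in place (using the env-var variants from the comments above), a run only needs the corresponding variables exported; the values below are illustrative:

```bash
export AZURE_CLIENT_ID='bXXXbXXX-...'
export AZURE_CLIENT_SECRET='BXXQ~...'
export AZURE_TENANT_ID='XX51xxxx-...'
export AZURE_STORAGE_ACCOUNT_NAME='mystorageaccount'
export AZURE_BATCH_ACCOUNT_NAME='mybatchaccount'
export AZURE_BATCH_POOL_NAME='mypool'

nextflow run rnaseq-nf
```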
- Changes to log entries - logs regarding the creation of Blob and Batch clients using Azure Active Directory identity:
```
...
Sept-14 07:52:53.852 [main] DEBUG n.c.azure.nio.AzFileSystemProvider - Creating Azure Blob storage client using Service Principal credentials
...
Sept-14 07:53:03.274 [Thread-5] INFO c.a.identity.ClientSecretCredential - Azure Identity => getToken() result for scopes [https://storage.azure.com/.default]: SUCCESS
Sept-14 07:53:03.276 [Thread-5] INFO c.a.c.i.AccessTokenCache - Acquired a new access token.
...
Sept-14 07:53:13.175 [Task submitter] DEBUG n.cloud.azure.batch.AzBatchService - [AZURE BATCH] Creating Azure Batch client using Service Principal credentials
```
Tested scenarios:
- Case-1: Same blob container for work/results ✅
- Case-2: Different blob containers for work/results ✅
Improvements (TBD):
- Disable logging from the underlying library:
```
Sept-14 07:53:14.694 [reactor-http-kqueue-3] WARN c.a.c.u.s.SerializerEncoding - 'Content-Type' not found. Returning default encoding: JSON
```
I don't know these APIs very well but I didn't see any issues with the design.
@pditommaso , I've added the doc entries in the PR now.
@pditommaso , shall we take this merge forward?
Let's merge this 🚀