Not authenticated to use blobs outside of Azure blob container working directory when using Azure Entra
Related to #5448 and #5444, but both of those issues refer to using Fusion; this one refers to using azcopy.
They are likely to be solved by the same method, since they share the same underlying challenge: how to pass authentication from Nextflow to the worker node (Batch).
I seem to be able to recreate the issue without Fusion.
```console
$ nextflow run seqeralabs/nf-canary -r main --remoteFile az://igenomes/atacseq_samplesheet_custom.csv --run TEST_STAGE_REMOTE -w az://scidev-useast -c azure.config

 N E X T F L O W   ~  version 24.10.3

NOTE: Your local project version looks outdated - a different revision is available in the remote repository [c818260035]
Launching `https://github.com/seqeralabs/nf-canary` [magical_noyce] DSL2 - revision: 2ad4214f51 [main]
Uploading local `bin` scripts folder to az://scidev-useast/tmp/cf/bcc6a54f6a9dd33780a5251d956439/bin
[69/6f65a5] Submitted process > NF_CANARY:TEST_STAGE_REMOTE (1)
ERROR ~ Error executing process > 'NF_CANARY:TEST_STAGE_REMOTE (1)'

Caused by:
  Process `NF_CANARY:TEST_STAGE_REMOTE (1)` terminated with an error exit status (1)

Command executed:

  cat atacseq_samplesheet_custom.csv

Command exit status:
  1

Command output:
  (empty)

Work dir:
  az://scidev-useast/69/6f65a5549f7a3b2357312b12a28996

Container:
  docker.io/library/ubuntu:23.10

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
Execution cancelled -- Finishing pending tasks before exit
```
azure.config:

```groovy
process.executor = 'azurebatch'

fusion {
    enabled = false
}

azure {
    storage {
        accountName = 'seqeralabs'
    }
    batch {
        location = 'eastus'
        accountName = 'seqeralabs'
        copyToolInstallMode = 'node'
        autoPoolMode = true
        allowPoolCreation = true
        deletePoolsOnCompletion = false
    }
    activeDirectory {
        servicePrincipalId = 'redacted'
        servicePrincipalSecret = 'redacted'
        tenantId = 'redacted'
    }
}
```
And with an access key:
To reiterate what's been said above, the error appears to stem from `generateContainerSasWithActiveDirectory`, which generates a key relevant only to the working container and nothing else. Generating an account-level SAS seems tricky (according to @alberto-miranda).
Originally posted by @adamrtalbot in https://github.com/nextflow-io/nextflow/issues/5444#issuecomment-2590156438
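As a toy illustration of the failure mode (plain Python, no Azure SDK involved): a container-scoped SAS only authorizes blobs inside the container it was signed for, so a token minted for the work-dir container (`scidev-useast`) cannot authorize reads from a different container such as `igenomes`. The helper below is hypothetical and only models the scoping rule:

```python
def container_sas_allows(sas_container: str, blob_path: str) -> bool:
    """Toy model of SAS scoping: a container-level SAS is only valid
    for blobs inside the container it was generated for."""
    # az://<container>/<blob...> -> take the first path segment
    target_container = blob_path.removeprefix("az://").split("/", 1)[0]
    return target_container == sas_container

# SAS minted for the work-dir container covers the work dir...
assert container_sas_allows("scidev-useast", "az://scidev-useast/69/6f65a5/.command.sh")
# ...but not a blob in a different container, hence the error above
assert not container_sas_allows("scidev-useast", "az://igenomes/atacseq_samplesheet_custom.csv")
```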
@alberto-miranda here is a method by which we could tell Nextflow to pass the details to the worker task; this could help with #5444 and #5448.
It's pretty crude right now.
Apologies for the delay, but it is great that we are finally moving forward with this 😄. I'm happy to add support for this on the Fusion side of things, so let's sync!
I wrote a couple of PRs to support authenticating with Managed Identities in Fusion v2.4 and the upcoming v2.5: system-wide Managed Identities work out of the box, while user-assigned ones require a single environment variable to be injected into worker nodes. So we should be set if we can make Nextflow:
- Automatically assign a system-wide MI to pool nodes; or
- Automatically assign a user-assigned MI to pool nodes and inject an environment variable
(I personally prefer option 1)
Likely both should be supported
I'm not sure option 1 is supported by Azure Batch.
Option 2 is implemented as https://github.com/nextflow-io/nextflow/pull/5670
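For context on what the injected variable buys us: the Azure SDK convention (used by `DefaultAzureCredential`) is that a worker with `AZURE_CLIENT_ID` set authenticates as that user-assigned identity, and otherwise falls back to the node's system-assigned identity. A rough Python model of that resolution logic (the helper name and return values are made up for illustration):

```python
def resolve_identity(env: dict) -> str:
    """Mimic the managed-identity selection used by DefaultAzureCredential:
    AZURE_CLIENT_ID picks a user-assigned MI; otherwise the node's
    system-assigned MI (if any) is used."""
    client_id = env.get("AZURE_CLIENT_ID")
    if client_id:
        return f"user-assigned:{client_id}"
    return "system-assigned"

# With the env var injected into the worker (what #5670 does):
assert resolve_identity({"AZURE_CLIENT_ID": "1234"}) == "user-assigned:1234"
# Without it, the SDK falls back to the system-assigned identity:
assert resolve_identity({}) == "system-assigned"
```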
I think option 1 means that Nextflow creates an MI automatically and uses it if one is not provided by the user.
I interpreted it as system-assigned vs user-assigned: https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview#managed-identity-types
@alberto-miranda is this what you meant?
Fusion should support both anyway.
> I interpreted it as system-assigned vs user-assigned: https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview#managed-identity-types
> @alberto-miranda is this what you meant?
Yeah exactly, the naming is fairly misleading. The major difference for us is that for a System-wide MI we don't need any extra info in the worker nodes, whereas for a User-assigned MI we would need a Client ID or Resource ID to validate against.
As far as I understand both types can be configured by users in Azure and, if they do so, Nextflow already has a mechanism to let them choose one or the other in nextflow.config (https://www.nextflow.io/docs/latest/azure.html#managed-identities). The only piece missing, I believe, would be to propagate this information to worker nodes (which is covered by Adam's effort in #5670).
It's a slightly different story if we want Nextflow to automatically create these MIs and attach them to nodes from a pool: in this case users would not provide anything in nextflow.config (besides maybe their wish to use MIs) and Nextflow would take care of everything behind the scenes.
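For reference, the existing user-facing mechanism looks roughly like this in nextflow.config (per the managed-identities docs linked above; values are placeholders):

```groovy
azure {
    managedIdentity {
        // user-assigned MI: give Nextflow the client ID...
        clientId = '<client-id>'
        // ...or opt into the system-assigned MI instead:
        // system = true
    }
}
```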
> Fusion should support both anyway.
Fusion will be ready to support both as soon as https://github.com/seqeralabs/fusion/pull/716 and https://github.com/seqeralabs/fusion/pull/718 are merged.
> The only piece missing, I believe, would be to propagate this information to worker nodes (which is covered by Adam's effort in https://github.com/nextflow-io/nextflow/pull/5670).
The worker nodes do not support system-assigned identities, just user-assigned ones. I believe @swampie couldn't work out how to attach a managed identity to a node pool programmatically :(
GPT:

> Azure Batch worker nodes can utilize managed identities, but they support only user-assigned managed identities, not system-assigned ones. This means you need to create a user-assigned managed identity and associate it with your Batch pool to enable your compute nodes to securely access other Azure resources without managing credentials.
>
> The system-assigned managed identity created for a Batch account is intended solely for accessing Azure Key Vault for customer-managed keys and is not available on compute nodes.
From the Java SDK for Azure it appears to be possible to create a user-assigned MI with `new UserAssignedIdentities()` and assign it to a pool with `BatchAccount.DefinitionStages.WithIdentity`.
EDIT: I asked Claude.ai and the response I got was similar (not verified/ not tested):
```groovy
@Grab(group='com.microsoft.azure', module='azure-batch', version='9.0.0')
import com.microsoft.azure.batch.*
import com.microsoft.azure.batch.auth.*
import com.microsoft.azure.batch.protocol.models.*

def configureBatchPoolWithManagedIdentity() {
    // Azure Batch account details (placeholders)
    def batchAccountName = "your-batch-account"
    def batchAccountKey = "your-batch-account-key"
    def region = "eastus"
    def batchAccountUrl = "https://${batchAccountName}.${region}.batch.azure.com"

    // Create batch credentials
    def credentials = new BatchSharedKeyCredentials(
        batchAccountUrl,
        batchAccountName,
        batchAccountKey
    )

    // Create batch client
    def batchClient = BatchClient.open(credentials)

    // User-assigned managed identity details
    def userAssignedIdentityId = "/subscriptions/<subscription-id>/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<identity-name>"

    // Create identity reference
    def identityReference = new UserAssignedIdentities()
    identityReference.resourceId = userAssignedIdentityId

    // Create pool identity configuration
    def poolIdentityConfig = new PoolIdentityConfiguration()
        .withType(PoolIdentityType.USER_ASSIGNED)
        .withUserAssignedIdentities([
            (userAssignedIdentityId): identityReference
        ])

    // Create pool specification
    def poolSpec = new PoolAddParameter()
        .withId("pool-with-managed-identity")
        .withVmSize("Standard_D2s_v3")
        .withTargetDedicatedNodes(2)
        .withIdentity(poolIdentityConfig)
        // Configure the pool's virtual machine configuration
        .withVirtualMachineConfiguration(
            new VirtualMachineConfiguration()
                .withImageReference(
                    new ImageReference()
                        .withPublisher("microsoft-azure-batch")
                        .withOffer("ubuntu-server-container")
                        .withSku("20-04-lts")
                        .withVersion("latest")
                )
                .withNodeAgentSkuId("batch.node.ubuntu 20.04")
        )

    try {
        // Create the pool
        batchClient.poolOperations().createPool(poolSpec)
        println "Successfully created pool with managed identity"
    } catch (BatchErrorException e) {
        println "Error creating pool: ${e.getMessage()}"
    }
}

// Execute the configuration
configureBatchPoolWithManagedIdentity()
```
Resolved by #6118
@bentsherman I've tested it without Fusion using the latest edge version, and it still doesn't seem to work. Moreover, it still seems to require `azure.storage.accountKey` when I use it like so:
```groovy
azure {
    managedIdentity {
        clientId = azure_config["userAssignedManagedIdentityClientId"]
    }
    storage {
        accountName = azure_config["storageAccountName"]
    }
    batch {
        location = 'eastus'
        accountName = azure_config["batchAccountName"]
        poolIdentityClientId = azure_config["userAssignedManagedIdentityClientId"]
        allowPoolCreation = true
        deleteJobsOnCompletion = true
        copyToolInstallMode = 'node'
        pools {
            test_private_mount {
                vmType = 'Standard_D2as_v4'
                virtualNetwork = azure_config["virtualNetwork"]
            }
        }
    }
}
```
Looking at the code in #6118, it seems that it only applies to Fusion (https://github.com/nextflow-io/nextflow/blob/1a4c3987a2b09e02bedaa1b7da1d80f65efcaaea/plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzBatchService.groovy#L561).
Given that, was this perhaps closed incorrectly? Thank you
Correct, this is Fusion only so this issue shouldn't be closed.
Hi,
I wanted to point out that this issue will become more prevalent now that you need to use Entra: Low-Priority VMs are being deprecated and you need to switch to User Subscription Batch accounts.
We have been using keys until now and have been able to use multiple containers, but now Nextflow creates a SAS token for the working directory and tries to use it for all containers, which does not work.
Anybody who needs to migrate will hit this issue.
https://learn.microsoft.com/en-us/azure/batch/batch-spot-vms
@luanjot yes, this has impacted us after the move over. Our IT security policy means we are not able to use storage account access keys.