Add "wait" and "retry" deployment options
ARM template deployment often fails with errors like:
"Another operation is in progress on the selected item. If there is an in-progress operation, please retry after it has finished."
"BMSUserErrorObjectLocked","message":"Another operation is in progress on the selected item."
Just to clarify: this is not a dependency issue. An ARM deployment may fail if, for example, you try to add a VM to an RSV while another VM is being added at the same time: for a few seconds the RSV will not accept new clients, and as a result your deployment fails.
I would like to have an option to pause a deployment and/or retry it, maybe by introducing "wait" and "retry" deployment conditions, e.g.:
resource blob 'Microsoft.Storage/storageAccounts/blobServices/containers@2019-06-01' = {
  wait: 30
  retry: 5
  name: '${stg.name}/default/logs'
}
Understood. This is something we have been considering, but haven't scheduled the work yet. If you (or others) have other examples that you have run into, it would be great to capture those here.
I know RBAC replication (and replication delays in general) are another place where something like this would be helpful.
@alex-frankel I'm assuming this is something we're planning on also addressing in the underlying platform? This feels like a leaky abstraction, not something that the end-user should have to deal with by adding delays.
Agreed. @bmoore-msft and I were also discussing this yesterday. Ideally, ARM will co-locate all the calls end-to-end so a user never has to think about this. Not sure if/when that will be possible, and this may be a necessary evil in the meantime.
The OP doesn't sound like replication (it feels like concurrency), though I could see that you could potentially address both with something like retry. The problem in this case (or really in either case) is indefinite postponement. This feels like a problem with the RP: common operations returning frequent 400s instead of maybe 429.
The challenge with this workaround is that not only does the user have to fail first and then implement a non-deterministic workaround (that's expensive on the service), it will also mask problems across ARM, RPs and user code.
@rshariy - have you raised this issue with the RSV team? It doesn't appear to be an uncommon problem and seems like it should be addressed by the RSV... either it shouldn't happen or we're not helping customers figure out how to use RSV effectively.
@bmoore-msft I raised a similar issue with the Azure Firewall product team about a year ago. The only solution we found is to use a PowerShell function to check the Azure FW status (make sure it is not "updating") before kicking off a new ARM deployment to the FW.
Just logged ticket 120120226003381 about the RSV issue. Let's see what MS support will come up with.
Regarding "it will mask problems across ARM, RPs and user code": this point is what gives us caution about implementing something like this. We have some potential solutions to deal with the replication delay in particular that we will explore before introducing a wait.
@rshariy - please let us know the resolution of the case.
I have a main template that looks like this:
module kv 'keyvault.bicep' = {
  name: 'kvSmoketestDeploy'
  scope: rg
  params: {
    keyVaultName: keyVaultName
    enableSoftDelete: false
  }
}
module kvaccpol 'keyvaultaccesspolicy.bicep' = {
  name: 'kvAccPolSmoketestDeploy'
  scope: rg
  params: {
    keyVaultName: keyVaultName
    action: 'add'
    objectId: objectId
    access: keyVaultAccessPolicyAccess
  }
}
When that runs, the deployment breaks with:
{
  "error": {
    "code": "ParentResourceNotFound",
    "message": "Can not perform requested operation on nested resource. Parent resource 'kv-kvaccpoltest' not found."
  }
} (Code:NotFound)
Running the deployment again deploys the policy.
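One possible explanation, offered as an assumption rather than a confirmed diagnosis: nothing in kvaccpol refers to kv, so Bicep sees no dependency between the two modules and may deploy them in parallel. A minimal sketch of the same two modules with an explicit dependsOn added follows; rg, keyVaultName, objectId and keyVaultAccessPolicyAccess are assumed to be declared elsewhere in the template, as in the original.

```bicep
// Fragment only: rg, keyVaultName, objectId and keyVaultAccessPolicyAccess
// are assumed to be declared elsewhere in the same template.
module kv 'keyvault.bicep' = {
  name: 'kvSmoketestDeploy'
  scope: rg
  params: {
    keyVaultName: keyVaultName
    enableSoftDelete: false
  }
}

module kvaccpol 'keyvaultaccesspolicy.bicep' = {
  name: 'kvAccPolSmoketestDeploy'
  scope: rg
  params: {
    keyVaultName: keyVaultName
    action: 'add'
    objectId: objectId
    access: keyVaultAccessPolicyAccess
  }
  // Explicit ordering: start this module only after the vault module has finished.
  dependsOn: [
    kv
  ]
}
```

An alternative that avoids dependsOn would be to have keyvault.bicep output the vault name and pass that output into kvaccpol, which gives Bicep an implicit dependency. Such an output is hypothetical here, since keyvault.bicep is not shown.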
I ran into a scenario where I'd like a wait. There's not much code to show: I'm basically deploying a Function App and then want to output the default key for use in API Management. The problem is that the function app takes some time to spin up before the app keys are present...
resource functionApp 'Microsoft.Web/sites@2020-06-01' = {
  name: functionAppName
  location: location
  kind: 'functionapp'
  ...
output functionappdefaultkey string = listKeys('${functionApp.id}/host/default', functionApp.apiVersion).functionKeys.default
Workaround is to run the initial deployment of the function app twice.
@eja-git this isn't a "wait" scenario, it's a bug in the deployment engine job scheduling... the listKeys job is scheduled too early... so that's the fix for your particular scenario.
Hi,
I've logged the following issue https://github.com/projectkudu/kudu/issues/3312#issuecomment-870741730 that could also benefit from the wait option during a deployment.
Best Regards Pieter
I am trying to simplify deploying firewall rule collections by using loadTextContent and then looping over each entry of a variable. Each workload-x.json file contains all the properties for a rule collection.
var workloads = [
  json(loadTextContent('./workload-1.json'))
  json(loadTextContent('./workload-2.json'))
  json(loadTextContent('./workload-3.json'))
]
resource afwPolicy 'Microsoft.Network/firewallPolicies@2021-02-01' existing = {
  name: 'bicepRules'
}
resource collectionGroups 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2021-02-01' = [for workload in workloads: {
  name: workload.name
  parent: afwPolicy
  properties: workload.properties
}]
Here is the error I get:
Rule Collection Group workload-2 can not be updated because Parent Firewall Policy bicepRules is in Updating state from previous operation
I am sure that a short delay between deployments would let us loop through the whole array.
Only one Rule Collection Group can be updated at a time with Azure Firewall Policy. Since the update refreshes all of the connected Azure Firewall instances, the amount of time it takes to update is non-deterministic. Therefore you will need to serialize the deployment using the batchSize decorator.
Can you try:
@batchSize(1)
resource collectionGroups 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2021-02-01' = [for workload in workloads: {
  name: workload.name
  parent: afwPolicy
  properties: workload.properties
}]
I have two scenarios that come to mind from recent experience.
- An overarching enterprise management-level policy is applied to a resource that has just been created and that I reference in the next resource/module, causing the "Another operation" error. A retry would be useful here, as I have no control or influence over the policies.
- I have also faced situations where a newly created resource is not available when referenced immediately afterwards, which I assume is a replication/caching issue, as the next run works flawlessly.
My scenario includes creating a Cosmos account, which typically takes a few minutes and sometimes up to 10 minutes. In this case I am unable to use the resource output to set the connection string for use in subsequent modules, e.g. passing it into keyVault and functionAppSettings.
@markjbrown - do you mind taking a look at this one? I'd expect the Cosmos Account not to report complete until it is fully provisioned. @zapadoody -- do you happen to have the code sample of the repro and a correlation ID from when the error occurred?
For run-time deployment errors you should raise a support ticket as they are best equipped to diagnose specific errors with an activity id.
However, I am happy to look at an existing Bicep file to see if there are any issues.
I do have a sample on how to output the endpoint and key from a Cosmos account and input into appSettings for an App Service here if that helps.
https://github.com/Azure/azure-quickstart-templates/blob/master/quickstarts/microsoft.documentdb/cosmosdb-webapp/main.bicep
Here's my cosmosAccount.bicep:
param location string
param cosmosAccountName string
param cosmosDefaultConsistencyPolicy string
param cosmosPrimaryRegion string
param cosmosSecondaryRegion string
var lowerCosmosAcctName = toLower(cosmosAccountName)
var locations = [
  {
    locationName: cosmosPrimaryRegion
    failoverPriority: 0
    isZoneRedundant: false
  }
  {
    locationName: cosmosSecondaryRegion
    failoverPriority: 1
    isZoneRedundant: false
  }
]
resource cosmosAccountResource 'Microsoft.DocumentDB/databaseAccounts@2021-06-15' = {
  name: lowerCosmosAcctName
  kind: 'GlobalDocumentDB'
  location: location
  properties: {
    locations: locations
    databaseAccountOfferType: 'Standard'
    enableAutomaticFailover: true
    consistencyPolicy: {
      defaultConsistencyLevel: cosmosDefaultConsistencyPolicy
    }
  }
}
output cosmosAccountResourceName string = cosmosAccountResource.name
Here's the KeyVault.bicep:
param location string
param keyVaultName string
param productionPrincipalId string
param productionTenantId string
param stagingPrincipalId string
param stagingTenantId string
@secure()
param cosmosPrimaryConnectionString string
@secure()
param cosmosSecondaryConnectionString string
@secure()
param serviceStorageConnectionString string
@secure()
param appStorageConnectionString string
resource keyVault 'Microsoft.KeyVault/vaults@2019-09-01' = {
  name: keyVaultName
  location: location
  properties: {
    enabledForDeployment: true
    enabledForTemplateDeployment: true
    enabledForDiskEncryption: true
    tenantId: productionTenantId
    accessPolicies: [
      {
        tenantId: productionTenantId
        objectId: productionPrincipalId
        permissions: {
          secrets: [
            'get'
            'list'
          ]
        }
      }
      {
        tenantId: stagingTenantId
        objectId: stagingPrincipalId
        permissions: {
          secrets: [
            'get'
            'list'
          ]
        }
      }
    ]
    sku: {
      name: 'standard'
      family: 'A'
    }
  }
}
resource cosmosPrimaryConnectionStringSecret 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  name: '${keyVaultName}/cosmosPrimaryConnectionString'
  properties: {
    value: cosmosPrimaryConnectionString
  }
  dependsOn: [
    keyVault
  ]
}
resource cosmosSecondaryConnectionStringSecret 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  name: '${keyVaultName}/cosmosSecondaryConnectionString'
  properties: {
    value: cosmosSecondaryConnectionString
  }
  dependsOn: [
    keyVault
  ]
}
resource serviceStorageConnectionStringSecret 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  name: '${keyVaultName}/dbConnectionString'
  properties: {
    value: serviceStorageConnectionString
  }
  dependsOn: [
    keyVault
  ]
}
resource appStorageConnectionStringSecret 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  name: '${keyVaultName}/appStorageConnectionString'
  properties: {
    value: appStorageConnectionString
  }
  dependsOn: [
    keyVault
  ]
}
output appStorageConnectionStringUri string = appStorageConnectionStringSecret.properties.secretUri
output serviceStorageConnectionStringUri string = serviceStorageConnectionStringSecret.properties.secretUri
output cosmosPrimaryConnectionStringUri string = cosmosPrimaryConnectionStringSecret.properties.secretUri
output cosmosSecondaryConnectionStringUri string = cosmosSecondaryConnectionStringSecret.properties.secretUri
And here's the main.bicep:
// cosmos db account, database and container module
module cosmosAccountMod '../cosmosAccount.bicep' = {
  name: 'cosmosAccount-${environmentName}-${buildNumber}'
  params: {
    cosmosAccountName: cosmosAccountName
    cosmosDefaultConsistencyPolicy: cosmosDefaultConsistencyPolicy
    cosmosPrimaryRegion: cosmosPrimaryRegion
    cosmosSecondaryRegion: cosmosSecondaryRegion
    location: location
  }
}
module cosmosDatabaseMod '../cosmosDbContainer.bicep' = {
  name: 'cosmosDBContainer-${environmentName}-${buildNumber}'
  params: {
    cosmosAccountName: cosmosAccountMod.outputs.cosmosAccountResourceName
    cosmosContainerName: cosmosContainerName
    cosmosDatabaseName: cosmosDatabaseName
    cosmosThroughput: cosmosThroughput
  }
  dependsOn: [
    cosmosAccountMod
  ]
}
// storage account module - storage for the tenants application
module appStorageAccountMod '../storageAccount.bicep' = {
  name: 'appStorageAcctName-${environmentName}-${buildNumber}'
  params: {
    storageAcctName: appStorageAcctName
    storageSkuName: appStorageAcctSku
    location: location
  }
}
// app insights module
module appInsightsMod '../appInsights.bicep' = {
  name: 'appInsightsName-${environmentName}-${buildNumber}'
  params: {
    name: appInsightsName
    resourceGroupLocation: location
  }
}
// app service plan module
module appServicePlanMod '../appServicePlan.bicep' = {
  name: 'appServicePlan-${environmentName}-${buildNumber}'
  params: {
    appSvcPlanSku: appSvcPlanSku
    appSvcPlanTier: appSvcPlanTier
    appSvcPlanName: appSvcPlanName
    appPlanLocation: location
  }
}
// function app module
module functionAppMod '../functionApp.bicep' = {
  name: 'functionApp-${environmentName}-${buildNumber}'
  params: {
    appSvcPlanName: appSvcPlanName
    functionAppName: functionAppName
    location: location
  }
  dependsOn: [
    appStorageAccountMod
    appServicePlanMod
    cosmosAccountMod
  ]
}
// service storage account module - storage for the function app
module serviceStorageAccountMod '../storageAccount.bicep' = {
  name: 'serviceStorageAcctName-${environmentName}-${buildNumber}'
  params: {
    storageAcctName: serviceStorageAcctName
    storageSkuName: serviceStorageAcctSku
    location: location
  }
}
// key vault module
module keyVaultMod '../keyVault.bicep' = {
  name: 'keyVaultName-${environmentName}-${buildNumber}'
  params: {
    keyVaultName: keyVaultName
    location: location
    cosmosPrimaryConnectionString: listConnectionStrings(resourceId('Microsoft.DocumentDB/databaseAccounts', cosmosAccountName), '2020-04-01').connectionStrings[0].connectionString
    cosmosSecondaryConnectionString: listConnectionStrings(resourceId('Microsoft.DocumentDB/databaseAccounts', cosmosAccountName), '2020-04-01').connectionStrings[1].connectionString
    productionPrincipalId: functionAppMod.outputs.productionPrincipalId
    productionTenantId: functionAppMod.outputs.productionTenantId
    stagingPrincipalId: functionAppMod.outputs.stagingPrincipalId
    stagingTenantId: functionAppMod.outputs.stagingTenantId
    serviceStorageConnectionString: serviceStorageAccountMod.outputs.storageAccountConnectionString
    appStorageConnectionString: appStorageAccountMod.outputs.storageAccountConnectionString
  }
  dependsOn: [
    functionAppMod
    cosmosAccountMod
    cosmosDatabaseMod
  ]
}
// function app settings module
module functionAppSettingMod '../functionAppSettings.bicep' = {
  name: 'functionAppSettings-${environmentName}-${buildNumber}'
  params: {
    appInsightsKey: appInsightsMod.outputs.appInsightsKey
    cosmosConnectionStringUri: keyVaultMod.outputs.cosmosPrimaryConnectionStringUri
    appStorageConnectionStringUri: keyVaultMod.outputs.appStorageConnectionStringUri
    serviceStorageConnectionStringUri: keyVaultMod.outputs.serviceStorageConnectionStringUri
    functionAppName: functionAppMod.outputs.prodSlotFunctionAppName
    functionAppStagingName: functionAppMod.outputs.stagingSlotFunctionAppName
  }
  dependsOn: [
    functionAppMod
    appInsightsMod
    cosmosAccountMod
    keyVaultMod
  ]
}
Also, to clarify: previously I was using the output in cosmosAccount.bicep, but I changed to the query approach to try and get away from the error. Thanks for the tip on raising the support ticket.
@alex-frankel Can you take a look at that? It seems the dependsOn is being fulfilled with the ack of the started and/or accepted responses rather than succeeded
@alex-frankel any thoughts on the bicep here? Also I have opened a support case for this if you need that ref # let me know and I can send direct.
The problem is this listConnectionStrings function. I've never seen it before. I tried testing in an ARM template and it doesn't work (not sure why the template didn't fail validation).
If you want to output the endpoint and keys, use the syntax below. To turn them into a connection string, just concat them together with "AccountEndpoint=" and ";AccountKey=".
"[reference(resourceId('Microsoft.DocumentDB/databaseAccounts', variables('cosmosAccountName'))).documentEndpoint]"
"[listKeys(resourceId('Microsoft.DocumentDB/databaseAccounts', variables('cosmosAccountName')), '2021-04-15').primaryMasterKey]"
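For Bicep, a rough equivalent of the two ARM expressions above might look like the sketch below. It is a hypothetical module file: the file name cosmosSecret.bicep, the parameter names and the secret name are illustrative and not from this thread. It assumes the Cosmos account and the key vault already exist by the time the module runs, and it joins the endpoint and key exactly as described above.

```bicep
// cosmosSecret.bicep (hypothetical module): writes a Cosmos DB connection string to Key Vault.
param cosmosAccountName string
param keyVaultName string

// Reference the already-deployed Cosmos account; nothing is created here.
resource cosmosAccount 'Microsoft.DocumentDB/databaseAccounts@2021-06-15' existing = {
  name: toLower(cosmosAccountName)
}

// Reference the already-deployed key vault.
resource keyVault 'Microsoft.KeyVault/vaults@2019-09-01' existing = {
  name: keyVaultName
}

resource cosmosSecret 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  parent: keyVault
  name: 'cosmosPrimaryConnectionString'
  properties: {
    // Endpoint and key joined with "AccountEndpoint=" and ";AccountKey=".
    value: 'AccountEndpoint=${cosmosAccount.properties.documentEndpoint};AccountKey=${cosmosAccount.listKeys().primaryMasterKey}'
  }
}

// Only the secret URI leaves the module, never the key itself.
output cosmosSecretUri string = cosmosSecret.properties.secretUri
```

If main.bicep passed cosmosAccountMod.outputs.cosmosAccountResourceName as cosmosAccountName, the module would pick up an implicit dependency on the account module, so the key lookup should only run once that module has completed.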
@markjbrown apologies, and thank you for the assistance!!!
@zapadoody did this resolve your issue now?
I think the most obvious reason why we need this is when you assign a role to an identity with: Microsoft.Authorization/roleAssignments and then do something with the role and identity in the same template, like with Microsoft.Resources/deploymentScripts for instance, or using something from a keyvault which it just got permissions from. This is not really nice to work with right now as it's almost guaranteed to fail at the first deployment, when the permissions are not set yet.
In the role assignment template, try setting principalType to ServicePrincipal. It works like a charm in my environment.
Does that guarantee anything? Setting roles, even manually, does not guarantee instant assignment of a role; this is what Microsoft itself documents, see https://docs.microsoft.com/en-us/azure/role-based-access-control/troubleshooting#role-assignment-changes-are-not-being-detected. In the worst cases it takes 30 minutes, and I've seen it take over 5 minutes myself. I'm not saying that you're wrong in your scenario, just that not all scenarios will be instant with RBAC assignments.
@erwinkramer is correct, there are 2 problems with replication in this RBAC scenario
- the MSI replicating through AAD/Azure so that a role can be assigned
- the roleAssignment replicating through Azure so it takes effect
The principalType property solves the first but not the second.
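A minimal sketch of the principalType suggestion, assuming a resource-group-scoped deployment: the identity name and the choice of the built-in Reader role are illustrative, not taken from this thread. As noted above, this only helps with the first replication problem; the assignment itself may still take time to become effective.

```bicep
// Sketch under assumptions: identityName and the Reader role are illustrative choices.
param location string = resourceGroup().location
param identityName string = 'example-deploy-identity'

resource identity 'Microsoft.ManagedIdentity/userAssignedIdentities@2018-11-30' = {
  name: identityName
  location: location
}

// Role definition ID of the built-in Reader role.
var readerRoleId = 'acdd72a7-3385-48ef-bd42-f606fba81ae7'

resource roleAssignment 'Microsoft.Authorization/roleAssignments@2020-04-01-preview' = {
  // Deterministic GUID so that re-running the deployment updates the same assignment.
  name: guid(resourceGroup().id, identity.id, readerRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', readerRoleId)
    principalId: identity.properties.principalId
    // Declares the principal as a service principal so the assignment does not
    // have to wait for the new identity to become visible in the directory.
    principalType: 'ServicePrincipal'
  }
}
```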
This is the challenge with wait/retry in general... When do you know that you should wait, and how long do you wait for? We've talked about something like "wait until I can GET this resource" but that still has replication and fanout issues...
We understand the pain, and there are some workarounds (e.g. serial deployment of resources) - the current guidance from leadership is to solve the root cause.
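As a rough illustration of the serial deployment workaround, the sketch below chains two rule collection groups on the firewall policy from earlier in this thread with dependsOn, so that the second starts only after the first has finished. The group names, priorities and empty ruleCollections arrays are placeholders, not values from this thread.

```bicep
resource afwPolicy 'Microsoft.Network/firewallPolicies@2021-02-01' existing = {
  name: 'bicepRules'
}

// Placeholder group: name, priority and rules are illustrative only.
resource groupA 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2021-02-01' = {
  parent: afwPolicy
  name: 'workload-a'
  properties: {
    priority: 200
    ruleCollections: []
  }
}

resource groupB 'Microsoft.Network/firewallPolicies/ruleCollectionGroups@2021-02-01' = {
  parent: afwPolicy
  name: 'workload-b'
  properties: {
    priority: 300
    ruleCollections: []
  }
  // Forces groupB to wait for groupA, so the policy is never updated by both at once.
  dependsOn: [
    groupA
  ]
}
```

This only orders operations within one deployment. It does not help when the competing operation comes from somewhere else, which is the case the original request describes.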
The same goes for policy: when you create an initiative definition and then an initiative assignment, you get an error; wait a bit between the two and it succeeds.
The Azure CLI 'wait' command can be used to wait until a deployment is created with a provisioning state of 'Succeeded':
az deployment mg create --name deploymentName
az deployment mg wait --name deploymentName --created --management-group-id mgmtName
To add a comment here: I'm not sure why we are trying to find workarounds for a situation the resource provider should address. If the resource provider doesn't support concurrent operations, then serializing should be fine. However, if resource A returns the operation as complete but is still doing something (e.g. replication), then why is the resource provider signaling to ARM that the operation is completed and ready for any other operation?