Check if resource exists

Open lordlinus opened this issue 3 years ago • 98 comments

Feature request: Currently, creating an AKS cluster using Microsoft.MachineLearningServices/workspaces/computes is not idempotent and will error if the link already exists. There is no convenient way to check whether the resource already exists, and I would like to understand the best way to handle this scenario in Bicep.

Docs Link: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-kubernetes?tabs=python#limitations

Describe the solution you'd like: A convenient way to check whether the resource exists and skip certain actions (in this case, don't try to establish the link). This could be achieved using a deployment script that calls the az CLI to see if the resource exists, but that looks heavy for a simple check.

lordlinus avatar Aug 13 '21 05:08 lordlinus

This would be a great feature, certain things (like KeyVault secrets) end up getting constantly overwritten with a 'new' value when things like an externally managed secret value are involved. I'd like to be able to create the secret if it's not there, and otherwise just leave it as-is. The 'existing' function doesn't seem to help here, as it just bombs out if you try to use it in this way, stating that the resource doesn't exist.

bengavin avatar Aug 23 '21 18:08 bengavin

In theory, due to the idempotent nature of ARM Template/Bicep deployments, you shouldn't need to know if it is new or existing. The AKS issue is definitely a service-side bug. I will send this to them.

@bengavin -- this should be true for the keyvault secret as well. As long as the secret value is the same, the update would be a NoOp. What issue is the update causing?

alex-frankel avatar Aug 24 '21 18:08 alex-frankel

@alex-frankel My specific issue is that I need to know the secret exists, I don't want to actually control the value of the secret in the bicep template. I've worked around this by just passing in the list of existing secrets into the template and conditioning the creation on the input parameter vs. the resource itself.
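A minimal sketch of that workaround, assuming the caller passes in the names of secrets that already exist (all parameter and secret names here are hypothetical, not taken from the actual template):

```bicep
// Sketch only: names below are illustrative assumptions.
param keyVaultName string

// Secrets the template should make sure exist.
param secretNames array = [
  'sql-connection-string'
  'storage-account-key'
]

// Names of secrets that already exist, supplied by the caller
// (for example, listed by a pipeline step before the deployment).
param existingSecretNames array = []

resource vault 'Microsoft.KeyVault/vaults@2019-09-01' existing = {
  name: keyVaultName
}

// Create a placeholder only for secrets that are not in the "existing" list,
// so values set later by operations staff are never overwritten.
resource placeholderSecrets 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = [for secretName in secretNames: if (!contains(existingSecretNames, secretName)) {
  parent: vault
  name: secretName
  properties: {
    value: 'tbd'
  }
}]
```

The condition depends on the input parameter rather than on the resource itself, so the list has to be gathered outside the template before each deployment.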

For background, these secrets are for things that likely need rotation by operations staff and they are referenced by web apps, function apps, etc. The goal of the template is to get the secrets setup, with 'tbd' values and linked with the appropriate resources to avoid typos and such. The operations folks can go in and supply the appropriate secret values into KeyVault and have them picked up automatically by the applications. Then, when code changes occur, those secret values are NOT overwritten with 'tbd' values again, since my desired state represents 'exists' vs. 'has this value'.

In my case, introducing another level of indirection by having the secrets looked up during deployment of the template (via KeyVault reference parameter) adds another 'step' into the rotation of secret values that feels unnecessary. I don't want my operations folks to need to update a 'deployment' key vault and then trigger a re-deployment to get the new secret value pushed into the application level key vault.

That said, if this is the wrong approach, I'd be happy to hear that and understand why :)

bengavin avatar Aug 24 '21 20:08 bengavin

+1 on this. I'd also like to have a function on ARM/bicep to check if particular resourceId exists.

My use case is that I'd like to create a blue-green deployment of a container instance and manipulate a private DNS entry to switch to the opposite configuration after the deployment. The DNS entry would also indicate which one is currently in use. However, on the first deployment, I need a fallback (the DNS entry will not exist yet). I could do this with tags, but DNS would be more accurate.

miqm avatar Sep 06 '21 10:09 miqm

Just hitting this now, with key vaults. I think I'm going to test running a pre-run deployment script and probe the keyvault API to see if a vault is in soft delete state or not. If so, then bump the name from kv-test-001 to kv-test-002.

pabelanger avatar Sep 24 '21 14:09 pabelanger

+1. We need this too. Deploying Synapse workspace with encryption requires knowing if one already exists to control Key Vault access policy

minhdn90 avatar Sep 28 '21 12:09 minhdn90

Would love this - our use case is deploying a VM and putting the admin password into a keyvault - if the VM already exists we don't want to overwrite the value in the vault as it isn't actually applied to the VM.

jakelevinez avatar Oct 12 '21 16:10 jakelevinez

In theory, due to the idempotent nature of ARM Template/Bicep deployments, you shouldn't need to know if it is new or existing. The AKS issue is definitely a service-side bug. I will send this to them.

@bengavin -- this should be true for the keyvault secret as well. As long as the secret value is the same, the update would be a NoOp. What issue is the update causing?

If one attempts to recreate a role assignment with the same properties, it yields an error "Tenant ID, application ID, principal ID, and scope are not allowed to be updated. (Code:RoleAssignmentUpdateNotPermitted)." Given they use an idempotent GUID based on these unchanged values, it would be ideal to have some mechanism by which to check whether the role assignment already exists before attempting to set it again, knowing that attempting to set an existing such role will fail the deployment.

WhitWaldo avatar Oct 19 '21 01:10 WhitWaldo

@WhitWaldo I have an open ticket with MS about that particular case with role assignments - even with the same GUID, the deployment will fail if you don't wait long enough between deployments (on the order of days). If you wait a few days between deployments, it does work.

@alex-frankel I can speak to the secret issue - you can't have it deploy the same secret value, because if you can generate that value again, you've in some way hardcoded your value into your template. Proper secret generation would use something like the newGuid() function which won't allow you to create the same secret value, for obvious security reasons.
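To illustrate, a minimal sketch (the parameter and secret names are assumptions): newGuid() is only allowed as a parameter default value and yields a different value on every deployment, so the secret below would be overwritten each time the template runs.

```bicep
// Sketch only: names below are illustrative assumptions.
param keyVaultName string

// newGuid() is only valid as a parameter default value;
// it produces a new value for every deployment.
@secure()
param generatedPassword string = newGuid()

resource vault 'Microsoft.KeyVault/vaults@2019-09-01' existing = {
  name: keyVaultName
}

// Redeploying this template replaces the stored value with a fresh GUID,
// even though the VM that originally received the password is unchanged.
resource adminPassword 'Microsoft.KeyVault/vaults/secrets@2019-09-01' = {
  parent: vault
  name: 'vm-admin-password'
  properties: {
    value: generatedPassword
  }
}
```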

jakelevinez avatar Oct 19 '21 12:10 jakelevinez

@jakelevinez

I just noticed your comment, you can likely avoid the error by adding the property 'principalType'

It is documented here: https://docs.microsoft.com/en-us/azure/role-based-access-control/role-assignments-template#new-service-principal
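A rough sketch of a role assignment with that property set, assuming a resource-group-scoped assignment; the parameter names are hypothetical:

```bicep
// Sketch only: parameter names are illustrative assumptions.
param principalId string

// GUID of the built-in or custom role definition to assign.
param roleDefinitionGuid string

resource roleAssignment 'Microsoft.Authorization/roleAssignments@2020-04-01-preview' = {
  // Deterministic name derived from scope, principal and role,
  // so a redeployment targets the same assignment.
  name: guid(resourceGroup().id, principalId, roleDefinitionGuid)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', roleDefinitionGuid)
    principalId: principalId
    // Tells the service what kind of principal this is, as described in the linked docs.
    principalType: 'ServicePrincipal'
  }
}
```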

brwilkinson avatar Oct 27 '21 00:10 brwilkinson

@brwilkinson Unfortunately it looks like in my case we are already doing ServicePrincipal and on a relatively recent preview API version, must be something different affecting us.

jlevine-aba avatar Oct 27 '21 02:10 jlevine-aba

@jlevine-aba okay, well worth a try, hopefully support can sort you out then.

The only other tip to slow deployments down and buy you some time is to put @batchSize(1) on your module loops. That may give the assignments enough time to replicate since they go in series... hard to say without all of the details, however support should be able to provide guidance.
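A sketch of what that could look like, assuming a hypothetical module file ./role-assignment.bicep that takes a principalId parameter:

```bicep
// Sketch only: the module path and its parameter are illustrative assumptions.
param principalIds array

// @batchSize(1) deploys the loop iterations one at a time instead of in parallel.
@batchSize(1)
module assignments './role-assignment.bicep' = [for id in principalIds: {
  name: 'assign-${uniqueString(id)}'
  params: {
    principalId: id
  }
}]
```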

brwilkinson avatar Oct 27 '21 02:10 brwilkinson

Joining this thread this time also because of KeyVault. My issue is that the KeyVault resource requires the access policies array to be provided, which means that if I deploy the resource, all my access policies are overridden. Only deploying KeyVault if it doesn't exist feels like the lesser evil although far from ideal.

I understand that in theory, ARM deployments should be idempotent, but in practice, many resources are not. I think this team should be pragmatic and add support for this feature to make adoption easier. I'm confident we'll see fewer and fewer issues in the future as the industry matures, but we are not quite there yet.

mdarefull avatar Nov 08 '21 22:11 mdarefull

@alex-frankel just another case where we NEED to check if a resource exists: using AKS with the Application Gateway Ingress Controller (AGIC). On a first-time deployment we might need to put some default routes in place. Then, when we deploy pods to AKS and AGIC is enabled, it overrides the AGW configuration with whatever is set in the pods. Since AGW routes, listeners, etc. are not resources but properties, the next deployment of our template will clear the configuration made by AGIC, making our services in AKS lose their ingress connectivity. So we need to either deploy the AGW manually and keep it outside ARM, or have a condition to check whether the AGW has already been created. Also, if we can determine that the resource exists, we can call a module on that condition to get the current configuration and merge it with some additional routes we want to add, so as not to disturb things added by AGIC.

miqm avatar Nov 09 '21 21:11 miqm

@miqm I think this one would be better if the backend rules were actually a standalone property that would allow you to not overwrite these settings, similar to the NAT rules on a load balancer or the App Configuration settings on a web site. Linking to the open issue on this as well: #2316

brwilkinson avatar Nov 09 '21 21:11 brwilkinson

There's a way to accomplish this by using deployment scripts. I've created a Bicep module to check whether a resource exists, which can be found here: https://github.com/olafloogman/BicepModules/blob/main/resource-exists.bicep

There's the additional overhead of needing to run the script in a container instance, but according to MS docs: Deployment script execution is an idempotent operation. If none of the deploymentScripts resource properties (including the inline script) are changed, the script doesn't execute when you redeploy the template.
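For readers who don't want to follow the link, a rough sketch of the general approach (this is not the linked module; the resource name, identity parameter and CLI version are assumptions):

```bicep
// Sketch only: names and versions below are illustrative assumptions.
param resourceIdToCheck string

// Resource ID of a user-assigned managed identity that is allowed to read the target resource.
param identityId string

param location string = resourceGroup().location

// Changing this value on each deployment forces the script to run again.
param forceUpdate string = utcNow()

resource check 'Microsoft.Resources/deploymentScripts@2020-10-01' = {
  name: 'check-resource-exists'
  location: location
  kind: 'AzureCLI'
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${identityId}': {}
    }
  }
  properties: {
    azCliVersion: '2.30.0'
    retentionInterval: 'PT1H'
    forceUpdateTag: forceUpdate
    environmentVariables: [
      {
        name: 'RESOURCE_ID'
        value: resourceIdToCheck
      }
    ]
    // Writes {"exists": true} or {"exists": false} to the script output file.
    scriptContent: '''
      if az resource show --ids "$RESOURCE_ID" > /dev/null 2>&1; then exists=true; else exists=false; fi
      echo "{\"exists\": $exists}" > $AZ_SCRIPTS_OUTPUT_PATH
    '''
  }
}

output exists bool = check.properties.outputs.exists
```

Note the tension with the idempotency quoted above: if none of the script's properties change, the check is not re-run, so the sketch uses forceUpdateTag with utcNow() to get a fresh answer on every deployment.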

olafloogman avatar Nov 18 '21 09:11 olafloogman

In theory, due to the idempotent nature of ARM Template/Bicep deployments, you shouldn't need to know if it is new or existing.

Unfortunately that theory doesn't hold up. There are RPs that are not idempotent (VNet and AKS, for example). Workarounds such as scripts have to be utilized, making the deployment more fragile and significantly slower. Having this feature would allow users to have a safe and simple workaround for non-idempotent RPs.

How about extending the list* function for resource groups?

resourceGroup().listResources('microsoft.network/virtualNetworks', 'myvnet') // resource name as an optional filter

aelij avatar Dec 16 '21 11:12 aelij

@alex-frankel, @bmoore-msft: I think there's a need for a method to check if an AKS cluster exists.

When you deploy for the first time following template:

resource aksCluster 'Microsoft.ContainerService/managedClusters@2021-10-01' = {
  name: 'aks-bicep'
  location: resourceGroup().location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    kubernetesVersion: '1.22.4'
    dnsPrefix: 'dnsprefix'
    enableRBAC: true
    nodeResourceGroup: 'aks-bicep-cluster-rg'
    agentPoolProfiles: [
      {
        name: 'system'
        count: 1
        vmSize: 'Standard_B2s'
        osType: 'Linux'
        mode: 'System'
        enableNodePublicIP: false
      }     
    ]
  }
}

It's all good. Also, to manage agent pools, you do have a sub-type agentpools like this:

resource system 'agentPools' = {
  name: 'system'
  properties: {
    count: 1
    osType: 'Linux'
    mode: 'System'
    vmSize: 'Standard_B2s'
  }
}

Together with deployment stacks, you can manage existing agent pools. But key points here are:

  1. During a greenfield deployment, keep the single-entry agentPoolProfiles array in sync with the child resource that describes the same node pool.
  2. During further deployments, where the cluster already exists, set the agentPoolProfiles property to null.

Reason behind this is that AKS needs to have at least one node pool (which is understandable). But since child resources are deployed after the parent resource, there's no way for RP to get definitions of node pools deployed later. How the RPs designed the API seems to be quite optimal - you don't need to put all agent pools in a single property and manage them through it (see vnet or key vault access policies property case) but you have to somehow provide initial aks configuration.

If we had a resourceExists function, we could use it in following way:

var defaultPool = {
  name: 'system2'
  properties: {
    count: 1
    osType: 'Linux'
    mode: 'System'
    vmSize: 'Standard_B2ms'
  }
}
resource aksCluster 'Microsoft.ContainerService/managedClusters@2021-10-01' = {
  name: 'aks-bicep'
  location: resourceGroup().location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    kubernetesVersion: '1.22.4'
    dnsPrefix: 'dnsprefix'
    enableRBAC: true
    nodeResourceGroup: 'aks-bicep-cluster-rg'
    agentPoolProfiles: resourceExists(this) ? null : [
      union({
        name: defaultPool.name
      }, defaultPool.properties)
    ]
  }
  resource system 'agentPools' = {
    name: defaultPool.name
    properties: defaultPool.properties
  }
  resource user 'agentPools' = {
    name: 'user'
    properties: {
      count: 1
      osType: 'Linux'
      mode: 'User'
      vmSize: 'Standard_B2s'
    }
  }
}

Alternatively, AKS Team would need to always ignore agentPoolProfiles in deployments to an existing cluster (since it's not possible to update that list anyway), or the AzureRM would have a way to send multiple PUT requests for parent and all/specified child-resources in one shot so RP API can get the whole picture of the entire resource deployment.
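Until something like resourceExists is available, the same conditional can be driven by a parameter, in the spirit of the parameter-based workaround mentioned earlier in this thread. A sketch, assuming a hypothetical clusterExists flag that the pipeline sets (for example after checking with the CLI), and relying on the behaviour described above that a null agentPoolProfiles is accepted for an existing cluster:

```bicep
// Sketch only: clusterExists is a hypothetical flag supplied by the caller.
param clusterExists bool = false

var defaultPool = {
  name: 'system2'
  properties: {
    count: 1
    osType: 'Linux'
    mode: 'System'
    vmSize: 'Standard_B2ms'
  }
}

resource aksCluster 'Microsoft.ContainerService/managedClusters@2021-10-01' = {
  name: 'aks-bicep'
  location: resourceGroup().location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    kubernetesVersion: '1.22.4'
    dnsPrefix: 'dnsprefix'
    enableRBAC: true
    nodeResourceGroup: 'aks-bicep-cluster-rg'
    // Greenfield: send the default pool inline. Existing cluster: send null
    // and let the child resource below manage the pool.
    agentPoolProfiles: clusterExists ? null : [
      union({
        name: defaultPool.name
      }, defaultPool.properties)
    ]
  }

  resource system 'agentPools' = {
    name: defaultPool.name
    properties: defaultPool.properties
  }
}
```

The obvious drawback is that the existence check moves out of the template and into whatever calls it.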

miqm avatar Jan 23 '22 13:01 miqm

This is similar to the vnet example - with a twist that a cluster must always have a pool -- I'm not sure the API is optimally designed. The availability of a child resource suggests that the pools have a different lifecycle than the parent/cluster. If that's true, the creation of a parent should not be dependent on the definition of a child. IOW, I should be able to create a cluster without a pool. That means of course I have limited functionality on that resource until the pool is created but that's not uncommon.

bmoore-msft avatar Jan 24 '22 16:01 bmoore-msft

In AKS, after the cluster is created, pools do indeed have a different lifecycle, but AKS requires at least one agent pool to run. In Terraform, it's designed so that the AKS resource has a so-called "default" pool, and additional pools are added as separate resources. But the key in AKS is that you can update only some properties of a pool. VM size can be set only when creating the pool - it can't be updated. If you wish to change the "default" pool, you need to create a new pool and remove the old one. This is now doable in ARM/Bicep with deployment stacks, but only if we set agentPoolProfiles to null. The Terraform approach with a defaultNodePool property is also not as good as it looks - changing the VM size of the default node pool removes the entire cluster (sic!) - see https://github.com/hashicorp/terraform-provider-azurerm/issues/7093

Perhaps you are right, that we could have an AKS not fully operational just like vnet - it's not very useful without any subnet.

On the other hand, if we need to wait as long as we have for vnet to get this problem solved, we should have this function and push RP teams to redesign the API, if that's possible. All in all, I suspect it costs less to introduce this function than to rebuild the API behind AKS, VNet and other resources.

miqm avatar Jan 24 '22 19:01 miqm

There's an existing modifier, could we not have an upsert modifier?

This could be used to define a symbolic name for any existing resource. Then borrow from TS/C# the 'null coalesce' operators ? and ?? to check for and use properties on the existing resource.

It would also be useful to have this as a modifier for array properties to preserve additional values in arrays such as Tags

This would allow preservation of existing values in both singular and array properties and still follow a declarative syntax.

paulhasell avatar Jan 28 '22 17:01 paulhasell

Also "Microsoft.DataProtection/backupVaults/backupPolicies" fails to deploy twice with the error:

New-AzResourceGroupDeployment: 4:12:39 PM - Error: Code=BMSUserErrorInvalidInput; Message=Policy with name YOURPOLICYNAMEHERE already exist

This is not "user error" or "invalid input". Through at least half a dozen API versions, the Azure Backup team has never tested ARM templates!

Microsoft customers are being told: You never need to conditionally deploy!

Meanwhile: Microsoft is constantly adding new (2021 era!) resources that can't be redeployed via ARM or Bicep!

peter-bertok avatar Feb 03 '22 05:02 peter-bertok

Microsoft.Web/certificates@2019-08-01 is another resource that fails to deploy twice (failing with duplicate certificate) when generating Azure Managed Certificates.

We need a mechanism to deal with resource providers that do not provide idempotent / target state ARM support until they can get fixed. Worst case, can we provide an idempotency key as part of the deployment that gets stuck in a tag, and acts like an automatic 'existing' if the tag exists and hasn't changed?

mgranell avatar Feb 24 '22 04:02 mgranell

@mgranell there have been a few discussions around Web Certificates, however I don't believe that has come up so far. If you could open a new discussion with your sample code, or find a previous discussion on web certs, we can take a look.

brwilkinson avatar Feb 24 '22 04:02 brwilkinson

I have this same problem with Power BI Workspaces... first time you run, it creates. Second time, you get "Conflict" and no other information about the error. This is really disconcerting

tommck avatar Feb 25 '22 12:02 tommck

@tommck - that one sounds like you need to open a support case...

bmoore-msft avatar Feb 25 '22 17:02 bmoore-msft

@tommck - that one sounds like you need to open a support case...

Oh joy

tommck avatar Feb 25 '22 17:02 tommck

Microsoft.Consumption/budgets is another resource that cannot easily be deployed multiple times, especially without storing state externally somewhere, because

  • it requires a startdate that is not too far in the past
  • updating the budget is not allowed, so you cannot redeploy it with a startdate set to the first day of the current month

hansmbakker avatar Mar 21 '22 14:03 hansmbakker

A very annoying scenario is when you want to assign static IP addresses in a subnet: you need to either do it manually, or reserve an address with automatic DHCP and then manually switch to static IP assignment using the IP received from the DHCP reservation.

Such a scenario could also be worked around with a check for whether the resource exists, switching from DHCP to static IP assignment using the IP reserved from DHCP.

It would be nice however to have a fix at the deployment phase not to have the IP address mandatory when we are doing static IP allocation on a nic.

octaviancretu avatar Mar 25 '22 08:03 octaviancretu

I use bicep to create a feature flag (part of Azure App Configuration service). I create the flag using bicep and set the value initially to 'false'. Once created, it's up to the business/ops to decide when the feature is enabled (i.e. feature flag changes to true). So ideally I want to be able to validate within my bicep if the feature flag already exists so I only create it once and don't overwrite values that might have been set by the business/ops.
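Absent an existence check, one option is the parameter-based workaround mentioned earlier in this thread. A sketch, assuming a hypothetical flagAlreadyExists parameter that the pipeline sets after querying the store; the flag name is also an assumption:

```bicep
// Sketch only: parameter and flag names are illustrative assumptions.
param configStoreName string

// Hypothetical flag supplied by the caller, e.g. after listing feature flags with the CLI.
param flagAlreadyExists bool = false

var flagName = 'my-feature'

resource store 'Microsoft.AppConfiguration/configurationStores@2021-03-01-preview' existing = {
  name: configStoreName
}

// Only create the flag (disabled) when it is not there yet, so a value
// changed later by business/ops is not reset to false on redeployment.
resource featureFlag 'Microsoft.AppConfiguration/configurationStores/keyValues@2021-03-01-preview' = if (!flagAlreadyExists) {
  parent: store
  // The '/' in the feature flag key has to be encoded as ~2F in the resource name.
  name: '.appconfig.featureflag~2F${flagName}'
  properties: {
    contentType: 'application/vnd.microsoft.appconfig.ff+json;charset=utf-8'
    value: string({
      id: flagName
      description: ''
      enabled: false
      conditions: {
        client_filters: []
      }
    })
  }
}
```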

Ruud2000 avatar Apr 25 '22 12:04 Ruud2000