terraform-provider-azuread
Error when destroying azuread_application_password
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritise this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritise the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Terraform (and AzureAD Provider) Version
terraform 1.0.7, azuread provider 2.9.0
Affected Resource(s)
azuread_application_password
The error output doesn't explicitly name the affected resource, but I believe this must be it.
Terraform Configuration Files
I can't share our full configuration, but the basic setup is that we have an azuread_application, an azuread_service_principal and an azuread_application_password.
I can try to shrink this to a shareable reproduction if needed.
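For illustration, a minimal sketch of that shape might look something like this (placeholder names only, not our actual configuration):

resource "azuread_application" "example" {
  display_name = "example-app"
}

resource "azuread_service_principal" "example" {
  application_id = azuread_application.example.application_id
}

resource "azuread_application_password" "example" {
  application_object_id = azuread_application.example.object_id
}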
Debug Output
Coming soon, I hope. Though experience does suggest that bugs will often refuse to reproduce when you turn on the debug logs...
Panic Output
Expected Behavior
Resources are successfully destroyed on terraform destroy
Actual Behavior
│ Error: Removing password credential "5e8d9103-f96c-4e85-ad14-8b2d6f3382f7" from application with object ID "126c83a4-aa27-46c3-a382-e2408c2efac5"
│
│ ApplicationsClient.BaseClient.Post(): unexpected status 404 with OData
│ error: Directory_ObjectNotFound: Unable to read the company information
│ from the directory.
Steps to Reproduce
- terraform apply
- terraform destroy
Important Factoids
References
Hi @dimbleby, thanks for reporting this issue. This looks like an interesting error case, if you are able to reproduce and could post debug output that would be exceedingly helpful. Thanks!
I have a repro, just need to redact credentials from the log... will update shortly.
https://gist.github.com/dimbleby/88190ed4b89e0d50542c1e2f86aa109b#file-debug-log
(I've removed the client secret and all access tokens that it obtains. All applications etc mentioned in this log are now destroyed, so it doesn't matter that their details remain visible).
The 404 appears at line 13689.
I have started seeing this when taking the upgrade from azuread 2.7.0 to 2.9.0. I don't know whether this is causal, or only coincidence.
Just updating to note that, after a few retries at both versions, I am pretty convinced that we hit this at 2.9.0 but not at 2.7.0.
Noting, in case it helps, that we do not seem to hit this at 2.8.0 (but do continue to hit it at 2.9.0).
Thanks @dimbleby, all of this is extremely helpful! We did add some changes in 2.9 that might be related. I'll dig into this sometime tomorrow 👍
Hi @dimbleby, from what I can tell the provider is doing the right things during this sequence. I simulated the API error with an intercepting proxy and observed the same behaviour on my end. The error itself seems quite likely to be an API bug: at 14:13:08 the application is retrieved with the password credential included in the response, yet shortly afterwards a series of requests to delete that password fail one after another with the error you pasted (the last request being at 14:13:23). The provider does appear to be retrying here, given the time delay and the retries I can see when reproducing locally.
At this time I'm inclined to mark this as an API bug and will try to raise it upstream with the relevant folks. Sorry I don't have a better resolution for you here; I did try adding some mitigation to the code, but since the preceding GET requests are successful it doesn't help at all.
Thanks. Please let me know if there's anything I can do to chip in on encouraging upstream to address the probable API bug.
It's curious that we only start hitting this with 2.9.0. Did anything change about the way that the provider accesses the API? Maybe just the timings?
(Quite the shame too - I was hoping to pick up the mitigation for #611.)
One oddity that I spotted: after a DELETE we do a GET to verify that the thing is really deleted, and it looks as though the (expected) 404 causes retries, e.g. at line 13514. Perhaps this is happening precisely because of the #611 mitigation, but I wonder whether it was (i) intended and (ii) possibly somehow provoking this behaviour?
@dimbleby I'm not certain that any changes in 2.9.0 are contributing to your seeing this more with that version; could it perhaps be coincidence, given the intermittent nature of this API bug? We haven't changed anything with regards to deletion of app or SP passwords, and assuming that your Terraform config defines a relationship between the app and its secrets, such that the secrets are destroyed first, there should be no material difference in behavior specifically for the password resource.
For the repeated GET requests after deletion (for apps and SPs), we do this intentionally as we're checking for 5 successive 404s to try and guarantee deletion :)
If it is only coincidence, it's becoming a very strong one: we have never hit this prior to 2.9.0 (and continue not to hit it in our regular pipelines), whereas we hit it more often than not at 2.9.0.
(Each of our pipelines sets up and destroys several applications and passwords; just one of these passwords erroring out is enough for the pipeline to fail. So I don't necessarily mean that we hit it on more than half of our resources).
So yes, I'm fairly convinced that something has changed at 2.9.0.
@dimbleby I can think of a potential cause if Terraform has not inferred a direct dependency between your application resource and your application_password resource. In that event, the recent changes in 2.9.0 could potentially be causing this error to surface.
Could you perhaps share a small configuration piece showing these two resources? It would be useful in trying to reproduce this.
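For example (a hypothetical shape, not assuming this is what you have): if the password's object ID came from a variable or data source rather than from the azuread_application resource itself, Terraform would infer no ordering between the two destroys.

# Hypothetical: nothing here references the azuread_application resource, so
# Terraform infers no dependency and may destroy the application before this password.
resource "azuread_application_password" "detached" {
  application_object_id = var.existing_application_object_id
  display_name          = "detached secret"
}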
We follow a pattern something like this:
locals {
  microsoft_graph_app_id    = "00000003-0000-0000-c000-000000000000"
  microsoft_graph_user_read = "e1fe6dd8-ba31-4d61-89e7-88639da4683d"
}

data "azuread_client_config" "current" {}

resource "azuread_application" "foo" {
  count        = var.aad_foo_name == "" ? 0 : 1
  display_name = var.aad_foo_name
  owners       = [data.azuread_client_config.current.object_id]

  web {
    redirect_uris = [whatever]
  }

  required_resource_access {
    resource_app_id = local.microsoft_graph_app_id

    resource_access {
      id   = local.microsoft_graph_user_read
      type = "Scope"
    }
  }

  lifecycle {
    ignore_changes = [
      owners
    ]
  }
}

resource "azuread_service_principal" "foo" {
  count                        = var.aad_foo_name == "" ? 0 : 1
  application_id               = azuread_application.foo[0].application_id
  app_role_assignment_required = true
  owners                       = [data.azuread_client_config.current.object_id]

  lifecycle {
    ignore_changes = [
      owners
    ]
  }
}

resource "azuread_application_password" "foo" {
  count                 = var.aad_foo_name == "" ? 0 : 1
  application_object_id = azuread_application.foo.0.id
  display_name          = "Foo secret"
  end_date              = var.aad_password_end_date
}
Thanks, I'll try and use that for further testing to track this down
I just started getting this as well after updating from 2.3.0 -> 2.13.0. Our config is pretty similar:
resource "azuread_application" "azp" {
display_name = var.pool_name
sign_in_audience = "AzureADMyOrg"
owners = [
data.azurerm_client_config.current.object_id,
]
web {
implicit_grant {
access_token_issuance_enabled = true
id_token_issuance_enabled = true
}
}
}
resource "azuread_service_principal" "azp" {
application_id = azuread_application.azp.application_id
app_role_assignment_required = false
owners = [
data.azurerm_client_config.current.object_id,
]
}
resource "time_rotating" "azp" {
rotation_days = 365
}
resource "azuread_application_password" "azp" {
application_object_id = azuread_application.azp.object_id
end_date = time_rotating.azp.rotation_rfc3339
}
> At this time I'm inclined to mark this as an API bug and will try to raise it upstream with the relevant folks.
@manicminer did you manage to do this? Is there an upstream bug somewhere that I can go vote on? Thanks!
Just noting that this still happens at 2.33.0; we continue to use azuread 2.8.0 to avoid it.
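For anyone else pinning as a workaround, a provider constraint along these lines keeps things on 2.8.0 (just a sketch, adjust as needed):

terraform {
  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = "2.8.0"
    }
  }
}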