terraform-provider-azuread
Error when destroying azuread_application_password
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritise this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritise the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Terraform (and AzureAD Provider) Version
terraform 1.0.7, azuread provider 2.9.0
Affected Resource(s)
azuread_application_password
The error output doesn't explicitly name the affected resource, but I believe this must be it.
Terraform Configuration Files
I can't share our full configuration, but the basic setup is that we have an azuread_application, an azuread_service_principal and an azuread_application_password.
I can try to shrink this to a shareable reproduction if needed.
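For illustration, a minimal sketch of that shape might look something like this (placeholder names only, not our actual configuration):

resource "azuread_application" "example" {
  display_name = "example-app"
}

resource "azuread_service_principal" "example" {
  application_id = azuread_application.example.application_id
}

resource "azuread_application_password" "example" {
  application_object_id = azuread_application.example.object_id
}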
Debug Output
Coming soon, I hope. Though experience does suggest that bugs will often refuse to reproduce when you turn on the debug logs...
Panic Output
Expected Behavior
Resources are successfully destroyed on terraform destroy
Actual Behavior
│ Error: Removing password credential "5e8d9103-f96c-4e85-ad14-8b2d6f3382f7" from application with object ID "126c83a4-aa27-46c3-a382-e2408c2efac5"
│
│ ApplicationsClient.BaseClient.Post(): unexpected status 404 with OData
│ error: Directory_ObjectNotFound: Unable to read the company information
│ from the directory.
Steps to Reproduce
- terraform apply
- terraform destroy
Important Factoids
References
Hi @dimbleby, thanks for reporting this issue. This looks like an interesting error case, if you are able to reproduce and could post debug output that would be exceedingly helpful. Thanks!
I have a repro, just need to redact credentials from the log... will update shortly.
https://gist.github.com/dimbleby/88190ed4b89e0d50542c1e2f86aa109b#file-debug-log
(I've removed the client secret and all access tokens that it obtains. All applications etc mentioned in this log are now destroyed, so it doesn't matter that their details remain visible).
The 404 appears at line 13689.
I have started seeing this when taking the upgrade from azuread 2.7.0 to 2.9.0. I don't know whether this is causal, or only coincidence.
Just updating to note that, after a few retries at both versions, I am pretty convinced that we hit this at 2.9.0 but not at 2.7.0.
Noting, in case it helps, that we do not seem to hit this at 2.8.0 (but do continue to hit it at 2.9.0).
Thanks @dimbleby, all of this is extremely helpful! We did add some changes in 2.9 that might be related. I'll dig into this sometime tomorrow 👍
Hi @dimbleby, from what I can tell the provider is doing the right things during this sequence. I simulated the API error with an intercepting proxy and observed the same behaviour on my end. The error itself seems quite likely to be an API bug: at 14:13:08 the application is retrieved with the password credential included in the response, yet shortly afterwards a series of requests to delete that password fail one after another with the error you pasted (the last request being at 14:13:23). The provider does appear to be retrying here, given the time delay and the retries I can see when reproducing locally.
At this time I'm inclined to mark this as an API bug and will try to raise it upstream with the relevant folks. Sorry I don't have a better resolution for you here; I did try adding some mitigation to the code, but since the preceding GET requests are successful it doesn't help at all.
Thanks. Please let me know if there's anything I can do to chip in on encouraging upstream to address the probable API bug.
It's curious that we only start hitting this with 2.9.0. Did anything change about the way that the provider accesses the API? Maybe just the timings?
(Quite the shame too - I was hoping to pick up the mitigation for #611.)
One oddity that I spotted: after a DELETE we do a GET to verify that the thing is really deleted, and it looks as though the (expected) 404 causes retries, e.g. at line 13514. Perhaps this is happening precisely because of the #611 mitigation, but I wonder whether it was (i) intended and (ii) possibly somehow provoking this behaviour?
@dimbleby I'm not certain that any changes in 2.9.0 are contributing to your seeing this more with that version; could it perhaps be coincidence, given the intermittent nature of this API bug? We haven't changed anything with regards to deletion of app or SP passwords, and assuming that your Terraform config defines a relationship between the app and its secrets, such that the secrets are destroyed first, there should be no material difference in behavior specifically for the password resource.
For the repeated GET requests after deletion (for apps and SPs), we do this intentionally as we're checking for 5 successive 404s to try and guarantee deletion :)
If it is only coincidence, it's becoming a very strong one: we have never hit this prior to 2.9.0 (and continue not to hit it in our regular pipelines), whereas we hit it more often than not at 2.9.0.
(Each of our pipelines sets up and destroys several applications and passwords; just one of these passwords erroring out is enough for the pipeline to fail. So I don't necessarily mean that we hit it on more than half of our resources).
So yes, I'm fairly convinced that something has changed at 2.9.0.
@dimbleby I can think of a potential cause if Terraform has not inferred a direct dependency between your application resource and your application_password resource. In that event, the recent changes in 2.9.0 could potentially be causing this error to surface.
Could you perhaps share a small configuration piece showing these two resources? It would be useful in trying to reproduce this.
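For example (a hypothetical shape, not assuming this is what you have): if the password's object ID came from a variable or data source rather than from the azuread_application resource itself, Terraform would infer no ordering between the two destroys.

# Hypothetical: nothing here references the azuread_application resource, so
# Terraform infers no dependency and may destroy the application before this password.
resource "azuread_application_password" "detached" {
  application_object_id = var.existing_application_object_id
  display_name          = "detached secret"
}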
We follow a pattern something like this:
locals {
  microsoft_graph_app_id    = "00000003-0000-0000-c000-000000000000"
  microsoft_graph_user_read = "e1fe6dd8-ba31-4d61-89e7-88639da4683d"
}

data "azuread_client_config" "current" {}

resource "azuread_application" "foo" {
  count        = var.aad_foo_name == "" ? 0 : 1
  display_name = var.aad_foo_name
  owners       = [data.azuread_client_config.current.object_id]

  web {
    redirect_uris = [whatever]
  }

  required_resource_access {
    resource_app_id = local.microsoft_graph_app_id

    resource_access {
      id   = local.microsoft_graph_user_read
      type = "Scope"
    }
  }

  lifecycle {
    ignore_changes = [
      owners
    ]
  }
}

resource "azuread_service_principal" "foo" {
  count                        = var.aad_foo_name == "" ? 0 : 1
  application_id               = azuread_application.foo[0].application_id
  app_role_assignment_required = true
  owners                       = [data.azuread_client_config.current.object_id]

  lifecycle {
    ignore_changes = [
      owners
    ]
  }
}

resource "azuread_application_password" "foo" {
  count                 = var.aad_foo_name == "" ? 0 : 1
  application_object_id = azuread_application.foo.0.id
  display_name          = "Foo secret"
  end_date              = var.aad_password_end_date
}
Thanks, I'll try and use that for further testing to track this down
I just started getting this as well after updating from 2.3.0 -> 2.13.0. Our config is pretty similar:
resource "azuread_application" "azp" {
display_name = var.pool_name
sign_in_audience = "AzureADMyOrg"
owners = [
data.azurerm_client_config.current.object_id,
]
web {
implicit_grant {
access_token_issuance_enabled = true
id_token_issuance_enabled = true
}
}
}
resource "azuread_service_principal" "azp" {
application_id = azuread_application.azp.application_id
app_role_assignment_required = false
owners = [
data.azurerm_client_config.current.object_id,
]
}
resource "time_rotating" "azp" {
rotation_days = 365
}
resource "azuread_application_password" "azp" {
application_object_id = azuread_application.azp.object_id
end_date = time_rotating.azp.rotation_rfc3339
}
> At this time I'm inclined to mark this as an API bug and will try to raise it upstream with the relevant folks.
@manicminer did you manage to do this? Is there an upstream bug somewhere that I can go vote on? Thanks!
Just noting that this still happens at 2.33.0; we continue to use azuread 2.8.0 to avoid it.
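For anyone else pinning as a workaround, a provider constraint along these lines keeps things on 2.8.0 (just a sketch, adjust as needed):

terraform {
  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = "2.8.0"
    }
  }
}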