terraform-provider-azuread
terraform-provider-azuread copied to clipboard
Azuread_app_role_assignment: Provider produced inconsistent result after apply
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritise this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritise the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Terraform (and AzureAD Provider) Version
Terraform 1.1.2 azuread 2.17.0
Affected Resource(s)
azuread_application azuread_service_principal azuread_app_role_assignment
Terraform Configuration Files
locals {
app_roles_example = {
APP_ROLE1 = {
description = "APP_ROLE1",
value = "app.role.1",
permissions = [var.group_id_1, var.group_id_2, var.group_id_1, var.group_id_6]
},
APP_ROLE2 = {
description = "APP_ROLE2",
value = "app.role.2",
permissions = [var.group_id_1, var.group_id_3, var.group_id_4, var.group_id_5]
},
APP_ROLE3 = {
description = "APP_ROLE3",
value = "app.role.3",
permissions = [var.group_id_1, var.group_id_2, var.group_id_4, var.group_id_7, var.group_id_1, var.group_id_9]
}
}
list_of_assignments = flatten([
for displayrolename, approle in var.app_roles : [
for appvalue, groups in approle.permissions : {
role_name = approle.value
group = groups
}
]
])
}
resource "random_uuid" "uuid_example" {
for_each = local.app_roles_example
}
resource "azuread_application" "example_app_reg" {
display_name = upper("${var.app_name}-${var.deployment_stamp_name}-example-${var.environment}")
owners = [data.azuread_client_config.current.object_id]
group_membership_claims = ["All"]
dynamic "app_role" {
for_each = local.app_roles_example
content {
allowed_member_types = ["User", "Application"]
description = app_role.value.description
display_name = app_role.key
enabled = true
id = random_uuid.uuid_example[app_role.key].result
value = app_role.value.value
}
}
}
resource "azuread_service_principal" "example_sp" {
application_id = azuread_application.example_app_reg.application_id
app_role_assignment_required = false
owners = [data.azuread_client_config.current.object_id]
}
resource "azuread_app_role_assignment" "role_assignments" {
for_each = { for assignment in local.list_of_assignments : "${assignment.role_name}.${assignment.group}" => assignment }
app_role_id = azuread_application.example_app_reg.app_role_ids[each.value.role_name]
principal_object_id = each.value.group
resource_object_id = azuread_service_principal.example_sp.object_id
}
Debug Output
[INFO] provider.terraform-provider-azuread_v2.17.0_x5: [DEBUG] Resource Service Principal "[service principal id removed]" was not found - removing from state!:
[ERROR] vertex "module.app_role_assignment.azuread_app_role_assignment.role_assignments["example"]" error: Provider produced inconsistent result after apply
[INFO] Starting apply for module.app_role_assignment.azuread_app_role_assignment.role_assignments["example"]
Panic Output
Error: Provider produced inconsistent result after apply
When applying changes to app_role_assignment.azuread_app_role_assignment.role_assignments["example"], provider "provider["registry.terraform.io/hashicorp/azuread"]" produced an unexpected new value: Root resource was present, but now absent.
This is a bug in the provider, which should be reported in the provider's own issue tracker.
Expected Behavior
Application registration and service principal is created along with all role assignments.
Actual Behavior
Application roles are assigned (to security group) but terraform exits with error "Provider produced inconsistent result after apply". On subsequent run we see an error that the assignment is already in place:
Could not create app role assignment AppRoleAssignedToClient.BaseClient.Post(): unexpected status 400 with OData error: Request_BadRequest: Permission being assigned already exists on the object
We also have a similar behavior when we are adding Groups on an Enterprise Application.
Where it sems the Terraform Provider adds the group to the Enterprise Application, but doesn't update the TF state.
│ AppRoleAssignedToClient.BaseClient.Post(): unexpected status 400 with OData
│ error: Request_BadRequest: Permission being assigned already exists on the
│ object
╵
When we are creation a few thousands entries in the Users and Groups, we get around 100 entries not registered to the state.
Hi, these are unfortunately due to inconsistency issues with the API. We'll try to work around them at the expense of longer apply times, but we can't always catch all of these and the chances of hitting this increase with higher numbers of assignments per service principal.
@manicminer We have noticed that we hit this issue very often when we update an app_role
assignment on the azuread_application
resource. It looks like in this case all roles are removed first, and then re-added, even though only once role has been changed. This operation as it seems is not very well handled by the underlying API.
Is there any way to refactor the Terraform code to avoid that? Obviously we would like to keep the dynamic
block and the locals definition (we also need to iterate over it for the role assignments), especially as we are maintaining a lot of roles for some applications. Do you potentially have any best practice there for us?
@alxy That sounds like a separate issue as the original report is about the azuread_app_role_assignment
resource rather than the app_role
block on the azuread_application
resource.
However, due to the way app roles (and permission scopes) have to be disabled before they can be updated or removed, we do our best to only affected the roles/scopes which have actually been changed. It's possible this may fall part a bit with extensive use of dynamic blocks, although I don't believe I've encountered this [yet]. If you believe the provider is disabling or removing roles which you haven't changed, then I'd recommend opening a new issue, ensuring to include a reproducible config and a debug log so that we can try to track this down :)
We have been playing around with some changes and found we have been able to improve the assignment success by:
-
Splitting the app_role creation and azuread_app_role_assignment into separate terraform jobs.
-
Reducing the parallelism of azuread_app_role_assignment to 1.
-
adding a wait helper to the function appRoleAssignmentResourceDelete in app_role_assignment_resource.go
func appRoleAssignmentResourceDelete(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
client := meta.(*clients.Client).AppRoleAssignments.AppRoleAssignedToClient
id, err := parse.AppRoleAssignmentID(d.Id())
if err != nil {
return tf.ErrorDiagPathF(err, "id", "Parsing app role assignment with ID %q", d.Id())
}
if status, err := client.Remove(ctx, id.ResourceId, id.AssignmentId); err != nil {
return tf.ErrorDiagPathF(err, "id", "Deleting app role assignment for resource %q with ID %q, got status %d", id.ResourceId, id.AssignmentId, status)
}
// Wait for app role to be deleted
if err := helpers.WaitForDeletion(ctx, func(ctx context.Context) (*bool, error) {
client.BaseClient.DisableRetries = true
if status, err := client.Remove(ctx, id.ResourceId, id.AssignmentId); err != nil {
if status == http.StatusNotFound {
return utils.Bool(false), nil
}
return nil, err
}
return utils.Bool(true), nil
}); err != nil {
return tf.ErrorDiagF(err, "Waiting for deletion of app role assignment with object ID %q", id.ResourceId)
}
return nil
}
Hopefully this might help with the investigation/resolution?
The changes suggested above have helped with deletion, but not creation. Still seeing multiple assignment errors in creation of new app role assignments :(
Have also added a short delay after the creation of the assignment which seems to improve this, although adds a significant delay when there are many role assignments. My Go skills don't extend to a fully fledged helper function :D
func appRoleAssignmentResourceCreate(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
client := meta.(*clients.Client).AppRoleAssignments.AppRoleAssignedToClient
servicePrincipalsClient := meta.(*clients.Client).AppRoleAssignments.ServicePrincipalsClient
appRoleId := d.Get("app_role_id").(string)
principalId := d.Get("principal_object_id").(string)
resourceId := d.Get("resource_object_id").(string)
if _, status, err := servicePrincipalsClient.Get(ctx, resourceId, odata.Query{}); err != nil {
if status == http.StatusNotFound {
return tf.ErrorDiagPathF(err, "principal_object_id", "Service principal not found for resource (Object ID: %q)", resourceId)
}
return tf.ErrorDiagF(err, "Could not retrieve service principal for resource (Object ID: %q)", resourceId)
}
properties := msgraph.AppRoleAssignment{
AppRoleId: utils.String(appRoleId),
PrincipalId: utils.String(principalId),
ResourceId: utils.String(resourceId),
}
appRoleAssignment, _, err := client.Assign(ctx, properties)
if err != nil {
return tf.ErrorDiagF(err, "Could not create app role assignment")
}
if appRoleAssignment.Id == nil || *appRoleAssignment.Id == "" {
return tf.ErrorDiagF(errors.New("ID returned for app role assignment is nil"), "Bad API response")
}
if appRoleAssignment.ResourceId == nil || *appRoleAssignment.ResourceId == "" {
return tf.ErrorDiagF(errors.New("Resource ID returned for app role assignment is nil"), "Bad API response")
}
id := parse.NewAppRoleAssignmentID(*appRoleAssignment.ResourceId, *appRoleAssignment.Id)
d.SetId(id.String())
// Added pause before read operation to improve consistency
time.Sleep(10 * time.Second)
return appRoleAssignmentResourceRead(ctx, d, meta)
}
Same issue here, any updates?
This is causing headaches for our implementation of this provider as well. Deployments has to be run twice if an app has many approle assignments. Would a time delay that scales (up to a threshold) with the number of approle assignments be feasible?