terraform-provider-azuread icon indicating copy to clipboard operation
terraform-provider-azuread copied to clipboard

Azuread_app_role_assignment: Provider produced inconsistent result after apply

Open jamesbwilkinson opened this issue 2 years ago • 6 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritise this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritise the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureAD Provider) Version

Terraform 1.1.2 azuread 2.17.0

Affected Resource(s)

azuread_application azuread_service_principal azuread_app_role_assignment

Terraform Configuration Files

locals {

  app_roles_example = {
    APP_ROLE1 = {
      description = "APP_ROLE1",
      value       = "app.role.1",
      permissions = [var.group_id_1, var.group_id_2, var.group_id_1, var.group_id_6]
    },
    APP_ROLE2 = {
      description = "APP_ROLE2",
      value       = "app.role.2",
      permissions = [var.group_id_1, var.group_id_3, var.group_id_4, var.group_id_5]
    },
    APP_ROLE3 = {
      description = "APP_ROLE3",
      value       = "app.role.3",
      permissions = [var.group_id_1, var.group_id_2, var.group_id_4, var.group_id_7, var.group_id_1, var.group_id_9]
    }
  }
  list_of_assignments = flatten([
    for displayrolename, approle in var.app_roles : [
      for appvalue, groups in approle.permissions : {
        role_name = approle.value
        group     = groups
      }
    ]
  ])
}


resource "random_uuid" "uuid_example" {
  for_each = local.app_roles_example
}

resource "azuread_application" "example_app_reg" {
  display_name            = upper("${var.app_name}-${var.deployment_stamp_name}-example-${var.environment}")
  owners                  = [data.azuread_client_config.current.object_id]
  group_membership_claims = ["All"]

  dynamic "app_role" {
    for_each = local.app_roles_example
    content {
      allowed_member_types = ["User", "Application"]
      description          = app_role.value.description
      display_name         = app_role.key
      enabled              = true
      id                   = random_uuid.uuid_example[app_role.key].result
      value                = app_role.value.value
    }
  }
}

resource "azuread_service_principal" "example_sp" {
  application_id               = azuread_application.example_app_reg.application_id
  app_role_assignment_required = false
  owners                       = [data.azuread_client_config.current.object_id]
}

resource "azuread_app_role_assignment" "role_assignments" {
  for_each = { for assignment in local.list_of_assignments : "${assignment.role_name}.${assignment.group}" => assignment }

  app_role_id         = azuread_application.example_app_reg.app_role_ids[each.value.role_name]
  principal_object_id = each.value.group
  resource_object_id  = azuread_service_principal.example_sp.object_id
}

Debug Output

[INFO] provider.terraform-provider-azuread_v2.17.0_x5: [DEBUG] Resource Service Principal "[service principal id removed]" was not found - removing from state!:

[ERROR] vertex "module.app_role_assignment.azuread_app_role_assignment.role_assignments["example"]" error: Provider produced inconsistent result after apply

[INFO] Starting apply for module.app_role_assignment.azuread_app_role_assignment.role_assignments["example"]

Panic Output

Error: Provider produced inconsistent result after apply

When applying changes to app_role_assignment.azuread_app_role_assignment.role_assignments["example"], provider "provider["registry.terraform.io/hashicorp/azuread"]" produced an unexpected new value: Root resource was present, but now absent.

This is a bug in the provider, which should be reported in the provider's own issue tracker.

Expected Behavior

Application registration and service principal is created along with all role assignments.

Actual Behavior

Application roles are assigned (to security group) but terraform exits with error "Provider produced inconsistent result after apply". On subsequent run we see an error that the assignment is already in place:

Could not create app role assignment AppRoleAssignedToClient.BaseClient.Post(): unexpected status 400 with OData error: Request_BadRequest: Permission being assigned already exists on the object

jamesbwilkinson avatar Mar 29 '22 15:03 jamesbwilkinson

We also have a similar behavior when we are adding Groups on an Enterprise Application.

Where it sems the Terraform Provider adds the group to the Enterprise Application, but doesn't update the TF state.

│ AppRoleAssignedToClient.BaseClient.Post(): unexpected status 400 with OData
│ error: Request_BadRequest: Permission being assigned already exists on the
│ object
╵

When we are creation a few thousands entries in the Users and Groups, we get around 100 entries not registered to the state.

AlexandreODelisle avatar Mar 30 '22 23:03 AlexandreODelisle

Hi, these are unfortunately due to inconsistency issues with the API. We'll try to work around them at the expense of longer apply times, but we can't always catch all of these and the chances of hitting this increase with higher numbers of assignments per service principal.

manicminer avatar Mar 31 '22 08:03 manicminer

@manicminer We have noticed that we hit this issue very often when we update an app_role assignment on the azuread_application resource. It looks like in this case all roles are removed first, and then re-added, even though only once role has been changed. This operation as it seems is not very well handled by the underlying API. Is there any way to refactor the Terraform code to avoid that? Obviously we would like to keep the dynamic block and the locals definition (we also need to iterate over it for the role assignments), especially as we are maintaining a lot of roles for some applications. Do you potentially have any best practice there for us?

alxy avatar Apr 05 '22 09:04 alxy

@alxy That sounds like a separate issue as the original report is about the azuread_app_role_assignment resource rather than the app_role block on the azuread_application resource.

However, due to the way app roles (and permission scopes) have to be disabled before they can be updated or removed, we do our best to only affected the roles/scopes which have actually been changed. It's possible this may fall part a bit with extensive use of dynamic blocks, although I don't believe I've encountered this [yet]. If you believe the provider is disabling or removing roles which you haven't changed, then I'd recommend opening a new issue, ensuring to include a reproducible config and a debug log so that we can try to track this down :)

manicminer avatar Apr 06 '22 00:04 manicminer

We have been playing around with some changes and found we have been able to improve the assignment success by:

  • Splitting the app_role creation and azuread_app_role_assignment into separate terraform jobs.

  • Reducing the parallelism of azuread_app_role_assignment to 1.

  • adding a wait helper to the function appRoleAssignmentResourceDelete in app_role_assignment_resource.go

func appRoleAssignmentResourceDelete(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
	client := meta.(*clients.Client).AppRoleAssignments.AppRoleAssignedToClient

	id, err := parse.AppRoleAssignmentID(d.Id())
	if err != nil {
		return tf.ErrorDiagPathF(err, "id", "Parsing app role assignment with ID %q", d.Id())
	}

	if status, err := client.Remove(ctx, id.ResourceId, id.AssignmentId); err != nil {
		return tf.ErrorDiagPathF(err, "id", "Deleting app role assignment for resource %q with ID %q, got status %d", id.ResourceId, id.AssignmentId, status)
	}

	// Wait for app role to be deleted
	if err := helpers.WaitForDeletion(ctx, func(ctx context.Context) (*bool, error) {
		client.BaseClient.DisableRetries = true
		if status, err := client.Remove(ctx, id.ResourceId, id.AssignmentId); err != nil {
			if status == http.StatusNotFound {
				return utils.Bool(false), nil
			}
			return nil, err
		}
		return utils.Bool(true), nil
	}); err != nil {
		return tf.ErrorDiagF(err, "Waiting for deletion of app role assignment with object ID %q", id.ResourceId)
	}

	return nil
}

Hopefully this might help with the investigation/resolution?

jamesbwilkinson avatar Apr 13 '22 11:04 jamesbwilkinson

The changes suggested above have helped with deletion, but not creation. Still seeing multiple assignment errors in creation of new app role assignments :(

Have also added a short delay after the creation of the assignment which seems to improve this, although adds a significant delay when there are many role assignments. My Go skills don't extend to a fully fledged helper function :D

func appRoleAssignmentResourceCreate(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
	client := meta.(*clients.Client).AppRoleAssignments.AppRoleAssignedToClient
	servicePrincipalsClient := meta.(*clients.Client).AppRoleAssignments.ServicePrincipalsClient

	appRoleId := d.Get("app_role_id").(string)
	principalId := d.Get("principal_object_id").(string)
	resourceId := d.Get("resource_object_id").(string)
	if _, status, err := servicePrincipalsClient.Get(ctx, resourceId, odata.Query{}); err != nil {
		if status == http.StatusNotFound {
			return tf.ErrorDiagPathF(err, "principal_object_id", "Service principal not found for resource (Object ID: %q)", resourceId)
		}
		return tf.ErrorDiagF(err, "Could not retrieve service principal for resource (Object ID: %q)", resourceId)
	}
	properties := msgraph.AppRoleAssignment{
		AppRoleId:   utils.String(appRoleId),
		PrincipalId: utils.String(principalId),
		ResourceId:  utils.String(resourceId),
	}

	appRoleAssignment, _, err := client.Assign(ctx, properties)
	if err != nil {
		return tf.ErrorDiagF(err, "Could not create app role assignment")
	}

	if appRoleAssignment.Id == nil || *appRoleAssignment.Id == "" {
		return tf.ErrorDiagF(errors.New("ID returned for app role assignment is nil"), "Bad API response")
	}

	if appRoleAssignment.ResourceId == nil || *appRoleAssignment.ResourceId == "" {
		return tf.ErrorDiagF(errors.New("Resource ID returned for app role assignment is nil"), "Bad API response")
	}

	id := parse.NewAppRoleAssignmentID(*appRoleAssignment.ResourceId, *appRoleAssignment.Id)
	d.SetId(id.String())

	// Added pause before read operation to improve consistency
	time.Sleep(10 * time.Second)

	return appRoleAssignmentResourceRead(ctx, d, meta)
}

jamesbwilkinson avatar Apr 20 '22 14:04 jamesbwilkinson

Same issue here, any updates?

pfo-bnt avatar Nov 08 '22 15:11 pfo-bnt

This is causing headaches for our implementation of this provider as well. Deployments has to be run twice if an app has many approle assignments. Would a time delay that scales (up to a threshold) with the number of approle assignments be feasible?

nbaju1 avatar Dec 21 '22 12:12 nbaju1