terraform-provider-azuread icon indicating copy to clipboard operation
terraform-provider-azuread copied to clipboard

Terraform crashing due to group size in azuread_app_role_assignment

Open josefismael opened this issue 3 years ago • 5 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritise this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritise the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureAD Provider) Version

tf version: 1.1.7 azuread: 2.18

Affected Resource(s)

azuread_app_role_assignment

Terraform Configuration Files

Not at liberty to disclose full config info, however issue is presenting itself when an azuread_group in an azuread_app_role_assignment has a large number of members (our case there is over 80k). The assignment and the group were created successfully, however now that the group has been populated with all members, this is causing our entire pipeline to crash.

I can see that the provider plugin is trying to do a graph call to get the membership of the group. Is there a way we can ignore this call? We are managing membership via a separate idM solution, so we should not need tf to be authoritative for the membership.

# Copy-paste your Terraform configurations here - for large Terraform configs,
# please use a service like Dropbox and share a link to the ZIP file. For
# security, you can also encrypt the files using our GPG public key: https://keybase.io/hashicorp

Debug Output

1510│ "https://graph.microsoft.com/beta/***/groups/<groupObjectID>/members?$select=id&$skiptoken=<token>": 

Error: Could not retrieve members for group with object ID "<groupObjectID>" 
1504│ 
1507│ 121: owners = [data.azuread_client_config.current.object_id] 
1508│ 
1509│ GroupsClient.BaseClient.Get(): Get 
1510│ "https://graph.microsoft.com/beta/***/groups/<groupObjectID>/members?$select=id&$skiptoken=<tokenvalue>": 
1511│ http: RoundTripper implementation (*retryablehttp.RoundTripper) returned a 
1512│ nil *Response with a nil error 

Panic Output

Expected Behavior

Actual Behavior

Steps to Reproduce

  1. terraform apply

Important Factoids

References

  • #0000

josefismael avatar Mar 10 '22 17:03 josefismael

Hi @josefismael, thanks for reporting this issue. 80K members is quite a lot and I can see how this could easily introduce timeouts or other API related errors (e.g. rate limiting, failed responses) as we attempt to determine the group's membership. Unfortunately it is necessary to do this so that we can assert the members property, and in order to do that we need to know who those members are.

You may have tried to use the lifecycle meta argument to ignore changes to members, however I don't believe this will work as this abstraction happens outside the provider and simply informs Terraform core about your intentions - this setting is not passed down to providers.

I can certainly look at adding greater resilience to the read functions for this resource, to make API errors less likely, ~but I can't think of a way to sidestep membership management altogether I'm afraid~. This might be able to improve your pipeline reliability, though it won't do anything about the run time and you may need to customize your timeout values to compensate.

Edit: Perhaps we can add a boolean argument to the group resource to indicate that members are unmanaged. This might work in principle but I will have to test it out to be sure.

manicminer avatar Mar 14 '22 09:03 manicminer

Thanks for the response. If we're just trying to assess/manage ownership, graph has a separate endpoint for that.

https://docs.microsoft.com/en-us/graph/api/group-list-owners?view=graph-rest-1.0&tabs=http

I do see in that article that SP's are not returned when querying group ownership - in this case, the only owner configured was in fact a SP (and not a user). Is tf only attempting to pull membership when ownership returns nil? Perhaps it would be as simple as declaring a user as an owner upon group creation...

josefismael avatar Mar 14 '22 13:03 josefismael

@josefismael After re-reading your report it looks like the error arises when querying the group members, I don't believe it is related to owners in this case.

When reading the group details, we iterate the paged group/id/owners and group/id/members endpoints and I believe it's the latter which is giving rise to the error you're seeing, as this currently requires that every single page returns a valid response in order to work. I'll look at adding some resilience here so that these pages are retried, as well as the feasibility of adding an option to not manage the members.

Regarding ownership, the groups API and its docs are a moving target, and alas frequently do not align, which is why we have a lengthy recommendation on group ownership in the provider docs.

manicminer avatar Mar 14 '22 14:03 manicminer

Thanks for the explanation. Ultimately, all we would be looking for is a way to skip the enumeration of the members when evaluating the azuread_group resource (since we're not leveraging the azuread_group_member resource).

josefismael avatar Mar 14 '22 21:03 josefismael