terraform-provider-azuread
terraform-provider-azuread copied to clipboard
Terraform crashing due to group size in azuread_app_role_assignment
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritise this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritise the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Terraform (and AzureAD Provider) Version
tf version: 1.1.7 azuread: 2.18
Affected Resource(s)
azuread_app_role_assignment
Terraform Configuration Files
Not at liberty to disclose full config info, however issue is presenting itself when an azuread_group in an azuread_app_role_assignment has a large number of members (our case there is over 80k). The assignment and the group were created successfully, however now that the group has been populated with all members, this is causing our entire pipeline to crash.
I can see that the provider plugin is trying to do a graph call to get the membership of the group. Is there a way we can ignore this call? We are managing membership via a separate idM solution, so we should not need tf to be authoritative for the membership.
# Copy-paste your Terraform configurations here - for large Terraform configs,
# please use a service like Dropbox and share a link to the ZIP file. For
# security, you can also encrypt the files using our GPG public key: https://keybase.io/hashicorp
Debug Output
1510│ "https://graph.microsoft.com/beta/***/groups/<groupObjectID>/members?$select=id&$skiptoken=<token>":
Error: Could not retrieve members for group with object ID "<groupObjectID>"
1504│
1507│ 121: owners = [data.azuread_client_config.current.object_id]
1508│
1509│ GroupsClient.BaseClient.Get(): Get
1510│ "https://graph.microsoft.com/beta/***/groups/<groupObjectID>/members?$select=id&$skiptoken=<tokenvalue>":
1511│ http: RoundTripper implementation (*retryablehttp.RoundTripper) returned a
1512│ nil *Response with a nil error
Panic Output
Expected Behavior
Actual Behavior
Steps to Reproduce
terraform apply
Important Factoids
References
- #0000
Hi @josefismael, thanks for reporting this issue. 80K members is quite a lot and I can see how this could easily introduce timeouts or other API related errors (e.g. rate limiting, failed responses) as we attempt to determine the group's membership. Unfortunately it is necessary to do this so that we can assert the members property, and in order to do that we need to know who those members are.
You may have tried to use the lifecycle meta argument to ignore changes to members, however I don't believe this will work as this abstraction happens outside the provider and simply informs Terraform core about your intentions - this setting is not passed down to providers.
I can certainly look at adding greater resilience to the read functions for this resource, to make API errors less likely, ~but I can't think of a way to sidestep membership management altogether I'm afraid~. This might be able to improve your pipeline reliability, though it won't do anything about the run time and you may need to customize your timeout values to compensate.
Edit: Perhaps we can add a boolean argument to the group resource to indicate that members are unmanaged. This might work in principle but I will have to test it out to be sure.
Thanks for the response. If we're just trying to assess/manage ownership, graph has a separate endpoint for that.
https://docs.microsoft.com/en-us/graph/api/group-list-owners?view=graph-rest-1.0&tabs=http
I do see in that article that SP's are not returned when querying group ownership - in this case, the only owner configured was in fact a SP (and not a user). Is tf only attempting to pull membership when ownership returns nil? Perhaps it would be as simple as declaring a user as an owner upon group creation...
@josefismael After re-reading your report it looks like the error arises when querying the group members, I don't believe it is related to owners in this case.
When reading the group details, we iterate the paged group/id/owners and group/id/members endpoints and I believe it's the latter which is giving rise to the error you're seeing, as this currently requires that every single page returns a valid response in order to work. I'll look at adding some resilience here so that these pages are retried, as well as the feasibility of adding an option to not manage the members.
Regarding ownership, the groups API and its docs are a moving target, and alas frequently do not align, which is why we have a lengthy recommendation on group ownership in the provider docs.
Thanks for the explanation. Ultimately, all we would be looking for is a way to skip the enumeration of the members when evaluating the azuread_group resource (since we're not leveraging the azuread_group_member resource).