
okta_app_saml and okta_app_oauth cause rate limit/timeout errors during plan phase


Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

TF version 1.1.3

Affected Resource(s)

  • okta_app_saml
  • okta_app_oauth

Terraform Configuration Files

resource "okta_app_oauth" "mobile" {
  label          = "Mobile API Client"
  type           = "native"

  response_types = [ "code" ]
  grant_types    = [
    "refresh_token",
    "authorization_code"]

  redirect_uris  = local.api_client_callback_urls
  token_endpoint_auth_method = "none" 
  refresh_token_rotation = "STATIC"

  # See https://registry.terraform.io/providers/okta/okta/latest/docs/resources/app_group_assignment
  # When using this resource in conjunction with other application resources
  # (e.g. okta_app_oauth) it is advisable to add the following lifecycle argument
  # to the associated app_* resources to prevent the groups being unassigned on
  # subsequent runs:
  lifecycle {
     ignore_changes = [groups]
  }
}

resource "okta_app_saml" "saml_app" {
  label                    = "SAMLApplication"
  sso_url                  = local.env.xxx
  recipient                = local.env.xxx
  destination              = local.env.xxx
  audience                 = local.env.xxx
  assertion_signed         = true
  response_signed          = true
  signature_algorithm      = "xxx"
  digest_algorithm         = "xxx"
  honor_force_authn        = true
  idp_issuer               = local.env.saml_issuer
  authn_context_class_ref  = "urn:oasis:names:tc:SAML:2.0:xxxx"
  subject_name_id_template = "xxx"
  subject_name_id_format   = "xxx"

  # See https://registry.terraform.io/providers/okta/okta/latest/docs/resources/app_group_assignment
  # When using this resource in conjunction with other application resources
  # (e.g. okta_app_oauth) it is advisable to add the following lifecycle argument
  # to the associated app_* resources to prevent the groups being unassigned on
  # subsequent runs:
  lifecycle {
     ignore_changes = [groups]
  }
}

Debug Output

https://gist.github.com/kostacasa/af28c7f01ece535ffc66f5bcd86a419c

Expected Behavior

Apps should be updated without hitting rate limits (which ultimately cause timeouts).

Actual Behavior

Our org hit rate limits during the plan phase, as shown below:

[screenshot: okta_rate_limit]

Steps to Reproduce

Running tf plan is enough.

A regression in the Okta Terraform provider seems to have been introduced between versions 3.12.1 and 3.13.1. The former runs a plan on our org successfully; the latter (and every version up to the latest) causes timeouts during the plan phase.
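In the meantime we are effectively pinned to the last working version. The pin looks roughly like this (a sketch using the standard required_providers syntax, not our exact config):

terraform {
  required_providers {
    okta = {
      source  = "okta/okta"
      # 3.12.1 is the last version that completes a plan against our org
      version = "3.12.1"
    }
  }
}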

We also attempted to use the max_api_capacity parameter, which prevented the rate limit from being hit, but the plan phase still timed out after 15 minutes.
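The provider-level setting we tried looks roughly like this (a sketch with placeholder org values; 50 is just an illustrative capacity percentage, not necessarily the value we used):

provider "okta" {
  org_name  = "xxx"
  base_url  = "okta.com"
  api_token = var.okta_api_token
  # Only consume up to 50% of the org's published rate limit.
  max_api_capacity = 50
}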

I would draw attention to the URL in the debug output gist, which shows which requests are timing out: Get "https://xxx.okta.com/api/v1/apps/xxx/users?after=xxx&limit=200"

The key part (bolded in the original) is the after=xxx cursor; that xxx is an actual user ID from our org. It looks like the provider is paginating through the app's entire assigned-user list, 200 at a time, to perform a diff. Since our org has tens of thousands of users, this takes too long (and triggers rate limits).

We tried adding the following properties to the resource:

  skip_users = true
  skip_groups = true

And the following ignores to the lifecycle block:

ignore_changes = [groups, users]

But neither avoided the problem.
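For clarity, this is roughly how the resource looked with both attempted workarounds applied at once (a trimmed sketch, not our full config):

resource "okta_app_oauth" "mobile" {
  label = "Mobile API Client"
  type  = "native"
  # ...remaining arguments as in the config above...

  # Attempted workaround: skip reading user/group assignments on refresh.
  skip_users  = true
  skip_groups = true

  lifecycle {
    # Attempted workaround: ignore drift on assignments.
    ignore_changes = [groups, users]
  }
}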

Important Factoids

These apps have close to 100k users assigned to them, which I suspect matters. Our sandboxes, with smaller user counts, do not experience this issue.

References

  • #0000

kostacasa avatar Mar 10 '22 02:03 kostacasa

@kostacasa thanks for all the details. I will see if we can address this in the next release.

monde avatar Mar 10 '22 16:03 monde

+1. A lot of requests are being made to these endpoints: current request "GET /api/v1/apps/<app_id>/users"

My use case only uses the okta_app_oauth or okta_app_saml data source to retrieve the app_id, without any interaction with its user base.
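For reference, the lookup on my side is about as minimal as it gets, roughly this (the label is a placeholder):

data "okta_app_oauth" "this" {
  label = "My OAuth App"
}

output "app_id" {
  value = data.okta_app_oauth.this.id
}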

Cylock avatar Mar 24 '22 09:03 Cylock

+1 on this as well.

When looking at the TRACE logs, I'm seeing some of these:

2022-03-31T13:46:27.391-0700 [DEBUG] provider.terraform-provider-okta_v3.22.1: 2022/03/31 01:46:27 [INFO] Throttling API requests; sleeping for 0 seconds until rate limit reset (path class "app-id": 96 remaining of 500 total); current request "GET /api/v1/apps/0oaxxxx/users/00xxxxxxx"

But looking at the rate limit dashboard, it seems Okta is bucketing these all under v1/apps*; v1/apps/{id} wasn't even really touched.

[screenshot: Okta rate limit dashboard]

tantran-falconx avatar Mar 31 '22 21:03 tantran-falconx

Thanks for the details @tantran-falconx this is very helpful for my investigation.

monde avatar Mar 31 '22 23:03 monde

Any updates on this issue? We are still unable to upgrade to the latest versions of the provider because of this; our prod org always times out during the plan phase.

kostacasa avatar Jun 07 '22 21:06 kostacasa

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot] avatar Jan 22 '23 00:01 github-actions[bot]

Please keep open

antonmos avatar Jan 22 '23 21:01 antonmos

@kostacasa @antonmos are you still having an issue here? A while back I refactored the rate-limiting algorithm to be driven off of real accounting in the Okta monolith's integration tests. I haven't heard much from anyone having rate-limiting issues any longer.

monde avatar Mar 11 '23 00:03 monde

Not happening any more for me! Thank you for fixing!

antonmos avatar Mar 11 '23 15:03 antonmos

We'll call this one done

monde avatar Mar 11 '23 16:03 monde