workload-discovery-on-aws

qKe Request failed with status code 403

Open sjribe opened this issue 1 year ago • 22 comments

If your issue relates to the Discovery Process, please first follow the steps described in the implementation guide Debugging the Discovery Component


Describe the bug
When clicking on Resources, I get the error "qKe Request failed with status code 403".

To Reproduce
Steps to reproduce the behavior:

  1. When logged in as admin, click on Resources under Explore.
  2. An error message appears at the top.
  3. No resources are discovered.

Expected behavior
Resources are listed.

Browser
Reproducible on the latest versions of Edge and Chrome.

sjribe avatar Jan 17 '24 08:01 sjribe

Open up your browser dev tools and paste any errors you see there into this issue.

svozza avatar Jan 17 '24 09:01 svozza

{ "errors" : [ { "errorType" : "WAFForbiddenException", "message" : "403 Forbidden" } ] }

Oh, I think I know now. Where it says "Comma separated list of CIDR ranges to manage access the API. To allow all the entire internet, use 0.0.0.0/1,128.0.0.0/1", does that mean I should allow the whole internet because the solution itself needs to reach the API over the internet? If that's true, what's the best way to fix this without having to redo the whole deployment?

sjribe avatar Jan 17 '24 12:01 sjribe

Yeah, because the Fargate task speaks to AppSync, it needs to access the internet. If you just update the CFN stack and change that parameter back to 0.0.0.0/1,128.0.0.0/1, it will update it and everything will work.
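
For anyone wondering why the value is two ranges rather than one: the two /1 blocks together cover the whole IPv4 address space (presumably written this way because AWS WAF IP sets do not accept a /0 range). A minimal Python sketch to confirm this and to test a candidate allow list against a given address; the `is_allowed` helper is only for illustration and is not part of Workload Discovery:

```python
import ipaddress

# The two ranges quoted in the parameter description.
ranges = [ipaddress.ip_network(c) for c in "0.0.0.0/1,128.0.0.0/1".split(",")]

# Collapsing adjacent networks merges the two halves into one network.
merged = list(ipaddress.collapse_addresses(ranges))
print(merged)  # [IPv4Network('0.0.0.0/0')]


def is_allowed(ip: str, allow_list: str) -> bool:
    """Return True if ip falls inside any CIDR of a comma-separated list."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(cidr.strip())
        for cidr in allow_list.split(",")
    )


print(is_allowed("8.8.8.8", "0.0.0.0/1,128.0.0.0/1"))  # True
print(is_allowed("8.8.8.8", "10.0.0.0/8"))             # False
```

If the allow list is narrowed again later, it has to include whatever address the discovery task's traffic comes from, otherwise the 403 comes back.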

svozza avatar Jan 17 '24 12:01 svozza

Yea, easy enough. Thanks.

Error's gone but now no resources discovered... different problem I guess...

sjribe avatar Jan 17 '24 12:01 sjribe

The discovery task runs every 15 minutes, so it won't run for another 5 minutes (assuming you've deployed the CloudFormation to the various accounts you want to import).

svozza avatar Jan 17 '24 12:01 svozza

I'm running it with CrossAccountDiscovery set to AWS_ORGANIZATIONS, so maybe I have the wrong OrganizationUnitId. I used the r- value for the root OU, but should it be the o- value of the organization?

sjribe avatar Jan 17 '24 12:01 sjribe

No, the r- value will work. Check the ECS logs (don't worry about Lambda) for any errors; instructions are at this link: https://aws-solutions.github.io/workload-discovery-on-aws/workload-discovery-on-aws/2.0/debugging-the-discovery-component.html.

svozza avatar Jan 17 '24 12:01 svozza

Thanks. It was the r- value, and it did discover some resources. However, going through the debugging steps, I'm getting quite a lot (22 per discovery) of these:

{ "error": { "name": "TooManyRequestsException", "$fault": "client", "$metadata": { "httpStatusCode": 429, "requestId": "c36e55ed-eb1c-43e9-8415-b1826ee017e0", "attempts": 4, "totalRetryDelay": 1856 }, "retryAfterSeconds": null }, "level": "error", "message": "Error discovering API Gateway integration for resource: arn:aws:apigateway:us-west-2::/restapis/fqsoha0aq2/resources/89lfy2", "timestamp": "2024-01-17T23:31:05.569Z" }

I'm also getting one of these:

{ "message": "Access denied assuming role: arn:aws:iam::922409771208:role/WorkloadDiscoveryRole-922409771208. This is the management account, ensure the global resources template has been deployed to the account.", "level": "error", "timestamp": "2024-01-17T23:30:37.747Z" }

It is true, though, that I haven't deployed the global resources template.
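
For counting errors like these across a whole discovery run, a rough Python sketch follows. It assumes the log has been exported with one JSON object per line, which may not match the export format you get from the console; `summarize_errors` is an illustrative helper, not part of Workload Discovery. Entries without an "error" object (such as the access-denied message) are counted as "unnamed".

```python
import json
from collections import Counter


def summarize_errors(lines):
    """Count error-level log entries, grouped by the error's name."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict) or entry.get("level") != "error":
            continue
        err = entry.get("error")
        name = err.get("name", "unnamed") if isinstance(err, dict) else "unnamed"
        counts[name] += 1
    return counts


# Sample lines modelled on the two entries above, plus noise.
sample = [
    '{"error": {"name": "TooManyRequestsException", "$metadata": {"httpStatusCode": 429, "attempts": 4}}, "level": "error", "message": "Error discovering API Gateway integration for resource: arn:aws:apigateway:us-west-2::/restapis/fqsoha0aq2/resources/89lfy2"}',
    '{"error": {"name": "TooManyRequestsException", "$metadata": {"httpStatusCode": 429, "attempts": 4}}, "level": "error", "message": "Error discovering API Gateway integration for resource: arn:aws:apigateway:us-west-2::/restapis/fqsoha0aq2/resources/89lfy2"}',
    '{"message": "Access denied assuming role: arn:aws:iam::922409771208:role/WorkloadDiscoveryRole-922409771208.", "level": "error"}',
    '{"message": "some informational entry", "level": "info"}',
    "not json at all",
]

print(summarize_errors(sample))
```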

sjribe avatar Jan 17 '24 23:01 sjribe

So some additional information:

  1. In our Audit account (682880543195), Resource Explorer shows 514 resources.

  2. In our Org account, Resource Explorer filtered to the Audit account shows 513 resources.

  3. In our Org account, Config Aggregators shows OK for 682880543195, and all the Regions show as OK.

  4. But in Config Aggregators, Resources filtered to 682880543195 shows no resources.

So it seems like it's connecting fine but it's not discovering anything there. And maybe there's a security option in the account 682880543195 limiting API calls? But I'm not sure where I would look for that.

sjribe avatar Jan 18 '24 02:01 sjribe

In AWS_ORGANIZATIONS mode, Workload Discovery does not manage enablement of Config. We leave that to customers, as managing deployment of Config is different for every organization, based on what they want to monitor and the potential costs incurred by enabling it across a large number of accounts and regions. If one of your accounts doesn't have resources in it, then either Config is not enabled in any regions in that account or, as you mentioned, there is some permission error or SCP that is preventing Config from recording resources.

svozza avatar Jan 18 '24 08:01 svozza

The API errors are because the discovery process is being rate limited when it makes SDK calls to API Gateway. API Gateway limits are account wide (rather than regional), so if there are a large number of API Gateway resources in an account, these sorts of throttling errors are unavoidable.
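
The log entry shows the call was already retried before giving up ("attempts": 4, "totalRetryDelay": 1856). For readers unfamiliar with how such retries are typically spaced out, here is a generic Python sketch of exponential backoff with full jitter. It is not the discovery process's code and not the AWS SDK's exact algorithm; `ThrottledError`, `call_with_backoff` and all the default values are made up for illustration.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for a throttling error such as TooManyRequestsException."""


def call_with_backoff(fn, max_attempts=4, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn, retrying on ThrottledError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The upper bound doubles on every attempt, capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time between 0 and the cap.
            sleep(rng() * cap)


# Usage: a call that is throttled twice and then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ThrottledError()
    return "ok"

print(call_with_backoff(flaky))  # ok
```

Backoff only spreads the calls out. If the account-wide limit is lower than the number of calls a run needs to make, some calls will still exhaust their attempts, which is consistent with the errors being described as unavoidable.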

The IAM error you are seeing is because of the way organization-wide StackSets work: they do not allow you to deploy a stack instance to the management account. In AWS_ORGANIZATIONS mode, the deployment process uses StackSets to deploy the global resources stack on your behalf in all the accounts in your organization. There should be an error dialog box on the Accounts page of the Workload Discovery UI that has a link to the template that you can manually deploy in the management account using CloudFormation.

svozza avatar Jan 18 '24 08:01 svozza

The API errors are because the discovery process is being rate limited when it makes SDK calls to API Gateway. API Gateway limits are account wide (rather than regional), so if there are a large number of API Gateway resources in an account, these sorts of throttling errors are unavoidable.

Is this something that AWS support can temporarily increase or lift? It looks like it's stopping at the same point each time so it's not discovering new resources. Alternatively, if I add each account in manually can I stagger the discovery for each account so as to not trigger the throttle?

The IAM error you are seeing is because of the way organization-wide StackSets work: they do not allow you to deploy a stack instance to the management account. In AWS_ORGANIZATIONS mode, the deployment process uses StackSets to deploy the global resources stack on your behalf in all the accounts in your organization. There should be an error dialog box on the Accounts page of the Workload Discovery UI that has a link to the template that you can manually deploy in the management account using CloudFormation.

I installed the template and so that's sorted now.

sjribe avatar Jan 18 '24 09:01 sjribe

Is this something that AWS support can temporarily increase or lift? It looks like it's stopping at the same point each time so it's not discovering new resources.

Do you mean the discovery process is crashing? Those throttling errors should only affect API Gateway, they should be skipped over and the process should move on to the next set of resources. Can you attach the ECS logs here so I can have a look?

svozza avatar Jan 18 '24 10:01 svozza

I don't know if the process is crashing but I do know not all of my resources are being discovered. In the account mentioned before each region shows "Not Discovered" but I know that account has 514 resources across 18 regions according to resource explorer. Or are there default resources in each region and the discovery process is filtering them out? I've attached the ECS logs for the most recent discovery job. log-events-viewer-result.csv

sjribe avatar Jan 18 '24 23:01 sjribe

The discovery process is not crashing, but it looks like there are only 1734 resources in the entire aggregator, which seems very low for an organization-wide aggregator. When you say 'resource explorer', do you mean the service, or do you mean the resource section in the AWS Config console page? Can you go to the aggregator that WD deployed (it will be called aws-perspective-<wd-region>-<wd-account-id>-aggregator) and run the following query in the advanced queries section:

SELECT * WHERE accountId = '<account-id-with-514-resources>'

Make sure the query scope is the aggregator.

What results do you see when you run the query?
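
The same check can also be run outside the console. The sketch below uses boto3's `select_aggregate_resource_config`; the helper names, the explicit column list (used here instead of `SELECT *`) and the WD region and account ID in the usage note are placeholders for illustration, so substitute your own values.

```python
import json
import re


def aggregator_name(wd_region: str, wd_account_id: str) -> str:
    """Build the aggregator name using the pattern given above."""
    return f"aws-perspective-{wd_region}-{wd_account_id}-aggregator"


def account_query(account_id: str) -> str:
    """Build a query listing the resources Config knows about in one account."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("expected a 12-digit AWS account ID")
    return (
        "SELECT resourceId, resourceType, awsRegion "
        f"WHERE accountId = '{account_id}'"
    )


def run_query(expression: str, aggregator: str, region: str) -> list:
    """Run the query against the aggregator and follow pagination."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("config", region_name=region)
    results, token = [], None
    while True:
        kwargs = {
            "Expression": expression,
            "ConfigurationAggregatorName": aggregator,
        }
        if token:
            kwargs["NextToken"] = token
        page = client.select_aggregate_resource_config(**kwargs)
        # Each result is returned as a JSON-encoded string.
        results.extend(json.loads(row) for row in page["Results"])
        token = page.get("NextToken")
        if not token:
            return results
```

With credentials for the account WD is deployed in, `run_query(account_query("682880543195"), aggregator_name("eu-west-1", "111122223333"), "eu-west-1")` would return one dictionary per resource. The region and 12-digit ID passed to `aggregator_name` are made up. An empty list corresponds to the "no output" case in the console.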

svozza avatar Jan 18 '24 23:01 svozza

Yes, the AWS Resource Explorer service. This is viewing account 682880543195.

Looks like the query has no output.

sjribe avatar Jan 18 '24 23:01 sjribe

The results of the SQL query suggest that AWS Config is not enabled in any regions in that account. Try enabling it in us-east-1 of 682880543195 and you should see IAM roles and a few other global resource types when you run that query again (note that it can take several minutes for Config to find the resources after enablement).

If Config doesn't know about a resource, there's no way for WD to discover it, as we get 90% of our resources from Config's APIs (under the hood we also use the same SQL syntax you are using for your ad hoc query).

svozza avatar Jan 19 '24 09:01 svozza

Thanks. That's showing up now. Does AWS Config need to be enabled in every region in use or only one per account? For 682880543195 us-east-1 and ap-southeast-2 are in use.

sjribe avatar Feb 14 '24 03:02 sjribe

Yeah, it needs to be enabled in each region you're interested in.
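
A quick way to see which regions still lack a running Config recorder is to call `describe_configuration_recorder_status` region by region. This is only a sketch: the helper names are made up, the region list is whatever you care about, and it has to run with credentials for the member account being checked.

```python
def recording_enabled(status_response: dict) -> bool:
    """True if any recorder in the response is currently recording."""
    statuses = status_response.get("ConfigurationRecordersStatus", [])
    return any(bool(s.get("recording")) for s in statuses)


def regions_without_config(regions, describe=None):
    """Return the regions where no Config recorder is recording.

    describe(region) must return a DescribeConfigurationRecorderStatus
    response; by default it calls AWS through boto3.
    """
    if describe is None:
        import boto3  # imported lazily so the logic can be tested offline

        def describe(region):
            client = boto3.client("config", region_name=region)
            return client.describe_configuration_recorder_status()

    return [r for r in regions if not recording_enabled(describe(r))]


# Offline usage with canned responses instead of real API calls.
fake = {
    "us-east-1": {"ConfigurationRecordersStatus": [{"name": "default", "recording": True}]},
    "ap-southeast-2": {"ConfigurationRecordersStatus": []},
}
print(regions_without_config(["us-east-1", "ap-southeast-2"], describe=fake.get))
# ['ap-southeast-2']
```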

svozza avatar Feb 14 '24 10:02 svozza

Great. That's solved most of my problems! I do have one account (949247560096) where I've enabled Config in all 17 regions enabled on that account. However, the discovery only finds resources in 3 regions, and the other regions show "Not Discovered", like when Config was not enabled in that region. Do you know why that would be?

sjribe avatar Feb 16 '24 01:02 sjribe

That's strange. Are there any errors in the discovery process logs?

svozza avatar Feb 16 '24 09:02 svozza

I think I've sorted it. I found out that Config was not enabled in the other regions and that, for some reason, the admin account can't enable it there. I've also realized there's only the default stuff in those regions without Config, so it's not necessary at the moment.

Is there a way to filter out the default resources?

sjribe avatar Feb 29 '24 00:02 sjribe