steampipe-plugin-aws icon indicating copy to clipboard operation
steampipe-plugin-aws copied to clipboard

aws_ecr_repository queries hitting api limits

Open tinder-tder opened this issue 1 year ago • 8 comments

Describe the bug queries will timeout with an api limit error (Error running query: rpc error: code = Unknown desc = ThrottlingException: Rate exceeded status code: 400, request id: .... ) when querying using repository_name in ( select distinct repository_name from aws_ecr_repository limit 1) or similar (using a subselect). However, if the query doesn't use a subselect like `repository_name in ('a/b','b/c',.....) it works fine

Steampipe version (steampipe -v) Example: v0.15.1

Plugin version (steampipe plugin list) Example: v0.71.0

To reproduce

  select
    deets -> 'ImageTags' as image_tags,
    deets -> 'ImageDigest' as image_digest,
    deets -> 'ImagePushedAt' as image_pushedat,
    deets -> 'LastRecordedPullTime' as image_lastpulled,
    ecr.repository_name as repository_name,
    fin -> 'Uri' as uri,
    fin -> 'Name' as cve,
    fin -> 'Severity' as severity,
    fin -> 'Description' as description,
    fin -> 'Attributes' as attributes
  from
    aws_cloudhub.aws_ecr_repository as ecr,
    jsonb_array_elements(image_details) as deets,
    jsonb_array_elements(image_scanning_findings) as details,
    jsonb(details) as detail,
    jsonb_array_elements(detail -> 'ImageScanFindings' -> 'Findings') as fin
  where
    ecr.repository_name in (select distinct repository_name from aws_ecr_repository limit 1)
    and fin ->> 'Severity' in ('CRITICAL', 'UNDEFINED')
    and deets -> 'ImageDigest' = detail -> 'ImageId' -> 'ImageDigest'

Expected behavior subselect in quals should not hit api limits

tinder-tder avatar Jul 22 '22 22:07 tinder-tder

@tinder-tder, sorry for the problem you faced with the aws_ecr_repository table query. We have introduced a functionality where you can configure the retry limit and minimum retry delay in the config file. Could you please try out the same and let us know if the problem persists.

bigdatasourav avatar Jul 25 '22 06:07 bigdatasourav

@bigdatasourav that is a slow work around, there seems to be something in the query processor that is making it behave differently when using a subselect. it should use the api more efficiently

tinder-tder avatar Jul 25 '22 18:07 tinder-tder

Hey @tinder-tder , can you please describe what your query is meant to retrieve for ECR repositories and why the subquery is crucial? I read your query, but my SQL skills need a bit of work, so I need some help understanding the query more.

Also, approximately how many repos do you have across the region(s) you're querying?

cbruno10 avatar Jul 25 '22 19:07 cbruno10

@cbruno10 the query lists out ECR findings that are critical/undefined for each repo. The sybquery make it easier to not have to explicitly list each repo (ie if a new one is made it can automatically be included in the query via subselect). we are only querying 1 region and with a current list of 11 repos explicit defined (again if we use a subselect to get the same list, even with a limit of 1 it will hit the api limit)

tinder-tder avatar Jul 28 '22 16:07 tinder-tder

'This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 30 days.'

github-actions[bot] avatar Sep 03 '22 23:09 github-actions[bot]

Hi @tinder-tder !

I encountered the same issue with my repositories. In my own, i think the main trouble comes from the number of tags in my repositories, and from the corresponding findings. It represents a very big payload, and the most part of the response is not interesting (i was looking for findings for my on production images, not the older ones)

I wrote a draft in this pr https://github.com/turbot/steampipe-plugin-aws/pull/1315 and i'm able to work with when i have a lot of images. Maybe it can do the job for you.

@aminvielledebatAtBedrock thanks for the update, I will keep an eye on your pr!

tinder-tder avatar Oct 13 '22 16:10 tinder-tder

@tinder-tder The image findings have been separated out into a new table, aws_ecr_image_scan_finding as part of https://github.com/turbot/steampipe-plugin-aws/pull/1315, which was released in AWS plugin v0.80.0.

After using this table, if you're still seeing throttling issues, or any other issues, feel free to re-open this issue.

cbruno10 avatar Oct 21 '22 15:10 cbruno10

@cbruno10 Thanks will close this for now.

tinder-tder avatar Oct 27 '22 20:10 tinder-tder