steampipe-plugin-aws
steampipe-plugin-aws copied to clipboard
aws_ecr_repository queries hitting api limits
Describe the bug
queries will timeout with an api limit error (Error running query: rpc error: code = Unknown desc = ThrottlingException: Rate exceeded status code: 400, request id: ....
) when querying using repository_name in ( select distinct repository_name from aws_ecr_repository limit 1)
or similar (using a subselect). However, if the query doesn't use a subselect like `repository_name in ('a/b','b/c',.....) it works fine
Steampipe version (steampipe -v
)
Example: v0.15.1
Plugin version (steampipe plugin list
)
Example: v0.71.0
To reproduce
select
deets -> 'ImageTags' as image_tags,
deets -> 'ImageDigest' as image_digest,
deets -> 'ImagePushedAt' as image_pushedat,
deets -> 'LastRecordedPullTime' as image_lastpulled,
ecr.repository_name as repository_name,
fin -> 'Uri' as uri,
fin -> 'Name' as cve,
fin -> 'Severity' as severity,
fin -> 'Description' as description,
fin -> 'Attributes' as attributes
from
aws_cloudhub.aws_ecr_repository as ecr,
jsonb_array_elements(image_details) as deets,
jsonb_array_elements(image_scanning_findings) as details,
jsonb(details) as detail,
jsonb_array_elements(detail -> 'ImageScanFindings' -> 'Findings') as fin
where
ecr.repository_name in (select distinct repository_name from aws_ecr_repository limit 1)
and fin ->> 'Severity' in ('CRITICAL', 'UNDEFINED')
and deets -> 'ImageDigest' = detail -> 'ImageId' -> 'ImageDigest'
Expected behavior subselect in quals should not hit api limits
@tinder-tder, sorry for the problem you faced with the aws_ecr_repository table query. We have introduced a functionality where you can configure the retry limit and minimum retry delay in the config file. Could you please try out the same and let us know if the problem persists.
@bigdatasourav that is a slow work around, there seems to be something in the query processor that is making it behave differently when using a subselect. it should use the api more efficiently
Hey @tinder-tder , can you please describe what your query is meant to retrieve for ECR repositories and why the subquery is crucial? I read your query, but my SQL skills need a bit of work, so I need some help understanding the query more.
Also, approximately how many repos do you have across the region(s) you're querying?
@cbruno10 the query lists out ECR findings that are critical/undefined for each repo. The sybquery make it easier to not have to explicitly list each repo (ie if a new one is made it can automatically be included in the query via subselect). we are only querying 1 region and with a current list of 11 repos explicit defined (again if we use a subselect to get the same list, even with a limit of 1 it will hit the api limit)
'This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 30 days.'
Hi @tinder-tder !
I encountered the same issue with my repositories. In my own, i think the main trouble comes from the number of tags in my repositories, and from the corresponding findings. It represents a very big payload, and the most part of the response is not interesting (i was looking for findings for my on production images, not the older ones)
I wrote a draft in this pr https://github.com/turbot/steampipe-plugin-aws/pull/1315 and i'm able to work with when i have a lot of images. Maybe it can do the job for you.
@aminvielledebatAtBedrock thanks for the update, I will keep an eye on your pr!
@tinder-tder The image findings have been separated out into a new table, aws_ecr_image_scan_finding
as part of https://github.com/turbot/steampipe-plugin-aws/pull/1315, which was released in AWS plugin v0.80.0.
After using this table, if you're still seeing throttling issues, or any other issues, feel free to re-open this issue.
@cbruno10 Thanks will close this for now.