steampipe-postgres-fdw
steampipe-postgres-fdw copied to clipboard
Failing to apply where condition when joining `aws_ecr_image`
Describe the bug
I'm seeing unexpected results using a WHERE clause on the aws_ecr_image table. All rows from a JOIN condition are being returned even though the query specifies most rows should be excluded by a WHERE condition.
This reproduction case involves joining a second table to provide a target label string for evaluating whether the target is in the JSONB list of aws_ecr_image.image_tags.
Insightfully, I found the problem does not occur qhen querying local-table-copy of the aws_ecr_image table. This leads me to suspect it's a foreign-data-wrapper bug. But I'm not confident in which project the bug exists ( steampipe-postgres-fdw, steampipe-plugin-aws, or steampipe?).
Steampipe version v0.21.4
Plugin version (steampipe plugin list)
turbot/aws 0.132.0
To reproduce
- Have AWS ECR image data in
aws_ecr_image - Join a second table on ecr.repository_name` to provide data for specific repository and image_tag sets.
- Test for presence of specific
image_taginimage_tagsusing the>@JSONB inclusion test.
@>: Checks if a JSON document contains another JSON document.
Example query
with id_tag_repo as -- DEBUG: this is a stand-in for AWS container task definitions
(
select 1 as id, to_jsonb('ZZZ'::text) as image_tag, 'my-repo-A' as image_repository
union
select 2 as id, '"latest"'::jsonb as image_tag, 'my-repo-B' as image_repository
union
select 3 as id, '"latest"'::jsonb as image_tag, 'my-repo-A' as image_repository
)
select
id_tag_repo.image_tag as target_tag
, id
, ecr.image_tags @> image_tag as is_tag_in_tags -- DEBUG: Matches where condition. Expect to see only true rows
, repository_name
, image_tags
from aws_ecr_image ecr
-- from ecr_copy ecr -- DEBUG
join id_tag_repo
on ecr.repository_name = id_tag_repo.image_repository
where
ecr.image_tags @> image_tag
- Note the results of
is_tag_in_tagscolumn may show true, false, null, but the where condition should limit it to only true.
Buggy Results:
Expected behavior The expected behavior can be easily demonstrated by running the same query on a local copy of the a foreign-data-wrapper table.
- Copy to a local table
--drop table ecr_copy
create table ecr_copy as select * from aws_ecr_image
- Re-running the above query, but targeting
from ecr_copyrather thanfrom aws_ecr_image - Notice that only results where
is_tag_in_tagscolumn shows true.
Expected results:
Additional context
I found it difficult to discover how I might locally rebuild steampipe-ftw or the project's postgres container to investigate if this could be a known postgres bug.
@kaidaguerre, Could you please look into this issue?
Sorry for the delay @gabe-gfm , sure we'll look into this 👍
Perhaps this is related? https://github.com/turbot/steampipe-postgres-fdw/pull/440
@jreyesr @kaidaguerre
Unfortunately, this was not resolved using the most aws plugin + postgres foreign data wrapper (us-docker.pkg.dev/steampipe/steampipe/fdw:1.11.0-rc.6). It still returns all rows when attempting to filter for a list of values inside the jsonb image_tags array.
I did notice a particular log surface in my SQL client (not steampipe query):
2024-04-29 17:22:28 fdwGetForeignPlan - best_path->fdw_private is NULL. Defaulting to setting canPushdownAllSortFields to true
From https://github.com/turbot/steampipe-postgres-fdw/blob/41cf90e604f499f43723a01fc64e640b76d32bab/fdw/fdw.c#L328
I identified a better workaround for anyone else encountering this.
You may specify MATERIALIZED (docs) in a Postgres query to prevent the demonstrated incorrect query filter results.
MATERIALIZED is an optimization hint resulting in explicit fetching of a common table express (CTE) result without trying to push down potential optimizations into the query plan.
This example query gave me correctly filtered results.
-- id_tag_repo is a mock for identifying interesting AWS container task definitions linked to ECR images
with id_tag_repo as
(
select 1 as id, to_jsonb('ZZZ'::text) as image_tag, 'my-repo-A' as image_repository
union
select 2 as id, '"latest"'::jsonb as image_tag, 'my-repo-B' as image_repository
union
select 3 as id, '"latest"'::jsonb as image_tag, 'my-repo-A' as image_repository
),
ecr_info as MATERIALIZED -- MATERIALIZED used to work around steampipe-fdw bug
(
select repository_name, image_tags
from aws_ecr_image
)
select
id_tag_repo.image_tag as target_tag
, id
, ecr.image_tags @> image_tag as is_tag_in_tags -- DEBUG: Matches where condition. Expect to see only true rows
, repository_name
, image_tags
from ecr_info ecr
join id_tag_repo
on ecr.repository_name = id_tag_repo.image_repository
where
ecr.image_tags @> image_tag
@gabe-gfm good that you've found a workaround. I need to dig into why the JSON join condition is causing this issue