steampipe-postgres-fdw icon indicating copy to clipboard operation
steampipe-postgres-fdw copied to clipboard

Failing to apply where condition when joining `aws_ecr_image`

Open gabe-gfm opened this issue 1 year ago • 7 comments

Describe the bug I'm seeing unexpected results using a WHERE clause on the aws_ecr_image table. All rows from a JOIN condition are being returned even though the query specifies most rows should be excluded by a WHERE condition.

This reproduction case involves joining a second table to provide a target label string for evaluating whether the target is in the JSONB list of aws_ecr_image.image_tags.

Insightfully, I found the problem does not occur qhen querying local-table-copy of the aws_ecr_image table. This leads me to suspect it's a foreign-data-wrapper bug. But I'm not confident in which project the bug exists ( steampipe-postgres-fdw, steampipe-plugin-aws, or steampipe?).

Steampipe version v0.21.4

Plugin version (steampipe plugin list) turbot/aws 0.132.0

To reproduce

  1. Have AWS ECR image data in aws_ecr_image
  2. Join a second table on ecr.repository_name` to provide data for specific repository and image_tag sets.
  3. Test for presence of specific image_tag in image_tags using the >@ JSONB inclusion test.

@>: Checks if a JSON document contains another JSON document.

Example query

with id_tag_repo as -- DEBUG: this is a stand-in for AWS container task definitions
(
  select 1 as id, to_jsonb('ZZZ'::text) as image_tag, 'my-repo-A' as image_repository
  union
  select 2 as id, '"latest"'::jsonb as image_tag, 'my-repo-B' as image_repository
  union
  select 3 as id, '"latest"'::jsonb as image_tag, 'my-repo-A' as image_repository
)
select 
  id_tag_repo.image_tag as target_tag
  , id
  , ecr.image_tags @> image_tag as is_tag_in_tags -- DEBUG: Matches where condition. Expect to see only true rows
  , repository_name
  , image_tags
from aws_ecr_image ecr
-- from ecr_copy ecr -- DEBUG
join id_tag_repo
  on ecr.repository_name = id_tag_repo.image_repository
where
  ecr.image_tags @> image_tag
  1. Note the results of is_tag_in_tags column may show true, false, null, but the where condition should limit it to only true.

Buggy Results: image

Expected behavior The expected behavior can be easily demonstrated by running the same query on a local copy of the a foreign-data-wrapper table.

  1. Copy to a local table
--drop table ecr_copy
create table ecr_copy as select * from aws_ecr_image
  1. Re-running the above query, but targeting from ecr_copy rather than from aws_ecr_image
  2. Notice that only results where is_tag_in_tags column shows true.

Expected results: image

Additional context I found it difficult to discover how I might locally rebuild steampipe-ftw or the project's postgres container to investigate if this could be a known postgres bug.

gabe-gfm avatar Mar 13 '24 02:03 gabe-gfm

@kaidaguerre, Could you please look into this issue?

bigdatasourav avatar Apr 01 '24 06:04 bigdatasourav

Sorry for the delay @gabe-gfm , sure we'll look into this 👍

kaidaguerre avatar Apr 09 '24 15:04 kaidaguerre

Perhaps this is related? https://github.com/turbot/steampipe-postgres-fdw/pull/440

gabe-gfm avatar Apr 17 '24 01:04 gabe-gfm

@jreyesr @kaidaguerre

Unfortunately, this was not resolved using the most aws plugin + postgres foreign data wrapper (us-docker.pkg.dev/steampipe/steampipe/fdw:1.11.0-rc.6). It still returns all rows when attempting to filter for a list of values inside the jsonb image_tags array.

I did notice a particular log surface in my SQL client (not steampipe query):

2024-04-29 17:22:28 fdwGetForeignPlan - best_path->fdw_private is NULL. Defaulting to setting canPushdownAllSortFields to true

From https://github.com/turbot/steampipe-postgres-fdw/blob/41cf90e604f499f43723a01fc64e640b76d32bab/fdw/fdw.c#L328

gabe-gfm avatar Apr 30 '24 00:04 gabe-gfm

I identified a better workaround for anyone else encountering this.

You may specify MATERIALIZED (docs) in a Postgres query to prevent the demonstrated incorrect query filter results.

MATERIALIZED is an optimization hint resulting in explicit fetching of a common table express (CTE) result without trying to push down potential optimizations into the query plan.

This example query gave me correctly filtered results.

-- id_tag_repo is a mock for identifying interesting AWS container task definitions linked to ECR images
with id_tag_repo as  
(
  select 1 as id, to_jsonb('ZZZ'::text) as image_tag, 'my-repo-A' as image_repository
  union
  select 2 as id, '"latest"'::jsonb as image_tag, 'my-repo-B' as image_repository
  union
  select 3 as id, '"latest"'::jsonb as image_tag, 'my-repo-A' as image_repository
), 
ecr_info as MATERIALIZED -- MATERIALIZED used to work around steampipe-fdw bug 
(
  select repository_name, image_tags
  from  aws_ecr_image
)
select 
  id_tag_repo.image_tag as target_tag
  , id
  , ecr.image_tags @> image_tag as is_tag_in_tags -- DEBUG: Matches where condition. Expect to see only true rows
  , repository_name
  , image_tags
from ecr_info ecr
join id_tag_repo
  on ecr.repository_name = id_tag_repo.image_repository
where
  ecr.image_tags @> image_tag

gabe-gfm avatar May 03 '24 23:05 gabe-gfm

@gabe-gfm good that you've found a workaround. I need to dig into why the JSON join condition is causing this issue

kaidaguerre avatar May 06 '24 08:05 kaidaguerre