steampipe-postgres-fdw
steampipe-postgres-fdw copied to clipboard
Incorrect results when joining two tables with multiple join conditions
While querying by joining two tables with multiple join conditions(more than 2 in my case), I observed that one of the conditions was being ignored hence giving incorrect results
To Reproduce
- Setup an AWS aggregator connection
- For example When joining the tables
aws_dax_parameter_groupandaws_dax_clusteronparameter_group_name,account_idandregion. Thep.account_id = c.account_idcondition is being ignored. Thus returning results from multiple accounts
> select
p.parameter_group_name as id,
p.title as title,
p.account_id as pg_account_id,
c.account_id as dax_account_id,
p.region as pg_region,
c.region as dax_region
from
aws_dax_parameter_group as p
join aws_dax_cluster as c
on c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
and p.account_id = c.account_id
and p.region = c.region
where
c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+-----------+------------+
| id | title | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+-----------+------------+
| default.dax1.0 | default.dax1.0 | 123456789012 | 111122223333 | us-east-1 | us-east-1 |
| default.dax1.0 | default.dax1.0 | 111122223333 | 111122223333 | us-east-1 | us-east-1 |
+----------------+----------------+---------------+----------------+-----------+------------+
+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop (cost=0.00..40040001000000.00 rows=125000 width=192) (actual time=12.747..12.866 rows=2 loops=1) |
| -> Foreign Scan on aws_dax_cluster c (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=7.364..7.380 rows=1 loops=1) |
| Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text) |
| -> Foreign Scan on aws_dax_parameter_group p (cost=0.00..40000.00 rows=100 width=400) (actual time=4.604..4.707 rows=2 loops=1) |
| Filter: ((region = c.region) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name)) |
| Rows Removed by Filter: 3 |
| Planning Time: 8.682 ms |
| Execution Time: 14.922 ms |
+------------------------------------------------------------------------------------------------------------------------------------------+
- The query plan above shows that the filter for
p.account_id = c.account_idis ignored. - On removing one of the join conditions from the same query, for eg.
p.region = c.regionbelow, the result displayed are correct
> select
p.parameter_group_name as id,
p.title as title,
p.account_id as pg_account_id,
c.account_id as dax_account_id,
p.region as pg_region,
c.region as dax_region
from
aws_dax_parameter_group as p
join aws_dax_cluster as c
on c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
and p.account_id = c.account_id
where
c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+----------------+------------+
| id | title | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+----------------+------------+
| default.dax1.0 | default.dax1.0 | 111122223333 | 111122223333 | ap-south-1 | us-east-1 |
| default.dax1.0 | default.dax1.0 | 111122223333 | 111122223333 | us-west-2 | us-east-1 |
| default.dax1.0 | default.dax1.0 | 111122223333 | 111122223333 | us-east-1 | us-east-1 |
+----------------+----------------+---------------+----------------+----------------+------------+
+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop (cost=0.00..40040001000000.00 rows=25000000 width=192) (actual time=12.474..12.939 rows=13 loops=1) |
| -> Foreign Scan on aws_dax_cluster c (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=4.827..4.840 rows=1 loops=1) |
| Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text) |
| -> Foreign Scan on aws_dax_parameter_group p (cost=0.00..40000.00 rows=100 width=400) (actual time=7.004..7.454 rows=13 loops=1) |
| Filter: ((account_id = c.account_id) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name)) |
| Rows Removed by Filter: 16 |
| Planning Time: 10.968 ms |
| Execution Time: 16.336 ms |
+------------------------------------------------------------------------------------------------------------------------------------------+
- So does the plan show that all the join conditions were considered.
- Now on removing a different condition from the same query, for eg.
c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_namebelow, the result displayed are correct again
> select
p.parameter_group_name as id,
p.title as title,
p.account_id as pg_account_id,
c.account_id as dax_account_id,
p.region as pg_region,
c.region as dax_region
from
aws_dax_parameter_group as p
join aws_dax_cluster as c
on p.account_id = c.account_id
and p.region = c.region
where
c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+-----------+------------+
| id | title | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+-----------+------------+
| default.dax1.0 | default.dax1.0 | 111122223333 | 111122223333 | us-east-1 | us-east-1 |
+----------------+----------------+---------------+----------------+-----------+------------+
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop (cost=0.00..70057540002500.00 rows=25000000 width=192) (actual time=17.646..17.691 rows=1 loops=1) |
| Join Filter: ((p.account_id = c.account_id) AND (p.region = c.region)) |
| Rows Removed by Join Filter: 28 |
| -> Foreign Scan on aws_dax_parameter_group p (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=3.382..4.065 rows=29 loops=1) |
| -> Materialize (cost=0.00..30000000045040.00 rows=1000000 width=300) (actual time=0.468..0.469 rows=1 loops=29) |
| -> Foreign Scan on aws_dax_cluster c (cost=0.00..30000000000000.00 rows=1000000 width=300) (actual time=13.549..13.587 rows=1 loops=1) |
| Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text) |
| Planning Time: 11.774 ms |
| Execution Time: 21.489 ms |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
- So does the plan show that all the join conditions were considered.
- Same query with cross-join and the join conditions in the
whereclause, generates a similar plan
> explain analyse select
p.parameter_group_name as id,
p.title as title,
p.account_id as pg_account_id,
c.account_id as dax_account_id,
p.region as pg_region,
c.region as dax_region
from
aws_dax_parameter_group as p,
aws_dax_cluster as c
where
c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
and p.account_id = c.account_id
and p.region = c.region
and c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop (cost=0.00..40040001000000.00 rows=125000 width=192) (actual time=11.617..11.679 rows=2 loops=1) |
| -> Foreign Scan on aws_dax_cluster c (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=6.915..6.929 rows=1 loops=1) |
| Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text) |
| -> Foreign Scan on aws_dax_parameter_group p (cost=0.00..40000.00 rows=100 width=400) (actual time=4.235..4.283 rows=2 loops=1) |
| Filter: ((region = c.region) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name)) |
| Rows Removed by Filter: 3 |
| Planning Time: 13.727 ms |
| Execution Time: 13.760 ms |
+------------------------------------------------------------------------------------------------------------------------------------------+
- Similar example
> explain analyse select
b.arn as eventbridge_bus_arn
from
aws_eventbridge_rule r
join aws_eventbridge_bus b on r.event_bus_name = b.name
and r.region = b.region
and r.account_id = b.account_id
where
r.arn = 'arn:aws:events:ap-south-1:111122223333:rule/test'
+------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop (cost=0.00..40040001000000.00 rows=125000 width=32) (actual time=15.128..25.383 rows=2 loops=1) |
| -> Foreign Scan on aws_eventbridge_rule r (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=7.076..17.280 rows=1 loops=1) |
| Filter: (arn = 'arn:aws:events:ap-south-1:111122223333:rule/test'::text) |
| Rows Removed by Filter: 437 |
| -> Foreign Scan on aws_eventbridge_bus b (cost=0.00..40000.00 rows=100 width=400) (actual time=7.371..7.419 rows=2 loops=1) |
| Filter: ((r.region = region) AND (r.event_bus_name = name)) |
| Planning Time: 11.155 ms |
| Execution Time: 28.488 ms |
+------------------------------------------------------------------------------------------------------------------------------------------------+
Steampipe version
Tested with both 0.18.0 and 0.17.4
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.
This issue was closed because it has been stalled for 90 days with no activity.