steampipe-postgres-fdw icon indicating copy to clipboard operation
steampipe-postgres-fdw copied to clipboard

Incorrect results when joining two tables with multiple join conditions

Open karanpopat opened this issue 2 years ago • 1 comments

While querying by joining two tables with multiple join conditions(more than 2 in my case), I observed that one of the conditions was being ignored hence giving incorrect results

To Reproduce

  1. Setup an AWS aggregator connection
  2. For example When joining the tables aws_dax_parameter_group and aws_dax_cluster on parameter_group_name, account_id and region. The p.account_id = c.account_id condition is being ignored. Thus returning results from multiple accounts
> select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p
  join aws_dax_cluster as c
    on c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
    and p.account_id = c.account_id
    and p.region = c.region
where
  c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+-----------+------------+
| id             | title          | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+-----------+------------+
| default.dax1.0 | default.dax1.0 | 123456789012  | 111122223333   | us-east-1 | us-east-1  |
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-east-1 | us-east-1  |
+----------------+----------------+---------------+----------------+-----------+------------+

+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=125000 width=192) (actual time=12.747..12.866 rows=2 loops=1)                            |
|   ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=7.364..7.380 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                  |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000.00 rows=100 width=400) (actual time=4.604..4.707 rows=2 loops=1)      |
|         Filter: ((region = c.region) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name))                    |
|         Rows Removed by Filter: 3                                                                                                        |
| Planning Time: 8.682 ms                                                                                                                  |
| Execution Time: 14.922 ms                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
  1. The query plan above shows that the filter for p.account_id = c.account_id is ignored.
  2. On removing one of the join conditions from the same query, for eg. p.region = c.region below, the result displayed are correct
> select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p
  join aws_dax_cluster as c
    on c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
    and p.account_id = c.account_id
where
  c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+----------------+------------+
| id             | title          | pg_account_id | dax_account_id | pg_region      | dax_region |
+----------------+----------------+---------------+----------------+----------------+------------+
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | ap-south-1     | us-east-1  |
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-west-2      | us-east-1  |
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-east-1      | us-east-1  |
+----------------+----------------+---------------+----------------+----------------+------------+

+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=25000000 width=192) (actual time=12.474..12.939 rows=13 loops=1)                         |
|   ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=4.827..4.840 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                  |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000.00 rows=100 width=400) (actual time=7.004..7.454 rows=13 loops=1)     |
|         Filter: ((account_id = c.account_id) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name))            |
|         Rows Removed by Filter: 16                                                                                                       |
| Planning Time: 10.968 ms                                                                                                                 |
| Execution Time: 16.336 ms                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
  1. So does the plan show that all the join conditions were considered.
  2. Now on removing a different condition from the same query, for eg. c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name below, the result displayed are correct again
> select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p
  join aws_dax_cluster as c
    on p.account_id = c.account_id
    and p.region = c.region
where
  c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+----------------+----------------+---------------+----------------+-----------+------------+
| id             | title          | pg_account_id | dax_account_id | pg_region | dax_region |
+----------------+----------------+---------------+----------------+-----------+------------+
| default.dax1.0 | default.dax1.0 | 111122223333  | 111122223333   | us-east-1 | us-east-1  |
+----------------+----------------+---------------+----------------+-----------+------------+

+---------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..70057540002500.00 rows=25000000 width=192) (actual time=17.646..17.691 rows=1 loops=1)                                   |
|   Join Filter: ((p.account_id = c.account_id) AND (p.region = c.region))                                                                          |
|   Rows Removed by Join Filter: 28                                                                                                                 |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=3.382..4.065 rows=29 loops=1) |
|   ->  Materialize  (cost=0.00..30000000045040.00 rows=1000000 width=300) (actual time=0.468..0.469 rows=1 loops=29)                               |
|         ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..30000000000000.00 rows=1000000 width=300) (actual time=13.549..13.587 rows=1 loops=1)  |
|               Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                     |
| Planning Time: 11.774 ms                                                                                                                          |
| Execution Time: 21.489 ms                                                                                                                         |
+---------------------------------------------------------------------------------------------------------------------------------------------------+
  1. So does the plan show that all the join conditions were considered.
  2. Same query with cross-join and the join conditions in the where clause, generates a similar plan
> explain analyse select
  p.parameter_group_name as id,
  p.title as title,
  p.account_id as pg_account_id,
  c.account_id as dax_account_id,
  p.region as pg_region,
  c.region as dax_region
from
  aws_dax_parameter_group as p,
  aws_dax_cluster as c
where
  c.parameter_group ->> 'ParameterGroupName' = p.parameter_group_name
  and p.account_id = c.account_id
  and p.region = c.region
  and c.arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'
+------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                               |
+------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=125000 width=192) (actual time=11.617..11.679 rows=2 loops=1)                            |
|   ->  Foreign Scan on aws_dax_cluster c  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=6.915..6.929 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:dax:us-east-1:111122223333:cache/graph-test-dax'::text)                                                  |
|   ->  Foreign Scan on aws_dax_parameter_group p  (cost=0.00..40000.00 rows=100 width=400) (actual time=4.235..4.283 rows=2 loops=1)      |
|         Filter: ((region = c.region) AND ((c.parameter_group ->> 'ParameterGroupName'::text) = parameter_group_name))                    |
|         Rows Removed by Filter: 3                                                                                                        |
| Planning Time: 13.727 ms                                                                                                                 |
| Execution Time: 13.760 ms                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------+
  1. Similar example
> explain analyse select
  b.arn as eventbridge_bus_arn
from
  aws_eventbridge_rule r
  join aws_eventbridge_bus b on r.event_bus_name = b.name
    and r.region = b.region
    and r.account_id = b.account_id
where
  r.arn = 'arn:aws:events:ap-south-1:111122223333:rule/test'
+------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                     |
+------------------------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=0.00..40040001000000.00 rows=125000 width=32) (actual time=15.128..25.383 rows=2 loops=1)                                   |
|   ->  Foreign Scan on aws_eventbridge_rule r  (cost=0.00..40000000000000.00 rows=1000000 width=400) (actual time=7.076..17.280 rows=1 loops=1) |
|         Filter: (arn = 'arn:aws:events:ap-south-1:111122223333:rule/test'::text)                                                               |
|         Rows Removed by Filter: 437                                                                                                            |
|   ->  Foreign Scan on aws_eventbridge_bus b  (cost=0.00..40000.00 rows=100 width=400) (actual time=7.371..7.419 rows=2 loops=1)                |
|         Filter: ((r.region = region) AND (r.event_bus_name = name))                                                                            |
| Planning Time: 11.155 ms                                                                                                                       |
| Execution Time: 28.488 ms                                                                                                                      |
+------------------------------------------------------------------------------------------------------------------------------------------------+

Steampipe version Tested with both 0.18.0 and 0.17.4

karanpopat avatar Jan 12 '23 14:01 karanpopat