steampipe-postgres-fdw icon indicating copy to clipboard operation
steampipe-postgres-fdw copied to clipboard

Queries fail when joining tables on data from JSONB columns

Open cbruno10 opened this issue 2 years ago • 3 comments

If I have a table with required or optional quals and try to join that with data from another table, it seems like Steampipe thinks that the quals aren't being passed in.

> select instance_id, launch_template_data ->> 'ImageId' as image_id from aws_ec2_instance
+---------------------+-----------------------+
| instance_id         | image_id              |
+---------------------+-----------------------+
| i-0bd0fba70e1c158c9 | ami-029c0fbe456d58bd1 |
| i-0e31039a58e5b9623 | ami-0b69ea66ff7391e80 |
| i-076c621c9eafe3bd8 | ami-07d0cf3af28718ef8 |
| i-06e0de1b8436a1f50 | ami-07d0cf3af28718ef8 |
+---------------------+-----------------------+

Time: 37ms. Rows fetched: 4 (cached). Hydrate calls: 0.
> select name from aws_ec2_ami_shared where image_id = 'ami-029c0fbe456d58bd1'
+--------------------------------------------+
| name                                       |
+--------------------------------------------+
| RHEL-7.7_HVM-20190923-x86_64-0-Hourly2-GP2 |
+--------------------------------------------+

Time: 1.0s. Rows fetched: 1. Hydrate calls: 0.

> explain select inst.instance_id, inst.image_id, ami.name from aws_ec2_instance as inst left join aws_ec2_ami_shared as ami on inst.launch_template_data ->> 'ImageId' = ami.image_id order by inst.instance_id
+----------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                               |
+----------------------------------------------------------------------------------------------------------+
| Sort  (cost=30042089220313.72..30042101720313.72 rows=5000000000 width=96)                               |
|   Sort Key: inst.instance_id                                                                             |
|   ->  Nested Loop Left Join  (cost=0.00..30040002000000.00 rows=5000000000 width=96)                     |
|         ->  Foreign Scan on aws_ec2_instance inst  (cost=0.00..30000000000000.00 rows=1000000 width=300) |
|         ->  Foreign Scan on aws_ec2_ami_shared ami  (cost=0.00..40000.00 rows=200 width=200)             |
|               Filter: ((inst.launch_template_data ->> 'ImageId'::text) = image_id)                       |
+----------------------------------------------------------------------------------------------------------+

Time: 32ms.
> select inst.instance_id, inst.image_id, ami.name from aws_ec2_instance as inst left join aws_ec2_ami_shared as ami on inst.launch_template_data ->> 'ImageId' = ami.image_id order by inst.instance_id

Error: please provide either owner_id or image_id (SQLSTATE HV000)

+-------------+----------+------+
| instance_id | image_id | name |
+-------------+----------+------+
+-------------+----------+------+

Time: 39ms. Rows fetched: 4 (cached). Hydrate calls: 0.

Compared to running a query that pulls data from a string column:

> select inst.instance_id, inst.image_id, ami.name from aws_ec2_instance as inst left join aws_ec2_ami_shared as ami on inst.image_id = ami.image_id order by inst.instance_id
+---------------------+-----------------------+-------------------------------------------------------------------+
| instance_id         | image_id              | name                                                              |
+---------------------+-----------------------+-------------------------------------------------------------------+
| i-06e0de1b8436a1f50 | ami-07d0cf3af28718ef8 | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 |
| i-076c621c9eafe3bd8 | ami-07d0cf3af28718ef8 | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 |
| i-0bd0fba70e1c158c9 | ami-029c0fbe456d58bd1 | RHEL-7.7_HVM-20190923-x86_64-0-Hourly2-GP2                        |
| i-0e31039a58e5b9623 | ami-0b69ea66ff7391e80 | amzn2-ami-hvm-2.0.20190823.1-x86_64-gp2                           |
+---------------------+-----------------------+-------------------------------------------------------------------+

Time: 3.5s. Rows fetched: 8 (1 cached). Hydrate calls: 0.

cbruno10 avatar Sep 13 '23 17:09 cbruno10

A workaround is to use a CTE with order by (just using a CTE doesn't work):

> with instance_info as (select instance_id, launch_template_data ->> 'ImageId' as image_id from aws_ec2_instance) select i.instance_id, i.image_id, ami.name from instance_info as i left join aws_ec2_ami_shared as ami on i.image_id = ami.image_id

Error: please provide either owner_id or image_id (SQLSTATE HV000)

+-------------+----------+------+
| instance_id | image_id | name |
+-------------+----------+------+
+-------------+----------+------+

Time: 0.7s. Rows fetched: 6 (2 cached). Hydrate calls: 4.
> with instance_info as (select instance_id, launch_template_data ->> 'ImageId' as image_id from aws_ec2_instance order by instance_id) select i.instance_id, i.image_id, ami.name from instance_info as i left join aws_ec2_ami_shared as ami on i.image_id = ami.image_id
+---------------------+-----------------------+-------------------------------------------------------------------+
| instance_id         | image_id              | name                                                              |
+---------------------+-----------------------+-------------------------------------------------------------------+
| i-06e0de1b8436a1f50 | ami-07d0cf3af28718ef8 | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 |
| i-076c621c9eafe3bd8 | ami-07d0cf3af28718ef8 | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 |
| i-0bd0fba70e1c158c9 | ami-029c0fbe456d58bd1 | RHEL-7.7_HVM-20190923-x86_64-0-Hourly2-GP2                        |
| i-0e31039a58e5b9623 | ami-0b69ea66ff7391e80 | amzn2-ami-hvm-2.0.20190823.1-x86_64-gp2                           |
+---------------------+-----------------------+-------------------------------------------------------------------+

Time: 4.9s. Rows fetched: 8. Hydrate calls: 4.

It also works with a materialized view (without the order by):

> with instance_info as materialized (select instance_id, launch_template_data ->> 'ImageId' as image_id from aws_ec2_instance) select i.instance_id, i.image_id, ami.name from instance_info as i left join aws_ec2_ami_shared as ami on i.image_id = ami.image_id
+---------------------+-----------------------+-------------------------------------------------------------------+
| instance_id         | image_id              | name                                                              |
+---------------------+-----------------------+-------------------------------------------------------------------+
| i-0e31039a58e5b9623 | ami-0b69ea66ff7391e80 | amzn2-ami-hvm-2.0.20190823.1-x86_64-gp2                           |
| i-0bd0fba70e1c158c9 | ami-029c0fbe456d58bd1 | RHEL-7.7_HVM-20190923-x86_64-0-Hourly2-GP2                        |
| i-06e0de1b8436a1f50 | ami-07d0cf3af28718ef8 | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 |
| i-076c621c9eafe3bd8 | ami-07d0cf3af28718ef8 | ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190722.1 |
+---------------------+-----------------------+-------------------------------------------------------------------+

Time: 4.9s. Rows fetched: 8. Hydrate calls: 4.

cbruno10 avatar Sep 13 '23 17:09 cbruno10

elsewhere: other examples: https://github.com/turbot/steampipe/issues/968

judell avatar Sep 14 '23 15:09 judell