trino icon indicating copy to clipboard operation
trino copied to clipboard

[FR] Document rules/limitations of correlated subqueries in Trino

Open GavinRay97 opened this issue 3 years ago • 1 comments

I've noticed that instead of not permitting correlated subqueries, instead only certain kinds of correlated subqueries seem to be valid.

For instance, the below query throws an error: Given correlated subquery is not supported

👇 CLICK TO DISPLAY QUERY
with users (user_id, name) as (values (1, 'Alice'),
                                      (2, 'Bob'),
                                      (3, 'Charlie')),
     todo_lists (todo_list_id, user_id, todo_list_title) as (values (1, 1, 'todo list 1 (Alice)'),
                                                          (2, 1, 'todo list 2 (Alice)'),
                                                          (3, 2, 'todo list 3 (Bob)'),
                                                          (4, 3, 'todo list 4 (Charlie)')),
     todos (todo_id, todo_list_id, todo_description) as (values (1, 1, 'todo 1 (todo list 1)'),
                                                      (2, 1, 'todo 2 (todo list 1)'),
                                                      (3, 2, 'todo 3 (todo list 2)'),
                                                      (4, 3, 'todo 4 (todo list 3)'),
                                                      (5, 4, 'todo 5 (todo list 4)'),
                                                      (6, 4, 'todo 6 (todo list 4)'))

select cast(array_agg(user_json) as json) as user_maps
from (select array_agg(map_from_entries(ARRAY[
    ('user_id', cast(users.user_id as json)),
    ('name', cast(users.name as json)),
    ('todo_lists', (select cast(array_agg(map_from_entries(ARRAY[
        ('todo_list_id', cast(todo_lists.todo_list_id as json)),
        ('todo_list_title', cast(todo_lists.todo_list_title as json)),
        ('todos', (select cast(array_agg(map_from_entries(ARRAY[
            ('todo_id', cast(todos.todo_id as json)),
            ('todo_description', cast(todos.todo_description as json))
        ])) as json) from todos where todos.todo_list_id = todo_lists.todo_list_id))
    ])) as json) from todo_lists where todo_lists.user_id = users.user_id))
])) as user_json from users) as user_maps

image

But if we remove the 3rd nesting level (todos), it runs:

select cast(array_agg(user_json) as json) as user_maps
from (select array_agg(map_from_entries(ARRAY[
    ('user_id', cast(users.user_id as json)),
    ('name', cast(users.name as json)),
    ('todo_lists', (select cast(array_agg(map_from_entries(ARRAY[
        ('todo_list_id', cast(todo_lists.todo_list_id as json)),
        ('todo_list_title', cast(todo_lists.todo_list_title as json))

    ])) as json) from todo_lists where todo_lists.user_id = users.user_id))
])) as user_json from users) as user_maps

image

GavinRay97 avatar Sep 19 '22 20:09 GavinRay97

Here's a simpler reproduction with the same shape:

WITH
    t (a) AS (VALUES 1),
    u (b, a) AS (VALUES (1, 1)),
    v (b, c) AS (VALUES (1, 1))
SELECT max((
    SELECT max((
        SELECT max(c)
        FROM v
        WHERE v.b = u.b))
    FROM u
    WHERE u.a = t.a))
FROM t

And here's another variant with an even simpler inner query:

WITH
    t (a) AS (VALUES 1),
    u (a) AS (VALUES 1),
    v (c) AS (VALUES 1)
SELECT max((
    SELECT max((SELECT c FROM v))
    FROM u
    WHERE u.a = t.a))
FROM t

martint avatar Sep 20 '22 05:09 martint

Just to update this in case anyone else on the internet finds it:

There is a way to emulate the behavior of correlated subqueries. Credit goes to my coworker for explaining this. You can use JOIN's with GROUP BY on join keys to do this:

image

GavinRay97 avatar Oct 04 '22 15:10 GavinRay97

I'm running into this limitation as well (hi @GavinRay97 👋). The simplified example can be further simplified by removing MAX() (which doesn't seem necessary):

WITH
  t (a) AS (VALUES 1),
  u (b, a) AS (VALUES (1, 1)),
  v (b, c) AS (VALUES (1, 1))
SELECT (
  SELECT (
    SELECT c
    FROM v
    WHERE v.b = u.b
  )
  FROM u
  WHERE u.a = t.a
)
FROM t;

It appears that LATERAL doesn't suffer from the same problem:

WITH
  t (a) AS (VALUES 1),
  u (b, a) AS (VALUES (1, 1)),
  v (b, c) AS (VALUES (1, 1))
SELECT c
FROM t, LATERAL (
  SELECT c
  FROM u, LATERAL (
    SELECT c
    FROM v
    WHERE v.b = u.b
  ) x
  WHERE u.a = t.a
) x;

Will investigate this further. For simple purposes, scalar LATERAL derived tables are equivalent to ordinary scalar subqueries, logically, at least as long as there's no GROUP BY clause.

lukaseder avatar Mar 14 '23 13:03 lukaseder

Unfortunately, this approach doesn't work in case the correlation uses <= instead of =, and if there's ORDER BY .. LIMIT, as in this question here: https://stackoverflow.com/q/69804139/521799

lukaseder avatar Mar 14 '23 14:03 lukaseder