After Join generated extra ORDER BY section with phantom field
What happened?
In non generic targets the extra ORDER BY section is added in the end of the output. In my input I don't specify sorting and I don't need it.
And there is a tracks.name field from the tracks table in this section, which is not presented in FROM section.
It produces the error:
Reproduced in the website playground.
PRQL input
prql target:sql.postgres
from tracks
group media_type_id(
sort name
take 1
)
join media_types (== media_type_id)
select {
tracks.track_id,
media_types.name
}
SQL output
WITH table_0 AS (
SELECT
DISTINCT ON (media_type_id) track_id,
media_type_id,
name
FROM
tracks
ORDER BY
media_type_id,
name
)
SELECT
table_0.track_id,
media_types.name
FROM
table_0
JOIN media_types ON table_0.media_type_id = media_types.media_type_id
ORDER BY
table_0.media_type_id,
tracks.name
Expected SQL output
WITH table_0 AS (
SELECT
DISTINCT ON (media_type_id) track_id,
media_type_id,
name
FROM
tracks
ORDER BY
media_type_id,
name
)
SELECT
table_0.track_id,
media_types.name
FROM
table_0
JOIN media_types ON table_0.media_type_id = media_types.media_type_id
MVCE confirmation
- [X] Minimal example
- [X] New issue
Anything else?
No response
Yes, it seems to retain the sorting even though it's not needed in the DISTINCT ON case... Thanks for the report.
I did some playing around with this and it seems to be caused by the combination of:
- postgres or duckdb dialects
take 1which specialises toDISTINCT ONfor those dialects.
If you remove the dialect or change it to take 2 then it switches to the ROW_NUMBER() window function based approach which is more general and doesn't have that ORDER BY issue.
Is there any benefit to keep that DISTINCT ON specialisation or could we just use the more general algorithm?
Btw, in the general case, the intermediate variable _expr_0 leaks into the result set, e.g.
prql target:sql.postgres
from tracks
group media_type_id(
sort name
take 2
)
join media_types (== media_type_id)
Extra columns can often just be ignored but it might cause an issue for some people.
I use a ClickHouse dialect in my job.
In a clickhouse db I have a table where each row is a new state of some object. And a use case actually is to receive the last state.
So I group by object_id, order by update_date_time desc and take 1.
This method gives me the last row in each group.
I cannot use take 2 in this example.
Extra columns can often just be ignored but it might cause an issue for some people.
For instance, if I later UNION tables I get 'mismatch column error' as one of a tabe has extra hidden columns.