Preql icon indicating copy to clipboard operation
Preql copied to clipboard

Incorrect SQL code generated

Open SimonSntPeter opened this issue 2 years ago • 6 comments

Came to this from the HN discusson 'against SQL'. In https://github.com/erezsh/Preql

The translation of this...

print Continent { ... // Include existing fields density: population / area // Create new a field

} order{^density}

...to this...

WITH subq_1(id, name, area, population, density) AS ( SELECT id, name, area, population, (CAST(population AS float) / area) AS density FROM Continent ORDER BY density DESC) SELECT * FROM subq_1

...is almost certainly wrong. Certainly it is in mssql and likely every other DB. The order by is not honoured except in the last (outermost) select statement.

If tried in MSSQL you correctly get this error:

Msg 1033, Level 15, State 1, Line 6 The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

Any 'order by' in a subquery, including a CTE, does not guarantee any ordering at all.

It needs to be

WITH subq_1(id, name, area, population, density) AS ( SELECT id, name, area, population, (CAST(population AS float) / area) AS density FROM Continent) SELECT * FROM subq_1 ORDER BY density DESC

(which mssql accepts without complaint)

HTH

SimonSntPeter avatar Jul 13 '21 11:07 SimonSntPeter

Thanks for bringing this to my attention.

Preql currently doesn't officially support MSSQL, but only Postgresql, MySql, and Sqlite.

I've just tested this code on Sqlite and Postgres and both work and return the correct results.

I must say MSSQL's behavior seems a bit perplexing to me. I see no reason that order by won't work in CTEs. But I might consider adding support for MSSQL next.

erezsh avatar Jul 13 '21 12:07 erezsh

No problem. Just checked docs:

Bigquery https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#order_by_clause doesn't seem to specify in those circumstances, redshift https://docs.aws.amazon.com/redshift/latest/dg/r_ORDER_BY_clause.html likewise doesn't seem to. That's not impressive of either of them. Perhaps you might contact them for clarification.

Anyway, I'd advise caution relying on it.

SimonSntPeter avatar Jul 13 '21 12:07 SimonSntPeter

Thanks! I'll take it under advisement.

I'm still confused about the intention behind it. After all, order is important in SQL. It affects UNION ALL, OFFSET-LIMIT, window functions, and tons of other stuff. And there is no way to predict how they will be composed. So ignoring a meaningful sorting.. is a questionable policy.

Anyway, I believe it won't be too hard to fix if necessary.

erezsh avatar Jul 13 '21 12:07 erezsh

Hi, please, speaking as a DBA guy with ~25 years experience, you CANNOT assume ordering will do what you want here, and I emphasise experimentation won't help, in fact it'll make things worse because you get the results you expect... until you don't. Which will inevitably be in production (and it's horrible when that happens). See https://dba.stackexchange.com/questions/184149/is-it-really-possible-that-the-order-will-not-be-guaranteed-for-this-particular

The reason MSSQL correctly forbids it, and the AQL spec says don't rely on any ordering unless orer by is given is to allow for optimisations. These are the same optimisations which may very well alter your execution plan as the load on the DB changes (or may alter it via predicate pushdown, that might be possible).

See https://www.mssqltips.com/sqlservertip/4472/sql-server-enterprise-advanced-scan-aka-merrygoround-scan/ for an example of how one query can affect the order of the results of another "When Query 1 and Query 2 reach page 200,000, Query 1 will complete, but Query 2 will wrap back to the first data page and continue to scan until it reaches page 100,000 and then completes" So query 2 gets stuff out of order.

It's your choice but TL;DR I politely urge you not to rely on any assumption of ordering, nor to rely on ordering in subqueries.

SimonSntPeter avatar Jul 13 '21 12:07 SimonSntPeter

The less you're allowed to assume, the more leeway the optimiser has.

Cheers, and all the best!

SimonSntPeter avatar Jul 13 '21 12:07 SimonSntPeter

Thank you, I promise I will look into it.

Having a reliable implementation is a high priority for me.

erezsh avatar Jul 13 '21 12:07 erezsh