django
django copied to clipboard
Refs #33768 -- Removed combined query ordering by column index.
@apollo13 here's a PR demonstrating that it's possible to reference the left outer select clause columns by aliases instead of column index.
Something else that caught my eye is the branch that deals with reference to a non-selected alias
https://github.com/django/django/blob/6f80050496780e2f8a65e2a4374dd6375c9fbfa5/django/db/models/sql/compiler.py#L440-L452
I'm of opinion that we should drop support for that as the implicit column addition can chance the semantic of the query (e.g. DISTINCT
, GROUP BY
).
Thoughts?
I am all for removing code that changes semantics; and my initial code simply raised an error:
https://github.com/django/django/commit/84c1826ded17b2d74f66717fb745fc36e37949fd#diff-f58de2deaccecd2d53199c5ca29e3e1050ec2adb80fb057cdfc0b4e6accdf14fR326
That said I am not sure we can remove it easily now after it was added via https://github.com/django/django/commit/2cbd3967e0a51eab993df89679046d25ec78baec
buildbot, test on oracle.
@felixxm not sure if it's known but Jenkins Oracle nodes seem to be broken
@felixxm not sure if it's known but Jenkins Oracle nodes seem to be broken
Sorry my bad, should work now.
buildbot, test on oracle.
Good old Oracle with a surprising failure. From all the tests that do complex union and ordering the one that fails seem pretty trivial
https://github.com/django/django/blob/6f80050496780e2f8a65e2a4374dd6375c9fbfa5/tests/queries/test_qs_combinators.py#L418-L422
I don't have an Oracle setup available but this is what the query on PostgreSQL looks like
(SELECT "queries_number"."id",
"queries_number"."num",
"queries_number"."other_num",
"queries_number"."another_num"
FROM "queries_number"
ORDER BY "queries_number"."num" ASC
LIMIT 2)
UNION
(SELECT "queries_number"."id",
"queries_number"."num",
"queries_number"."other_num",
"queries_number"."another_num"
FROM "queries_number"
ORDER BY "queries_number"."num" DESC
LIMIT 2)
ORDER BY "num" DESC
LIMIT 4
I assume this is a weird edge case due to the usage of OFFSET
and FETCH
?
I assume this is a weird edge case due to the usage of
OFFSET
andFETCH
?
I'm not sure, I need to think about it :thinking:. Here is the query on Oracle:
(SELECT
"QUERIES_NUMBER"."ID",
"QUERIES_NUMBER"."NUM",
"QUERIES_NUMBER"."OTHER_NUM",
"QUERIES_NUMBER"."ANOTHER_NUM"
FROM "QUERIES_NUMBER"
ORDER BY "QUERIES_NUMBER"."NUM" ASC
FETCH FIRST 2 ROWS ONLY)
UNION
(SELECT
"QUERIES_NUMBER"."ID",
"QUERIES_NUMBER"."NUM",
"QUERIES_NUMBER"."OTHER_NUM",
"QUERIES_NUMBER"."ANOTHER_NUM"
FROM "QUERIES_NUMBER"
ORDER BY "QUERIES_NUMBER"."NUM" DESC
FETCH FIRST 2 ROWS ONLY)
ORDER BY "NUM" DESC FETCH FIRST 4 ROWS ONLY
It's surprising that, for example, this works:
(SELECT
"QUERIES_NUMBER"."NUM",
(-1 * -1) AS "__ORDERBYCOL2"
FROM "QUERIES_NUMBER"
WHERE "QUERIES_NUMBER"."NUM" >= -1)
UNION
(SELECT
"QUERIES_NUMBER"."NUM",
(6 * 6) AS "__ORDERBYCOL2"
FROM "QUERIES_NUMBER"
WHERE "QUERIES_NUMBER"."NUM" <= 2)
ORDER BY (2) ASC, "NUM" ASC
I found https://stackoverflow.com/questions/25387951/curious-issue-with-oracle-union-and-order-by which seems so obscure but might factor into what we are seeing.
On Fri, Jul 8, 2022, at 06:49, Mariusz Felisiak wrote:
I assume this is a weird edge case due to the usage of
OFFSET
andFETCH
?I'm not sure, I need to think about it 🤔. Here is the query on Oracle:
(SELECT "QUERIES_NUMBER"."ID", "QUERIES_NUMBER"."NUM", "QUERIES_NUMBER"."OTHER_NUM", "QUERIES_NUMBER"."ANOTHER_NUM" FROM "QUERIES_NUMBER" ORDER BY "QUERIES_NUMBER"."NUM" ASC FETCH FIRST 2 ROWS ONLY) UNION (SELECT "QUERIES_NUMBER"."ID", "QUERIES_NUMBER"."NUM", "QUERIES_NUMBER"."OTHER_NUM", "QUERIES_NUMBER"."ANOTHER_NUM" FROM "QUERIES_NUMBER" ORDER BY "QUERIES_NUMBER"."NUM" DESC FETCH FIRST 2 ROWS ONLY) ORDER BY "NUM" DESC FETCH FIRST 4 ROWS ONLY It's surprising that, for example, this works:
(SELECT "QUERIES_NUMBER"."NUM", (-1 * -1) AS "__ORDERBYCOL2" FROM "QUERIES_NUMBER" WHERE "QUERIES_NUMBER"."NUM" >= -1) UNION (SELECT "QUERIES_NUMBER"."NUM", (6 * 6) AS "__ORDERBYCOL2" FROM "QUERIES_NUMBER" WHERE "QUERIES_NUMBER"."NUM" <= 2) ORDER BY (2) ASC, "NUM" ASC — Reply to this email directly, view it on GitHub https://github.com/django/django/pull/15831#issuecomment-1178547084, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAT5C2RUOOBD4CVSA5KVC3VS6XMTANCNFSM525PN2UQ. You are receiving this because you were mentioned.Message ID: @.***>
Thanks to both of you! So it seems that we could work around this obscure issue on Oracle by performing a subquery wrapping of the query combination
SELECT * FROM (
(SELECT
"QUERIES_NUMBER"."ID",
"QUERIES_NUMBER"."NUM",
"QUERIES_NUMBER"."OTHER_NUM",
"QUERIES_NUMBER"."ANOTHER_NUM"
FROM "QUERIES_NUMBER"
ORDER BY "QUERIES_NUMBER"."NUM" ASC
FETCH FIRST 2 ROWS ONLY)
UNION
(SELECT
"QUERIES_NUMBER"."ID",
"QUERIES_NUMBER"."NUM",
"QUERIES_NUMBER"."OTHER_NUM",
"QUERIES_NUMBER"."ANOTHER_NUM"
FROM "QUERIES_NUMBER"
ORDER BY "QUERIES_NUMBER"."NUM" DESC
FETCH FIRST 2 ROWS ONLY))
ORDER BY "NUM" DESC FETCH FIRST 4 ROWS ONLY
behind a feature flag (e.g. requires_compound_order_by_subquery
) which is admittedly a workaround but before proceeding we should agree on whether this more of a workaround than the order by column index through raw sql.
Well I am biased, because whatever lets us pull in psycopg3 gets a plus from me ;)
EDIT:// Joking aside, I prefer every workaround that does not involve raw sql. We push into subqueries in other situations as well, so it is not really new I'd say.
behind a feature flag (e.g.
requires_compound_order_by_subquery
) which is admittedly a workaround but before proceeding we should agree on whether this more of a workaround than the order by column index through raw sql.
I think it's fine to use subquery :+1:
FWIW this is still on my radar, I've just been kept away from my keyboard for a while :) I should be able to commit some time in early August.
buildbot, test on oracle.
Looks like it solved the Oracle issue!
That looks great; I'll rebase my postgresql changes and see what it results in.
Hi @charettes; I am playing more with my psycopg3 branch and I am running into issues with test_lookups_autofield_array
where we generate queries like this:
SELECT "postgres_tests_nullableintegerarraymodel"."field"[1]
FROM "postgres_tests_nullableintegerarraymodel"
WHERE ("postgres_tests_nullableintegerarraymodel"."field"[1])::integer IS NOT NULL
GROUP BY "postgres_tests_nullableintegerarraymodel"."field"[1], ARRAY[51, 52, 0] HAVING ARRAY_AGG("postgres_tests_nullableintegerarraymodel"."id" ) <@ (ARRAY[51, 52, 0])::integer[]
ORDER BY "postgres_tests_nullableintegerarraymodel"."field"[1] ASC
would it be possible to alias "postgres_tests_nullableintegerarraymodel"."field"[1]
here as well and reuse it in group by and order by? Because currently postgres complains about:
django.db.utils.ProgrammingError: column "postgres_tests_nullableintegerarraymodel.field" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "postgres_tests_nullableintegerarraymodel"."field"[$1...
this is due to the fact that [1]
is actually [$1]
and as such a bind param.
@apollo13 yes, it should do doable for both. Some work that landed in #15922 might help for that. We basically have to force column aliasing in this case and ensure both GROUP BY
and ORDER BY
refer to it.
Can I bribe you into doing that? I feel like this is out of my depth would would be very helpful to get psycopg3 support running. Or maybe just give me an overview of how I should tackle it and I might be able to try it on my own.
I can give it a shot in the next few days @apollo13, I'll report back here.
@apollo13 regarding the test_lookups_autofield_array
failure, have you tried preventing parametrization of index in IndexTransform.as_sql
?
index 3edc72ac94..c31279ad4b 100644
--- a/django/contrib/postgres/fields/array.py
+++ b/django/contrib/postgres/fields/array.py
@@ -316,7 +316,7 @@ def __init__(self, index, base_field, *args, **kwargs):
def as_sql(self, compiler, connection):
lhs, params = compiler.compile(self.lhs)
- return "%s[%%s]" % lhs, params + [self.index]
+ return "%s[%d]" % (lhs, self.index), params
@property
def output_field(self):
It seems the test is passing in https://github.com/django/django/pull/15687 though which makes me wonder if you found another solution instead?
@charettes Sorry, I was away sailing. The tests probably pass since the tests still test against python 2. In this specific case I might indeed be able to interpolate the integer locally (after ensuring it is an int to prevent injections). But the "general" problem still exists.
The tests probably pass since the tests still test against python 2.
I assume you meant psycopg2 here :)
But the "general" problem still exists.
Unfortunately I don't see an easy way of solving the "general" problem without potentially expensive equality checks that try to alias expressions with a different identity but that should be considered equal at the SQL level. Let me explain.
Looking at the test in more details https://github.com/django/django/blob/897f38fabea5e1b196f11250ff6dadfffa489840/tests/postgres_tests/test_array.py#L384-L391
The queryset needs to be translates to something along the lines of
qs = (
NullableIntegerArrayModel.objects.filter(
field__0__isnull=False,
)
.alias(group0=F("field__0"))
.values("group0")
.annotate(
arrayagg=ArrayAgg("id"),
)
.order_by("group0")
)
The naive approach would be to adjust the logic in sql.Query.set_group_by
which is triggered from QuerySet.annotate
when an aggregate expression is provided to always create an alias for selected expressions and use this alias in the GROUP BY
but that doesn't solve the issue for the ORDER BY
clause.
In order to make it work for the ORDER BY
clause we need to add both logic at order_by
call time to scan a possibly existing GROUP BY
to replace order_by("field__0")
by Ref("group0")
but we also need to adjust set_group_by
to scan the order by to replace possibly pre-existing order_by
that needs to be repointed as well.
e.g.
qs = (
NullableIntegerArrayModel.objects.filter(
field__0__isnull=False,
)
.order_by("field__0") # group0 doesn't exist yet.
.alias(group0=F("field__0"))
.values("group0")
.annotate(
arrayagg=ArrayAgg("id"),
)
)
Are you sure it's required for the ORDER BY
clause as well or is it only an issue for the SELECT
and GROUP BY
mismatch?
I assume you meant psycopg2 here :)
Oh yes :D
Unfortunately I don't see an easy way
I never said it were easy ;)
Are you sure it's required for the ORDER BY clause as well or is it only an issue for the SELECT and GROUP BY mismatch?
Mhm good question. I think the problem occured always together but the actual issue is probably the select & group by mismatch.
Mhm good question. I think the problem occured always together but the actual issue is probably the select & group by mismatch.
Ok I'll give a shot later this week then.
I just tried this out with:
import psycopg
with psycopg.connect() as conn:
r = conn.execute('SELECT "f1"[%s] FROM test', [1])
r = conn.execute('SELECT "f1"[%s] FROM test ORDER BY "f1"[%s]', [1, 1])
r = conn.execute('SELECT "f1"[%s] FROM test GROUP BY "f1"[%s]', [1, 1])
Only the last of those queries fails with:
Traceback (most recent call last):
File "/home/florian/sources/django.git/test.py", line 6, in <module>
r = conn.execute('SELECT "f1"[%s] FROM test GROUP BY "f1"[%s]', [1, 1])
File "/home/florian/.local/share/virtualenvs/django/lib/python3.10/site-packages/psycopg/connection.py", line 712, in execute
raise ex.with_traceback(None)
psycopg.errors.GroupingError: column "test.f1" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "f1"[$1] FROM test GROUP BY "f1"[$2]
^
so I think we are safe in assuming that only SELECT
needs to match GROUP BY
. ORDER BY
can target anything.
The last query can be fixed by using:
r = conn.execute('SELECT "f1"[%(test)s] FROM test GROUP BY "f1"[%(test)s]', {"test": 1})
instead. But I do not think this is something we can do easily.
Hi @charettes, I do have setup github actions to run against psycopg3 in my repository: https://github.com/apollo13/django/actions/runs/3104842905/jobs/5029775731
There you an see all the current failures. They are basically all because SELECT
doesn't match GROUP BY
.
One stands out though:
======================================================================
ERROR: test_orders_nulls_first_on_filtered_subquery (ordering.tests.OrderingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/__w/django/django/django/db/backends/utils.py", line 89, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.10/site-packages/psycopg/cursor.py", line 725, in execute
raise ex.with_traceback(None)
psycopg.errors.InvalidColumnReference: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...") AS "last_date" FROM "ordering_author" ORDER BY (SELECT MA... ^
so we might have the same issue with ORDER BY
in very specific edge cases.
Note that my PR is rebased on your changes from here (I just rebased them again to make sure that I have all changes, let's see what the next actions run says).
@apollo13 I'm not sure how we should proceed, I wanted to avoid tainting this PR with unrelated changes, but this commit should address the SELECT
/GROUP BY
mismatch issue you are experiencing
https://github.com/django/django/commit/d827b717be06d3533fd2b363ff8fa445c4cd6c78
I'll try to look at the DISTINCT/ORDER BY issue in the next few days.
Actually both issues should be solved by #16084, further research will have to wait for next week.
Moved extra work from #16084 back here.