Remove ORDER BY applied to singleton queries
Fixes #34831.
I started working on this again after https://github.com/dotnet/efcore/pull/34097 was merged :) This is not yet ready for merging (missing new tests).
I added a test and based this on top of #35384, so that it actually performs the desired simplifications (as visible in Distinct_take_without_orderby).
@ranma42 thanks as always for the work and sorry it took so long to review. I'm coming out of a very intensive work period working on other stuff and will try to review PRs etc.
This change seems fine to me, though IIUC the kind of queries this optimize are "silly", if written by an end user; in other words, a LINQ query (EF or otherwise) containing Take(1).OrderBy(...) doesn't make any sense, as the OrderBy can always be removed (a .NET analyzer could in theory be written which does that, if we knew that the OrderBy had no side effects).
We generally avoid going too much into this kind of optimization, where EF tries to catch badly-written queries and improve their translations; it's better for users to write better LINQ queries. It's true that in some cases such optimizations can kick in as a result of processing we ourselves do in the query pipeline (and not because of "silly" input queries); that's indeed valuable, though is that the case here?
Bottom line: I'm totally fine merging this as the change adds minimal complexity, but I'm just asking in case I'm missing something (or you want to discuss). Let me know what you think.
@ranma42 any comment on the above?
oops, sorry for the delayed reply!
I agree that these queries are unlikely in a normal codebase: .Take(N).OrderBy() is unusual and even more so the special case N=1, which is obviously pointless, so it can only happen because of a mistake or multiple parts that compose a query without tracking its cardinality.
OTOH this is a simple check that brings the expressions a little more towards a "normalized" form. Conceptually, this is a generalization of #34482 (DISTINCT over singletons is a no-op, just like ORDER BY)
We generally avoid going too much into this kind of optimization, where EF tries to catch badly-written queries and improve their translations; it's better for users to write better LINQ queries. It's true that in some cases such optimizations can kick in as a result of processing we ourselves do in the query pipeline (and not because of "silly" input queries); that's indeed valuable, though is that the case here?
It was not designed for this, although given the code that ended up interacting with (see both the changes and the interaction with #34097) it might indeed trigger in some split queries 🤔 I will try and have a look (I am thinking about something like db.Blogs.Include(x => x.Posts.OrderBy(y => y.SortKey).Take(1)), which is definitely non-standard, but somewhat reasonable and expressed in the most natural form).
I will investigate the split query case and report back; in any case, either decision (merge or close) works for me 👍