efcore Translate embedded IQueryables to common table expressions (CTEs)

When reusing a separate IQueryable in operations, e.g. .Where(obj => otherQuery.Contains(obj)), EF Core generates a subquery (as expected). However, in these cases, manually reviewing generated SQL in debug logs becomes more difficult to visually/mentally parse and damages the readability of the code, especially when a given IQueryable is injected multiple times and/or the generated SQL is especially large.

I would like to have the ability to request that applicable queries replace all injected subqueries of this nature with common table expressions named after the injected IQueryable's symbol (like otherQuery in the previous example). This would in turn make reading the debug logs of executed SQL statements more readable.

This could just be an opt-in optional behavior specifically for use in debug contexts. One could apply this strategy individually via something like AsNoTracking() or app-wide via the DbContextOptionsBuilder like with EnableDetailedErrors().

Related Issues: #26486 (CTEs via a special LINQ operator, including recursive)

Dec 22 '22 17:12 willnationsdev

@willnationsdev are you suggesting we generate a completely different SQL (with CTE instead of subqueries) specifically for logging SQL, and continue using the current subquery SQL for what's actually executed against the database?

Dec 22 '22 17:12 roji

@roji

Hmmm...as far as I know, CTEs have an equivalent performance impact compared to copy/pasting subqueries, yes? Regardless, differing logged vs. executed queries would defeat the purpose of "improving debug log analysis" in the first place, so I wouldn't advocate for that. What shows up in logs should always be what is actually sent to the database to keep logs accurate, so provided the implementation work involved is the same, I think it would be better to change the query string for both logging and execution.

If however including CTEs actually does have some detrimental impact on query handling for some reason, then perhaps this request should involve logging-specific query string generation on an as-requested, per-query basis. Then users can generate ad-hoc CTE-based queries when they need to review an especially complex query, but not employ the CTEs in the average use case.

Dec 22 '22 18:12 willnationsdev

So you're asking to change EF to replace all subqueries in its SQL with CTEs? That seems like quite a request, especially if the motivation here is finding CTEs a bit more readable.

In fact, since CTEs effectively extract out entire subtrees of the query tree, doing this systematically for all subqueries is very likely to degrade readability; especially since EF would have to mechanically generate the CTE identifiers, so they'd be meaningless (cte1, cte2...). Try figuring out which CTE means what in a complex query with many of these.

Don't get me wrong - I think CTEs are great, and there are some specific situations where they're very helpful (I'm especially thinking of recursive CTEs, which allow expressing queries which are otherwise inexpressible). But the readability of simple subqueries vs. CTE seems largely a matter of personal taste, and I don't see us changing everything here as you suggest.

(and all this before having thought or investigated any potential perf issues resulting from such a switch)

Dec 22 '22 19:12 roji

@roji That makes sense. I was really looking for a way to more clearly distinguish/segregate which parts of a large & complex query were derived from which IQueryable instances, and when I've moved around my own subqueries into CTEs while writing SQL manually, I found it helpful; however, if doing so in EF is a lot of work and/or if the generated SQL doesn't even assign aliases reminiscent of the originally provided queries, then it's definitely not worth the effort.

I just wasn't sure how hard it would be to do, and I knew that the few generated query strings I had looked at thus far included DECLARE @... statements that had names closely matching those of the symbols passed as parameters to the C# Expressions, so I figured the same could be done with the IQueryables via CTEs.

Dec 23 '22 15:12 willnationsdev

[...] the few generated query strings I had looked at thus far included DECLARE @... statements that had names closely matching those of the symbols passed as parameters to the C# Expressions [...]

Yeah, but parameters are very, very different from subqueries.

I think the way forward here is for us to implement explicit support for CTEs, where the user uses a LINQ operator to express SQL WITH (#26486). Once that happens, the query writer has the option to decide exactly what they want as a CTE and what name to give it. That would be generally better than just turning all subqueries into "anonymous" CTE blocks.

I'm going to go ahead and close this, but please feel free to continue posting here and discussing.

Dec 23 '22 19:12 roji

Can you explain how parameters are very very different to sub queries in terms of getting the name?

If I've understood correctly, this request is not about turning all subqueries into a CTE but specifically about turning those generated from IQueryable variables into CTEs. Wouldn't the IQueryable variable name be captured in the expressions and so could be used as the CTE name?

I feel there's quite good symmetry between, in the code/LINQ, defining an IQueryable separately to the main query and potentially using it multiple times in the main query, and, in the SQL, doing the same with a CTE.

Dec 23 '22 19:12 stevendarby

Can you explain how parameters are very very different to sub queries in terms of getting the name?

Oh, I wasn't referring specifically to the name - they're different in various other ways.

Looking again at the above, I think I missed the fact that the request was specifically about when an external IQueryable variable is integrated into the query, rather than about subqueries in general (sorry). For that case we can indeed know the name and give it to a CTE, similar to how we do that for parameters.

I do kinda like this idea - integrating an external IQueryable is a complex, "independent" query is being integrated into the main LINQ query, so it makes some sense to do the same on the SQL side via CTEs. But I'm still not sure doing this automatically is the right thing - at the very least a serious cross-database investigation would need to make sure the change to CTEs doesn't affect perf etc. And the gain is purely in more readable SQL.

I'll reopen so we can at least discuss this on the team.

Dec 23 '22 19:12 roji

[...] potentially using it multiple times in the main query [...]

This is a good point, and would be a possible perf improvement, as the subquery is only evaluated once rather than multiple times.

On the downside, any sort of dynamic query construction (such as the integration of an IQueryable) is basically incompatible with any sort of static analysis, and so won't work with precompiled queries and AOT. Of course, whether the user integrates IQueryable or not is orthogonal to what SQL we produce from it (it's dynamic and AOT-incompatible in any case) - but it might mean it's less interesting to invest efforts into this.

Dec 23 '22 19:12 roji

Until we would have full CTE support, would it be possible to have tagging support in embedded IQueryables? For logging purposes it would be nice if the existing TagWith would output the tag next to the subquery.

.Where(obj => otherQuery.TagWith("subquery for other stuff").Contains(obj))

Jan 10 '23 10:01 joakimriedel

@joakimriedel query tags are implemented at the entire query level, and cannot be attached to one part of the query or another; so unfortunately this isn't possible at this point.

Jan 10 '23 15:01 roji

@joakimriedel query tags are implemented at the entire query level, and cannot be attached to one part of the query or another; so unfortunately this isn't possible at this point.

I see, I thought it would be possible to do some rewriting magic since it would always be part of an expression tree. Alas, even stronger vote for CTE, I'm also heavily invested in sub queries like OP.

Jan 10 '23 16:01 joakimriedel

Strongly in favour of this feature, we have queries on the hot path that would benefit from the perf gain here

Dec 06 '24 13:12 Pepsi1x1

@Pepsi1x1 I may be wrong, but if you take a look at roji's comment...

On the downside, any sort of dynamic query construction (such as the integration of an IQueryable) is basically incompatible with any sort of static analysis, and so won't work with precompiled queries and AOT.

...he seems to insinuate that embedding an IQueryable within another IQueryable makes performance optimizations impossible, if that's what you were looking for. My proposal was more for the sake of improving the readability of generated SQL queries. Or did you mean the SQL-side optimization of just reducing the amount of SQL being parsed for repeatedly used sub-queries?

Dec 07 '24 00:12 willnationsdev

@willnationsdev my comment was only about the incompatibility of IQueryable composition (which is a form of dynamic querying) with NativeAOT/precompilation - that's not something that interests everyone. There's nothing wrong with using this in regular, non-NativeAOT programs.

(in fact, we may be able to support this very specific form of dynamic querying even in NativeAOT/precompilation - but that's not sure).

But in any case, readability and performance align here: when you remove duplicated subqueries by introducing a CTE, you both make the SQL more readable and remove additional work for the database.

Dec 07 '24 09:12 roji

efcore efcore copied to clipboard

Translate embedded IQueryables to common table expressions (CTEs)

efcore
efcore copied to clipboard