efcore
efcore copied to clipboard
`GroupBy` over complex key generates redundant subquery
Summary
In case if anything different from property or navigation access is performed in GroupBy
method, generated query will add additional subquery. Probably to avoid repeating SQL expression used in group by
.
Example
Having the following context:
public class Entity
{
public int Id { get; private set; }
}
public class AppContext: DbContext
{
public DbSet<Entity> Entities { get; private set; }
//...
}
and perfoming query:
await context.Entities
.GroupBy(e => e.Id.ToString())
.Select(g => g.Key)
.ToListAsync()
version 7.0.0-preview.5.22302.2
(and later, up to 7.0.2
as of the time of writing) generates this SQL:
SELECT [t].[Key]
FROM (
SELECT CONVERT(varchar(11), [e].[Id]) AS [Key]
FROM [Entities] AS [e]
) AS [t]
GROUP BY [t].[Key]
while version 7.0.0-preview.4.22229.2
(and before) does querying via:
SELECT CONVERT(varchar(11), [e].[Id])
FROM [Entities] AS [e]
GROUP BY CONVERT(varchar(11), [e].[Id])
Changing GroupBy
expression to e => e.Id
(or navigation property access ) gets rid of subquery in all cases, yet using something as simple as e => 1
still generates one.
MRE is attached: GroupBySubquery.zip. Just replace the connection string with the one you use.
Сonclusion
First option seems to generally have worse execution plan, especially as main query grows more complex.
I'm not sure if this was intentional change or introduced bug, but in any case user needs to have an option to revert to old behaviour, as this can affect query performance quite noticeably.
In similar looking issues I found a suggested workaround of adding
AppContext.SetSwitch("Microsoft.EntityFrameworkCore.Issue27102", true);
but this doesn't seem to affect this case.
Environment
EF Core version: 7.0.2
, 7.0.0-preview.5.22302.2
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET 7.0
Operating system: Windows
/cc @maumar @roji
I guess this is related to https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-7.0/whatsnew#groupby-entity-type. This is a major performance decrease in our app too. What took 1 or 2 seconds on EF Core 6 now times out after >1min in production. If there is no fix for this it is probably required for us to downgrade our production app back to ef core 6.
We do something similar to (this is pseudo code, I simplified a more complicated query):
var qAgg = from x in db.B
group x by new { x.Process.Element.Customer.Kind }
into g
select new
{
Kind = g.Key.Kind,
Region1Rule1 = g.Count(x => x.Region1 && x.Process.Rule == Rule.Rule1),
Region2Rule1 = g.Count(x => x.Region2 && x.Process.Rule == Rule.Rule1),
Region3Rule1 = g.Count(x => x.Region3 && x.Process.Rule == Rule.Rule1),
Region4Rule1 = g.Count(x => x.Region4 && x.Process.Rule == Rule.Rule1),
Region1Rule2 = g.Count(x => x.Region1 && x.Process.Rule == Rule.Rule2),
Region2Rule2 = g.Count(x => x.Region2 && x.Process.Rule == Rule.Rule2),
Region3Rule2 = g.Count(x => x.Region3 && x.Process.Rule == Rule.Rule2),
Region4Rule2 = g.Count(x => x.Region4 && x.Process.Rule == Rule.Rule2),
};
On EF Core 6 this used to be translated to something like (we're using postgres)
SELECT
e.kind AS "Kind",
COUNT(CASE WHEN b.region1 AND (z1.rule = 1) THEN 1 END)::INT AS "Region1Rule1",
COUNT(CASE WHEN b.region2 AND (z1.rule = 1) THEN 1 END)::INT AS "Region2Rule1",
COUNT(CASE WHEN b.region3 AND (z1.rule = 1) THEN 1 END)::INT AS "Region3Rule1",
COUNT(CASE WHEN b.region4 AND (z1.rule = 1) THEN 1 END)::INT AS "Region4Rule1",
COUNT(CASE WHEN b.region1 AND (z1.rule = 2) THEN 1 END)::INT AS "Region1Rule2",
COUNT(CASE WHEN b.region2 AND (z1.rule = 2) THEN 1 END)::INT AS "Region2Rule2",
COUNT(CASE WHEN b.region3 AND (z1.rule = 2) THEN 1 END)::INT AS "Region3Rule2",
COUNT(CASE WHEN b.region4 AND (z1.rule = 2) THEN 1 END)::INT AS "Region4Rule2"
FROM b
INNER JOIN z /* ...*/
INNER JOIN e /* ...*/
INNER JOIN z1 /* ...*/
GROUP BY e.kind
On EF Core 7 every single COUNT is translated to a subquery, resulting in a huge and repetitive query that takes forever to complete.
If there is any more info required I'm happy to work on this further. Providing my production example is not feasible because the queries are huge and customer-specific. I hope that my simplified examples together with @alexb5dh's explanation and example code provide enough info to further investigate.
After some additional search through EF issues, this one seems to be tightly related: https://github.com/dotnet/efcore/issues/29593.
Workaround from there also seems to work for now - the main trick is to add Distinct
to "cheat" new EF behaviour.