efcore
efcore copied to clipboard
Linq expression containing Array Object isn't cached and compiled every time
There is an API Gateway app based on .NET 6 on Linux and there is a significant native memory increase when there are 1000 memory. In the endpoint, there is below Linq expression:
for (var i = 0; i < 1000; i++)
{
...
var exp = dbContext.TestModels
.Where(x => new [] { 1, 2, 3 }.Contains(x.Test))
.Expression;
var query = queryCompiler.ExtractParameters(exp, queryContext, queryCompilerLogger);
compiledQueryCache
.GetOrAddQuery(
cachedKeyGenerator.GenerateCacheKey(query, async: true),
() => queryCompiler.CompileQueryCore<SingleQueryingEnumerable<TestModel>>(database, query, dbContext.Model, true));
}
As tested as below, the Entity Framework Query Compiler seems does not support Array and the compilation of this query produces unique cache keys every time, so each API endpoint request create a new cached expression.
If run like above 1000 times, the memory cache size is 1001:
If comment out the where clause:
Could you check if the usage is correct or if it's a bug that the expression cannot be cached?
EF Core version: Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 6.0 Operating system: Linux
@dukezhao123 What are queryCompiler
and compiledQueryCache
and how are they obtained?
Good catch... Looks like ExpressionEqualityComparer needs to recognize and treat arrays in a special way... /cc @smitpatel
We may want to think about other data structures here as well (e.g. List/Dictionary)
Looks like ExpressionEqualityComparer needs to recognize and treat arrays in a special way..
Elaborate. There is no array in compare in expression tree. If you have run the repro, can you paste the expression tree (the debug view, not our expression printer result)
Sure.
For the following code (full sample below):
_ = ctx.Blogs.Where(x => new[] { 1, 2, 3 }.Contains(x.Id)).ToList();
_ = ctx.Blogs.Where(x => new[] { 1, 2, 3 }.Contains(x.Id)).ToList();
Breaking in CompiledQueryCache.GetOrAddQuery, the query in the cache key has the following debug view:
.Call System.Linq.Queryable.Where(
.Extension<Microsoft.EntityFrameworkCore.Query.EntityQueryRootExpression>,
'(.Lambda #Lambda1<System.Func`2[Blog,System.Boolean]>))
.Lambda #Lambda1<System.Func`2[Blog,System.Boolean]>(Blog $x) {
.Call System.Linq.Enumerable.Contains(
.Constant<System.Int32[]>(System.Int32[]),
$x.Id)
}
The int[] constant in there is the problem... When debug-stepping over the 2nd query invocation, _memoryCache.TryGetValue returns false instead of true. I'm pretty sure that's because in ExpressionEquailtyComparer, CompareConstant just does Equals over the two array constants, which returns false (hash code calculation should have the same problem).
Makes sense?
Full code sample
await using var ctx = new BlogContext();
await ctx.Database.EnsureDeletedAsync();
await ctx.Database.EnsureCreatedAsync();
_ = ctx.Blogs.Where(x => new[] { 1, 2, 3 }.Contains(x.Id)).ToList();
_ = ctx.Blogs.Where(x => new[] { 1, 2, 3 }.Contains(x.Id)).ToList();
public class BlogContext : DbContext
{
public DbSet<Blog> Blogs { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
=> optionsBuilder
.UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false")
.LogTo(Console.WriteLine, LogLevel.Information)
.EnableSensitiveDataLogging();
}
public class Blog
{
public int Id { get; set; }
public string Name { get; set; }
}
Hi @smitpatel could I ask why the type-bug label was removed?
We shouldn't be doing deep equality comparison in ExpressionEqualityComparer at least for this scenario since it is the entry point for query cache. A deep comparison at this point for all constant can cause a severe perf penalty. We may need to think a different solution. You can extract out array into a parameter in local scope and use that inside query, EF Core will gladly reuse the same query cache entry.
Hi @smitpatel, could you please share more details on the workaround you shared? An example will be appreciated.
var array = new [] { 1, 2, 3 };
var exp = dbContext.TestModels
.Where(x => array.Contains(x.Test))
.Expression;
@smitpatel my thinking is this... If the current (cheap) constant comparison logic returns false, then adding additional logic to check for arrays should be OK perf-wise, since without it we go into full-blown query compilation anyway (cache miss), so the impact of the additional check shouldn't be meaningful. But let's discuss in design (have added needs-design).
That could be good compromise.
@ajcvickers May I know if there will be a fix for this Bug in EF Core 6? This is requested by our premier customer.
@neilzzy It's unlikely, since the workaround is trivial.
One additional thought on this... For the hash code calculation, we may want to exclude constant values altogether:
- This means that all trees with the same structure get stored under the same dictionary bucket, regardless of constant values; so if there are a lot of trees where the only differentiating thing is a constant value, this would be inefficient. However, that scenario pretty much only occurs in the bad dynamic query construction scenario, where constants are mistakenly used instead of a parameter; so I'm not sure this is a real problem.
- It would restrict hash code calculation only to structural Expression elements, and no longer involve potentially arbitrary hash code calculations; this could help performance a bit.
- We would still have to add specific logic in the equality check; but that's zero-impact as discussed before (only done if the basic check fails).