Duplicated records returned in a projection
I'm having duplicated records returned by this query.
Everything is available in this test repository I've created to reproduce the issue, but I reproduce here the code and explanation for clarity.
Here is an example Model
internal class Transaction
{
public int Id { get; set; }
public string Name { get; set; }
public List<Statement> Statements { get; set; }
}
internal class Statement
{
public int Id { get; set; }
public int TransactionId { get; set; }
public int ContactId { get; set; }
public string Type { get; set; } //Client, Model, Agency
public string Name { get; set; }
public decimal Amount { get; set; }
public bool IsCommission { get; set; }
public Transaction Tramsaction { get; set; }
public Contact Contact { get; set; }
}
internal class Contact
{
public int Id { get; set; }
public string Name { get; set; }
public List<Statement> Statements { get; set; }
}
here id the DbContext with some preloade data
internal partial class MyDbContext : DbContext
{
public DbSet<Transaction> Transactions { get; set; }
public DbSet<Statement> Statements { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
//Uncomment and add your connection string to use SQL Server intead of InMemoryDatabase
//optionsBuilder.UseSqlServer("YourQueryStringHere")
optionsBuilder.UseInMemoryDatabase("QueryBug")
.EnableSensitiveDataLogging()
.LogTo(Console.WriteLine);
}
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Transaction>().HasData(new Transaction { Id = 1, Name = "Transaction 1" });
modelBuilder.Entity<Transaction>().HasData(new Transaction { Id = 2, Name = "Transaction 2" });
modelBuilder.Entity<Transaction>().HasData(new Transaction { Id = 3, Name = "Transaction 3" });
modelBuilder.Entity<Transaction>().HasData(new Transaction { Id = 4, Name = "Transaction 4" });
modelBuilder.Entity<Contact>().HasData(new Contact { Id = 1, Name = "Contact 1" }); //Client
modelBuilder.Entity<Contact>().HasData(new Contact { Id = 2, Name = "Contact 2" }); //Client
modelBuilder.Entity<Contact>().HasData(new Contact { Id = 3, Name = "Contact 3" }); //Model
modelBuilder.Entity<Contact>().HasData(new Contact { Id = 4, Name = "Contact 4" }); //Model
modelBuilder.Entity<Contact>().HasData(new Contact { Id = 5, Name = "Contact 5" }); //Agency
modelBuilder.Entity<Contact>().HasData(new Contact { Id = 6, Name = "Contact 6" }); //Agency
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 1, TransactionId = 1, Type = "Client", Name = "Statement 1", ContactId = 1, Amount=1000, IsCommission=false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 2, TransactionId = 1, Type = "Client", Name = "Statement 1b", ContactId = 1, Amount = 200, IsCommission = true });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 3, TransactionId = 1, Type = "Model", Name = "Statement 2", ContactId = 3, Amount = 1000, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 4, TransactionId = 1, Type = "Model", Name = "Statement 2b", ContactId = 3, Amount = -600, IsCommission = true });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 5, TransactionId = 1, Type = "Agency", Name = "Statement 3", ContactId = 5, Amount=100, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 6, TransactionId = 1, Type = "Agency", Name = "Statement 3", ContactId = 6, Amount=50, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 7, TransactionId = 2, Type = "Client", Name = "Statement 1", ContactId = 1, Amount = 1000, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 8, TransactionId = 2, Type = "Client", Name = "Statement 1b", ContactId = 1, Amount = 200, IsCommission = true });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 9, TransactionId = 2, Type = "Model", Name = "Statement 2", ContactId = 4, Amount = 1000, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 10, TransactionId = 2, Type = "Model", Name = "Statement 2b", ContactId = 4, Amount = -600, IsCommission = true });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 11, TransactionId = 2, Type = "Agency", Name = "Statement 3", ContactId = 5, Amount = 100, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 12, TransactionId = 3, Type = "Client", Name = "Statement 1", ContactId = 2, Amount = 1000, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 13, TransactionId = 3, Type = "Client", Name = "Statement 1b", ContactId = 2, Amount = 200, IsCommission = true });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 14, TransactionId = 3, Type = "Model", Name = "Statement 2", ContactId = 3, Amount = 1000, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 15, TransactionId = 3, Type = "Model", Name = "Statement 2b", ContactId = 3, Amount = -600, IsCommission = true });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 16, TransactionId = 3, Type = "Agency", Name = "Statement 3", ContactId = 6, Amount = 100, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 17, TransactionId = 4, Type = "Client", Name = "Statement 1", ContactId = 2, Amount = 1000, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 18, TransactionId = 4, Type = "Client", Name = "Statement 1b", ContactId = 2, Amount = 200, IsCommission = true });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 19, TransactionId = 4, Type = "Model", Name = "Statement 2", ContactId = 4, Amount = 1000, IsCommission = false });
modelBuilder.Entity<Statement>().HasData(new Statement { Id = 20, TransactionId = 4, Type = "Model", Name = "Statement 2b", ContactId = 4, Amount = -600, IsCommission = true });
}
}
here is the query
using Microsoft.EntityFrameworkCore;
using EFCoreQueryBug;
using MyDbContext db = new();
db.Database.EnsureDeleted();
db.Database.EnsureCreated();
var results = await db.Statements
.Where(s => s.Type == "Client")
.GroupBy(s => s.ContactId)
.Select(s => new
{
ContactId = s.Key,
ContactName = s.First().Contact.Name,
ClientStatements = s.Select(st => new
{
st.ContactId,
st.Type,
st.Name,
st.Amount,
st.IsCommission
}).ToList(),
ModelStatements = s.SelectMany(st => st.Tramsaction.Statements)
.Where(st => st.Type == "Model" && st.IsCommission)
.Select(st => new
{
st.ContactId,
st.Type,
st.Name,
st.Amount,
st.IsCommission
}).ToList(),
AgencyStatements = s.SelectMany(st => st.Tramsaction.Statements)
.Where(st => st.Type == "Agency")
.Select(st => new
{
st.ContactId,
st.Type,
st.Name,
st.Amount,
st.IsCommission
}).ToList(),
}).ToListAsync();
int contact1ModelStatements = results
.Where(x => x.ContactId == 1)
.First()
.ModelStatements.Count();
Console.WriteLine($"Expected result: 2 - Actual result: {contact1ModelStatements}");
I'm using an InMemory database provider but I've got the same result with SQLServer provider.
The data model represent a Transaction that can have Multiple related Statements each one connected to a Contact.
The Statements can be of different types "Client", "Model", "Agency" and each transaction can have multiple Statements for each type.
I'm not using TPH but just filtering on type manually.
The query is trying to get all the Statements of Type "Client" grouped by ContactID and then, for each of them, all the Statements of Type "Model" that are commission and are linked to the same Transaction.
With the preloaded data, the expected result for the ContactID = 1, should be 2 Model Statements, instead it returns 4 duplicating each row. The same can happen also with the Agency Statements.
The generated query actually returns duplicated rows but I was expecting It to be deduplicated by the object mapping.
Is this something I'm doing wrong or it's a bug in the way EF Core generates the query or reconstruct the object after the query?
EF Core version EFCore 8 / EFCore 7: Database provider: SqlServer / InMemory Target framework: NET 8 / NET 7 Operating system: Windows IDE: Visual Studio 2022 17.8.7
Is this something I'm doing wrong or it's a bug in the way EF Core generates the query or reconstruct the object after the query?
I think that the problem is in the logic. Let me try to help you :D
- Transaction could have multiple Statements (1..n).
- If you
GroupBybyContactIdthen you will have multipleStatementsin the projection. - Each
Statementhave relation to oneTransactionwhich could have multipleStatementsin effect if you will try to do something like that:ModelStatements = s.SelectMany(st => st.Tramsaction.Statements)insideSelectallst.Transactionwill have it's own Statements so you could point twice to the same Transaction.
Sample fix:
...
ModelStatements = s.SelectMany(st => st.Transaction.Statements)
.Where(st => st.Type == "Model" && st.IsCommission)
.GroupBy(st => st.Id, (id, statements) => statements.First())
.ToList(),
Sample fix:
... ModelStatements = s.SelectMany(st => st.Transaction.Statements) .Where(st => st.Type == "Model" && st.IsCommission) .GroupBy(st => st.Id, (id, statements) => statements.First()) .ToList(),
Actually I've already fixed it by adding a Distinct to the ModelStatement SelectMany, that is working. But I'm still surprised that this is happening. I'm filtering on the model statements that are commission and those are 2 in the DB but I get them duplicated in the results.
This is by design. We don't deduplicate results using change tracker, you need to do that yourself explicitly by adding Distinct into the query. All change tracker does is recognize that the same entity is projected twice and uses the same object instance, rather than creating two separate ones, but the number of results always stays the same. Also, identity resolution only works on entities, since the results are projected as anonymous types identity resolution doesn't come into play at all.