Concurrent insert can cause AsSplitQuery to fail to populate unrelated child collections
When materialising entities via a query with split querying enabled, it's possible to encounter missing child entities purely by concurrently inserting unrelated data into the same tables. This can happen even when there have been no changes to the affected parent/child rows and ordering of the returned rows is completely stable. EF runs the expected queries and it does appear to successfully retrieve the child entities from the database, but it discards them.
Minimal consistent repro that sneakily inserts an unrelated record in between executing the first and second split queries (.NET 8.0, EF Core v8.0.5, SQL Server):
Program.cs:
using Microsoft.EntityFrameworkCore;
namespace EFTest;
public class Widget
{
public Widget(int id, IEnumerable<Child> children)
{
Id = id;
Children = children.ToHashSet();
}
// EF calls this when materialising the Widget
private Widget(int id)
{
Id = id;
// Strategically add another Widget between loading Widget 1 and loading its Children
using var dbContext = new MyDbContext();
dbContext.Widgets.Add(new(id: 2, children: [new("New")]));
dbContext.SaveChanges();
}
public int Id { get; private init; }
public HashSet<Child> Children { get; private init; } = [];
public override string ToString() =>
$"Widget {Id} with children: [{string.Join(',', Children.Select(c => c.Name))}]";
}
public record Child(string Name)
{
public int WidgetId { get; private set; }
}
public class MyDbContext() : DbContext(new DbContextOptionsBuilder().UseSqlServer(ConnectionString).Options)
{
private const string ConnectionString = "<SQL Server connection string goes here>";
public DbSet<Widget> Widgets { get; private init; } = null!;
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Widget>(e =>
{
e.Property(widget => widget.Id).ValueGeneratedNever();
e.OwnsMany<Child>(widget => widget.Children);
});
}
}
public static class Program
{
public static async Task Main()
{
await using (var dbContext = new MyDbContext())
{
await dbContext.Database.EnsureCreatedAsync();
await dbContext.Widgets.ExecuteDeleteAsync();
dbContext.Widgets.Add(new(id: 1, children: [new("A"), new("B"), new("C")]));
await dbContext.SaveChangesAsync();
}
await using (var dbContext = new MyDbContext())
{
var widgets = await dbContext.Widgets.AsSplitQuery().OrderByDescending(w => w.Id).ToListAsync();
Console.WriteLine(string.Join('\n', widgets));
}
}
}
Add your connection string (MyDbContext.ConnectionString), generate an initial migration (dotnet ef migrations add Initial), then run the program.
Received output:
Widget 1 with children: []
Expected output:
Widget 1 with children: [A,B,C]
The repro no longer exhibits the issue if you remove the OrderByDescending(w => w.Id), but you can make the issue reappear by simply giving "Widget 2" an ID of 0. It seems that the moment the "parentless" (from the split query's perspective, anyhow) child row is encountered during enumeration, EF stops stitching together child entities with parent entities entirely. This means that depending on the order of the results you may experience child entities missing for all, or some, or no parent entities.
I acknowledge that the docs say that for split querying data consistency is not guaranteed without using serialisable/snapshotting transactions, but I found this scenario a little too unexpected and couldn't find any existing reports of this same scenario, so I thought I'd at least ask whether this is expected. The thing that makes this so unexpected is that the right data is being loaded by EF and it has enough information to stitch things together, but the presence of an additional row causes it not to do so. I'd expect these results if I was deleting "Widget 1" in between the first and second query such that EF sees the parent but no children, and I was fully willing to accept those consequences when not using serialisable/snapshotting transactions, but I didn't expect that I could make "Widget 1"s children disappear (for a single query) by merely inserting "Widget 2" at the wrong time even though "Widget 1" and all its related data was untouched the whole time.
Provider and version information
EF Core version: v8.0.5
Database provider: Microsoft.EntityFrameworkCore.SqlServer (I was able to reproduce the same behaviour with Npgsql.EntityFrameworkCore.PostgreSQL as well)
Target framework: .NET 8.0
Operating system: I've reproduced this behaviour on Windows, Linux & macOS
IDE: I've reproduced this behaviour in both Visual Studio and Rider
Sorry for taking so long to answer this.
As you noted, the moment you're doing split queries without serializable/snapshot isolation, you're exposed to all kinds of inconsistent data scenarios; this is one of the reasons split queries aren't the default loading mode. But I do agree that the behavior you're describing sounds odd and could indicate a bug in EF's materialization logic... I'll place this in the backlog for further investigation, but it's unlikely we'll get to this any time soon, as similar such inconsistencies may appear in any case (even if a bit less unexpected).
Hello!
Another reproduction, based on issue's author reproduction. Here we can see if parent's navigation property changed before loading its children then its children won't be loaded.
using Microsoft.EntityFrameworkCore;
namespace EFTest;
public class Widget
{
public Widget(int id, IEnumerable<Child> children)
{
Id = id;
Children = children.ToHashSet();
}
// EF calls this when materialising the Widget
private Widget(int id)
{
Id = id;
// Strategically update Widget 2 before loading it's children
if (id == 1)
{
using var dbContext = new MyDbContext();
var widget = new Widget(2, Enumerable.Empty<Child>());
var menu = dbContext.Menus.First();
dbContext.Set<Widget>().Attach(widget);
widget.Menu = menu;
dbContext.SaveChanges();
}
}
public int Id { get; private init; }
public HashSet<Child> Children { get; private init; } = new();
public Menu? Menu { get; set; }
public override string ToString() =>
$"Widget {Id} with children: [{string.Join(',', Children.Select(c => c.Name))}]";
}
public record Menu(int Id);
public record Child(int Id, string Name)
{
public int WidgetId { get; private set; }
}
public class MyDbContext : DbContext
{
private const string ConnectionString = "";
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
optionsBuilder.UseSqlServer(ConnectionString);
}
public DbSet<Widget> Widgets { get; private init; } = null!;
public DbSet<Menu> Menus { get; private init; } = null!;
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Widget>(e =>
{
e.Property(widget => widget.Id).ValueGeneratedNever();
e.HasMany<Child>(c => c.Children)
.WithOne();
e.HasOne<Menu>(widget => widget.Menu)
.WithMany();
});
modelBuilder.Entity<Menu>(e => { e.Property(c => c.Id).ValueGeneratedNever(); });
modelBuilder.Entity<Child>(e =>
{
e.Property(c => c.Id).ValueGeneratedNever();
e.Property(c => c.Name);
e.Property(c => c.WidgetId);
});
}
}
public static class Program
{
public static async Task Main()
{
await using (var dbContext = new MyDbContext())
{
await dbContext.Database.EnsureCreatedAsync();
await dbContext.Widgets.ExecuteDeleteAsync();
await dbContext.Menus.ExecuteDeleteAsync();
dbContext.Menus.Add(new Menu(1));
dbContext.Widgets.Add(new Widget(1, new[] {new Child(11, "A1"), new Child(12, "B1"), new Child(13, "C1")}));
dbContext.Widgets.Add(new Widget(2, new[] {new Child(21, "A2"), new Child(22, "B2"), new Child(23, "C3")}));
await dbContext.SaveChangesAsync();
}
await using (var dbContext = new MyDbContext())
{
var widgets = await dbContext.Widgets
.Include(w => w.Children)
.Include(w => w.Menu)
.AsSplitQuery()
.ToListAsync();
Console.WriteLine(string.Join(Environment.NewLine, widgets));
}
}
}
Received output:
Widget 1 with children: [A1,B1,C1]
Widget 2 with children: []
Expected output:
Widget 1 with children: [A1,B1,C1]
Widget 2 with children: [A2,B2,C3]
Adding transaction with isolation level = Snapshot or changing AsSplitQuery to AsSingleQuery helps to get expected output. However, when change transaction isolation level you should use workaround to avoid this issue https://github.com/dotnet/SqlClient/issues/96 :)
In EF Core 2.1, which used split queries by default, there was not such problem.
Provider and version information
EF Core version: v8.0.8 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 8.0 Operating system: Linux IDE: Rider
P.S. full workaround is listed below
await using (var dbContext = new MyDbContext())
{
// open connection manually that mean that it won't be return to connection pool unit we close it manually
await dbContext.Database.OpenConnectionAsync();
var tran1 = await dbContext.Database.BeginTransactionAsync(IsolationLevel.Snapshot);
var widgets = await dbContext.Widgets
.Include(w => w.Children)
.Include(w => w.Menu)
.AsSplitQuery()
.ToListAsync();
await tran1.CommitAsync();
// https://github.com/dotnet/SqlClient/issues/96 workaraound
var tran2 = await dbContext.Database.BeginTransactionAsync(IsolationLevel.Unspecified);
await tran2.CommitAsync();
await dbContext.Database.CloseConnectionAsync();
Console.WriteLine(string.Join(Environment.NewLine, widgets));
}
In EF Core 2.1, which used split queries by default, there was not such problem.
I'm not sure at this point exactly how things worked in EF Core 2.1, but what you're reporting really is the expected behavior when doing split query; without a higher isolation level (e.g. Snapshot), there are no consistency guarantees, so this sort of error can occur. As this is a concurrency issue and therefore highly sensitive to timing, my guess is that EF Core 2.1 had exactly the same behavior, but the issue simply didn't manifest.
Out of curiosity, can you please post the fully SQLs for the queries generated by both EF Core 2.1 and 8.0, so we can compare them?
Still can't understand why it is expected behaviour, because Menu field is not somehow related to loading children of Widget.
When I read this statement from documentation
While most databases guarantee data consistency for single queries, no such guarantees exist for multiple queries.
If the database is updated concurrently when executing your queries, resulting data may not be consistent. You can mitigate it by wrapping the queries in a serializable or snapshot transaction, although doing so may create performance issues of its own.
I expect that i will get Widget with empty Menu field and with its children. And this how it will work in terms of databases if I execute separately
select * from Widgetupdate Widget set MenuId = 1select * from Children where WidgetId in ...
Maybe I should understand this statement in terms of ef core? I mean that ef core can do some strange things when use splitQueries.
Returning to the queries
We can see, that when 2.1 loading child collections, it does not include Menu id in select output as 8.0 do
NET 8:
SELECT [w].[Id], [w].[MenuId], [m].[Id]
FROM [Widgets] AS [w]
LEFT JOIN [Menus] AS [m] ON [w].[MenuId] = [m].[Id]
ORDER BY [w].[Id], [m].[Id]
SELECT TOP(1) [m].[Id]
FROM [Menus] AS [m]
exec sp_executesql N'SET IMPLICIT_TRANSACTIONS OFF;
SET NOCOUNT ON;
UPDATE [Widgets] SET [MenuId] = @p0
OUTPUT 1
WHERE [Id] = @p1;
',N'@p1 int,@p0 int',@p1=2,@p0=1
SELECT [c].[Id], [c].[Name], [c].[WidgetId], [w].[Id], [m].[Id]
FROM [Widgets] AS [w]
LEFT JOIN [Menus] AS [m] ON [w].[MenuId] = [m].[Id]
INNER JOIN [Child] AS [c] ON [w].[Id] = [c].[WidgetId]
ORDER BY [w].[Id], [m].[Id]
NETCOREAPP2.1:
SELECT [w].[Id], [w].[MenuId], [w.Menu].[Id]
FROM [Widgets] AS [w]
LEFT JOIN [Menus] AS [w.Menu] ON [w].[MenuId] = [w.Menu].[Id]
ORDER BY [w].[Id]
SELECT TOP(1) [m].[Id]
FROM [Menus] AS [m]
BEGIN TRANSACTION
exec sp_executesql N'SET NOCOUNT ON;
UPDATE [Widgets] SET [MenuId] = @p0
WHERE [Id] = @p1;
SELECT @@ROWCOUNT;
',N'@p1 int,@p0 int',@p1=2,@p0=1
COMMIT TRANSACTION
SELECT [w.Children].[Id], [w.Children].[Name], [w.Children].[WidgetId]
FROM [Child] AS [w.Children]
INNER JOIN (
SELECT DISTINCT [w0].[Id]
FROM [Widgets] AS [w0]
LEFT JOIN [Menus] AS [w.Menu0] ON [w0].[MenuId] = [w.Menu0].[Id]
) AS [t] ON [w.Children].[WidgetId] = [t].[Id]
ORDER BY [t].[Id]
@adebelyi thanks for the added detail.
We can see, that when 2.1 loading child collections, it does not include Menu id in select output as 8.0 do
This is something that isn't clear to me (but as I said, I'm unfamiliar with how EF 2.1 split queries worked). I'm probably missing something, but the principal (Menu) key must be projected in the 2nd split query which returns the children, otherwise how is EF to know which child belongs to which principal?
I do agree that a more thorough investigation is needed here at some point, and that there may be a possible bug in EF (at the very least the odd behavior originally signaled above).
This is something that isn't clear to me (but as I said, I'm unfamiliar with how EF 2.1 split queries worked). I'm probably missing something, but the principal (Menu) key must be projected in the 2nd split query which returns the children, otherwise how is EF to know which child belongs to which principal?
I think that you misunderstood the reproduction a bit.
Tables:
- Widget (Id, MenuId)
- Menu (Id)
- Child (Id, Name, WidgetId)
Queries:
- select all Widgets
- update MenuId in single Widget
- select Widget children (which not somehow related to Menu)
To bind child to widget EF should use WidgetId field not MenuId.
In NET8 in child load query EF adds menu id in select but 2.0 not:
SELECT [c].[Id], [c].[Name], [c].[WidgetId], [w].[Id], ---->>[m].[Id] <---
And maybe EF use this menuId somehow wrong, because it loaded it before, when it loaded Widgets table.
I do agree that a more thorough investigation is needed here at some point, and that there may be a possible bug in EF (at the very least the odd behavior originally signaled above).
Okay, thanks.
I just wrote another repro for this after it caused us a production outage.
https://github.com/torutek/EfTest
It inserts a new row between the queries of a split query (this is hooked in the LogTo method in EfTestContext).
Data looks like Person *-* Team -* Vehicle.
Logs look like: EF Queries out Person EF Queries out Team ~ We race and insert a new team with a new vehicle ~ Ef Queries out Vehicle
I did a little bit of stepping through EF to see what is happening. https://github.com/dotnet/efcore/blob/main/src/EFCore.Relational/Query/RelationalShapedQueryCompilingExpressionVisitor.ShaperProcessingExpressionVisitor.ClientMethods.cs#L566
It looks like in here when it reads the Vehicle that is on a Team it doesn't know about it continues iterating through the teams query results looking for the matching team, but never finding a match. This causes it to discard all vehicles from that point onwards in the query.
This feels like something that should be able handled better by EF, this certainly isn't the behavior we expected based on the documentation. This makes SplitQuery very dangerous to use without using snapshots. You risk not receiving a bunch of unrelated data from a query because new data was inserted at the same time.
Is there any indication of when this issue occurs? I would love to be able to detect how often this happens so I can the transaction tradeoff
I have a similar problem where FirstOrDefaultAsync with split query return empty children While ToListAsync return correct result Same query and same include but different result I have been trying to reproduce in a minimal example but it is not happening there only in our project which is too complex to share Is there any suggestion or maybe someone had a similar problem
Has there been any further investigation of this issue? It doesn’t appear that this behavior is typical for any type of inconsistency one would expect to be intrinsic for this type of scenario (data changing in between split queries). Given that split queries generally perform better than single queries, and the workaround is to manually wrap every query in a transaction, which is impractical if one wants to enable globally, it feels like it could be worth understanding a little more clearly @roji
Please correct me if this is an inaccurate representation of the current situation.
@devony split queries are by their very nature sensitive to concurrency issues, and in the general case indeed must be wrapped by a transaction with an a high isolation level (e.g. serializable) in order to function correctly; that's just the nature of performing two separate SQL queries. There's a specific oddness signaled above which indeed we should look into, but otherwise everything here is the expected behavior and there's nothing we can actually do about it in EF - except maybe introduce an option to automatically wrap all split queries in a serializable transaction, or similar (which has its own non-trivial performance implications).
Re the specific oddness tracked by this issue, it's unlikely we'll get around to investigating it any time soon, since there are higher-priority work items at the moment, and in any case the general issue with split query and concurrency will continue to be there.
I've been spending some time on this issue and while split queries do have inherent problems, I don't think this particular issue is due to such a problem.
TLDR; the bug is in the decision of when to stop reading rows from a split query without ordering. There are both code and database query fixes.
@danzel has a great repro project that nicely shows the problem. In the repro, the query
var query = context.People
.Include(p => p.Photos)
.Include(p => p.Teams).ThenInclude(h => h.Vehicles)
.AsSplitQuery();
results in the following sql (on mssql)
first
SELECT [s].[PeoplePersonId], [s].[TeamsId], [s].[Id], [s].[Name], [s].[Updated], [p].[PersonId]
FROM [People] AS [p]
INNER JOIN (
SELECT [p1].[PeoplePersonId], [p1].[TeamsId], [t].[Id], [t].[Name], [t].[Updated]
FROM [PersonTeam] AS [p1]
INNER JOIN [Teams] AS [t] ON [p1].[TeamsId] = [t].[Id]
) AS [s] ON [p].[PersonId] = [s].[PeoplePersonId]
ORDER BY [p].[PersonId], [s].[PeoplePersonId], [s].[TeamsId], [s].[Id]
then
SELECT [p0].[PersonPhotoId], [p0].[PersonId], [p0].[Updated], [p0].[Url], [p].[PersonId]
FROM [People] AS [p]
INNER JOIN [PersonPhotos] AS [p0] ON [p].[PersonId] = [p0].[PersonId]
ORDER BY [p].[PersonId]"
"SELECT [p].[PersonId], [p].[Name], [p].[Updated]
FROM [People] AS [p]
ORDER BY [p].[PersonId]
**TIME MARKER**
finally
SELECT [v].[Id], [v].[Name], [v].[TeamId], [v].[Updated], [p].[PersonId], [s].[PeoplePersonId], [s].[TeamsId], [s].[Id]
FROM [People] AS [p]
INNER JOIN (
SELECT [p1].[PeoplePersonId], [p1].[TeamsId], [t].[Id]
FROM [PersonTeam] AS [p1]
INNER JOIN [Teams] AS [t] ON [p1].[TeamsId] = [t].[Id]
) AS [s] ON [p].[PersonId] = [s].[PeoplePersonId]
INNER JOIN [Vehicles] AS [v] ON [s].[Id] = [v].[TeamId]
ORDER BY [p].[PersonId], [s].[PeoplePersonId], [s].[TeamsId], [s].[Id]
Note that the problem occurs with the Vehicles collection. Note also that there is no ordering on any column of the Vehicles table. The result is that the order they will return is undefined (likely clustered index in mssql, which matters in a second). In the repro there is a Vehicle inserted that has an ID between two existing vehicles (the ID is the PK, so the new rows are between the existing ones in the natural ordering). This happens at the time marker above. The split query's load continues until the overall query is complete and there are 3 vehicles missing from the collection.
Let's dig into the code a bit
On line 562 in the method
RelationalShapedQueryCompilingExpressionVisitor.ShaperProcessingExpressionVisitor.PopulateSplitIncludeCollectionAsync
has
if (splitQueryCollectionContext.Parent is TIncludingEntity entity)
{
while (dataReaderContext.HasNext ?? await dbDataReader.ReadAsync(queryContext.CancellationToken).ConfigureAwait(false))
{
if (!CompareIdentifiers(
identifierValueComparers,
splitQueryCollectionContext.ParentIdentifier, childIdentifier(queryContext, dbDataReader)))
{
dataReaderContext.HasNext = true;
return;
}
This code gets hit when a split query result is being enumerated. The problem is that at the very first row in the split that doesn't match the parent (by way of the identifier) it assumes that the split query result is finished, and that it's time to move to the next split collection (or finish). This is generally fine, as most sets are ordered by their identifier, so it's valid to say that once you hit a non-matching value you're done.
However, there is no ordering on the vehicle table! So when that last query is run it will return in the natural row order, and breaks when it hits the rows added at the time marker. There are more rows to consume, but we've already returned.
This is ultimately why a transaction works. Another workaround is to use timestamps and don't select any new rows (using a QueryFilter). If the vehicle id was included in the outer ordering that would also work.
There is also a code fix for this problem. Change the code above to:
if (splitQueryCollectionContext.Parent is TIncludingEntity entity)
{
while (dataReaderContext.HasNext ?? await dbDataReader.ReadAsync(queryContext.CancellationToken).ConfigureAwait(false))
{
if (!CompareIdentifiers(
identifierValueComparers,
splitQueryCollectionContext.ParentIdentifier, childIdentifier(queryContext, dbDataReader))
&& !dbDataReader.HasRows
)
{
dataReaderContext.HasNext = true;
return;
}
and the problem goes away. Instead of bailing when we hit an ID that doesn't match, we also check if there are more rows. I don't think there are a ton of downsides to this fix, as the overall query should keep us from consuming too many rows that don't have parents.
I'd be happy to put up a PR with the code fix (assuming there aren't any unknown side effects). The ordering fix may also be tractable, but I haven't tried it.
anyone know if https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-10.0/whatsnew#more-consistent-ordering-for-split-queries would fix this issue?
@devony I suspect that a lot of people are indeed ending up here because of the problem that's fixed in EF 10 by #26808; but there seems to be something else above that may not be addressed by it, further investigation still needs to be done.