efcore icon indicating copy to clipboard operation
efcore copied to clipboard

Batch split queries

Open roji opened this issue 7 years ago • 7 comments

In some scenarios, EF Core internally generates more than one read in order to execute a single query. This is notably the case when executing (some?) joins when using a non-MARS provider. Performance could be (greatly!) improved by batching those multiple reads into a single batch internally. Unless I'm mistaken, in the case you generate multiple queries, the results for the earlier queries have to be buffered anyway, so there's no additional penalty in the buffering implied in the batching.

Note: this issue isn't about exposing a batching read API to the user (that's what #10879 is about).

roji avatar Feb 05 '18 21:02 roji

Note: see also #10465, which is very similar but comes from a slightly different angle.

ajcvickers avatar Feb 08 '18 01:02 ajcvickers

See also https://github.com/aspnet/EntityFrameworkCore/issues/5952

divega avatar Feb 08 '18 07:02 divega

Unless I'm mistaken, this is no longer relevant as we generate one SQL query per LINQ query since 3.0. Closing.

roji avatar Sep 25 '19 17:09 roji

Reopening and assigning to @smitpatel at his request, could be done as part of the workaround for #18022.

roji avatar Apr 21 '20 10:04 roji

I've checked what command batching does to isolation/consistency in SQL Server and PostgreSQL, and the results are negative: merely batching statements into the same DbCommand doesn't have any impact on isolation. In other words, different database state can be observed by different statements in the same DbCommand, if not within an explicit transaction.

It's still possible to have consistency by wrapping the batch in a serializable transaction (or snapshot in SQL Server) - see test code. We probably shouldn't do this implicitly, since it implies error or deadlocking behavior that doesn't occur otherwise.

Test code
[NonParallelizable]
public class BatchTransactionIsolationTest
{
    const string PostgresConnectionString = ...;
    const string SqlServerConnectionString = ...;

    [Test]
    public async Task SqlServer([Values(true, false)] bool wrapInSerializableTx)
    {
        using var conn1 = new SqlConnection(SqlServerConnectionString);
        using var conn2 = new SqlConnection(SqlServerConnectionString);
        conn1.Open();
        conn2.Open();

        var createSql = @"
IF OBJECT_ID('dbo.data', 'U') IS NOT NULL 
DROP TABLE data; 
CREATE TABLE data (id INT);
INSERT INTO data (id) VALUES (1)";
        
        using (var createCmd = new SqlCommand(createSql, conn1))
            createCmd.ExecuteNonQuery();

        var tx = wrapInSerializableTx
            ? conn1.BeginTransaction(IsolationLevel.Snapshot)
            : null;

        using var batchCommand = new SqlCommand(@"
SELECT * FROM data;
WAITFOR DELAY '00:00:05';
SELECT * FROM data", conn1, tx);
        
        using var modifyCommand = new SqlCommand(@"
UPDATE data SET id=2 WHERE id=1; 
INSERT INTO data (id) VALUES (10)", conn2);

        var t = batchCommand.ExecuteReaderAsync();
        Thread.Sleep(1000);
        modifyCommand.ExecuteNonQuery();

        var reader = await t;
        Console.WriteLine("Before wait:");
        while (reader.Read())
            Console.WriteLine(reader.GetInt32(0));

        Assert.True(reader.NextResult());
        
        Console.WriteLine("After wait:");
        while (reader.Read())
            Console.WriteLine(reader.GetInt32(0));
    }
    
    [Test]
    public async Task Postgres([Values(true, false)]bool wrapInSerializableTx)
    {
        using var conn1 = new NpgsqlConnection(PostgresConnectionString);
        using var conn2 = new NpgsqlConnection(PostgresConnectionString);
        conn1.Open();
        conn2.Open();

        var createSql = @"
DROP TABLE IF EXISTS data;
CREATE TABLE data (id INT);
INSERT INTO data (id) VALUES (1)";
        
        using (var createCmd = new NpgsqlCommand(createSql, conn1))
            createCmd.ExecuteNonQuery();

        if (wrapInSerializableTx)
            conn1.BeginTransaction(IsolationLevel.Serializable);
        
        using var batchCommand = new NpgsqlCommand(@"
SELECT * FROM data;
SELECT pg_sleep(5);
SELECT * FROM data", conn1);
        
        using var modifyCommand = new NpgsqlCommand(@"
UPDATE data SET id=2 WHERE id=1; 
INSERT INTO data (id) VALUES (10)", conn2);

        var t = batchCommand.ExecuteReaderAsync();
        Thread.Sleep(1000);
        modifyCommand.ExecuteNonQuery();

        var reader = await t;
        Console.WriteLine("Before wait:");
        while (reader.Read())
            Console.WriteLine(reader.GetInt32(0));

        Assert.True(reader.NextResult());
        Assert.True(reader.NextResult());  // pg_sleep returns a void resultset
        
        Console.WriteLine("After wait:");
        while (reader.Read())
            Console.WriteLine(reader.GetInt32(0));
    }
}

roji avatar May 01 '20 20:05 roji

@roji Thanks for checking this. It's unfortunate, but I think it re-enforces that we need to retain both collection-include ehaviors for now.

ajcvickers avatar May 02 '20 15:05 ajcvickers

I completely agree. I also think it points towards retaining single-query as the default behavior, but I think we all agree we need some way to opt into split query.

roji avatar May 02 '20 15:05 roji

Note: batching becomes impossible if we decide to go in the direction of https://github.com/dotnet/efcore/issues/12776 (fetching dependents by foreign keys instead of reevaluating the principal query).

roji avatar Mar 04 '24 10:03 roji