efcore icon indicating copy to clipboard operation
efcore copied to clipboard

EF8 - Regression on Contains query when column collation is not default DB collation

Open clement911 opened this issue 2 years ago • 39 comments

There seems to be a regression bug related to how EF8 now translates Contains with OPENJSON instead of using IN in previous versions. This applies to SQL server and SQL database and it happens when the string column uses a collation that is not the default DB collation.

I could reproduce this issue on Azure SQL Database.

I could also reproduce on a local SQL server instance, but only if the DB containment type is set to Partial.

image

If I set the containment type to 'None', then the query runs fine.

using EFRepro;
using Microsoft.EntityFrameworkCore;

var ids = new[] { "a", "b", "c" };

try
{
    using (var db = new BloggingContext())
    {
        db.Blogs.Add(new Blog { Id = Guid.NewGuid().ToString() });
        db.SaveChanges();
        var query = db.Blogs.Where(b => ids.Contains(b.Id));
        string sql = query.ToQueryString();
        var blogs = query.ToList();
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.ToString());
}


namespace EFRepro
{
    public class BloggingContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseSqlServer(@"Server=(local)\sql2022;Database=Blogs;Trusted_Connection=True;TrustServerCertificate=true");
        }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.Entity<Blog>()
                .Property(i => i.Id)
                .UseCollation("Latin1_General_100_BIN2_UTF8");
        }
    }

    public class Blog
    {
        public string Id { get; set; }
    }
}

Microsoft.Data.SqlClient.SqlException (0x80131904): Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_100_BIN2_UTF8" in the equal to operation.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at Microsoft.Data.SqlClient.SqlDataReader.get_MetaData()
   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReader(RelationalCommandParameterObject parameterObject)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.InitializeReader(Enumerator enumerator)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.<>c.<MoveNext>b__21_0(DbContext _, Enumerator enumerator)
   at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Program.<Main>$(String[] args) in C:\Dev\EFRepro\EFRepro\Program.cs:line 14
ClientConnectionId:5b0ee62c-b3a5-4aa9-b1fd-3f3a904e5689
Error Number:468,State:9,Class:16

Generated sql

DECLARE @__ids_0 nvarchar(4000) = N'["a","b","c"]';

SELECT [b].[Id]
FROM [Blogs] AS [b]
WHERE [b].[Id] IN (
    SELECT [i].[value]
    FROM OPENJSON(@__ids_0) WITH ([value] nvarchar(450) '$') AS [i]
)
Msg 468, Level 16, State 9, Line 3
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_100_BIN2_UTF8" in the equal to operation.

In order to fix it, EF would have add a COLLATE clause with the name of the collation that is configured in the model. Either on the Id or Value column.

SELECT [b].[Id]
FROM [Blogs] AS [b]
WHERE [b].[Id] COLLATE Latin1_General_100_BIN2_UTF8 IN (
    SELECT [i].[value]
    FROM OPENJSON(@__ids_0) WITH ([value] nvarchar(450) '$') AS [i]
)


SELECT [b].[Id]
FROM [Blogs] AS [b]
WHERE [b].[Id] IN (
    SELECT [i].[value] COLLATE Latin1_General_100_BIN2_UTF8
    FROM OPENJSON(@__ids_0) WITH ([value] nvarchar(450) '$') AS [i]
)

EF Core version: 8.0.0-rc.2.23480.1 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 8.0 Operating system: Windows 11 IDE: VS 2022 17.8 Preview 4

clement911 avatar Oct 24 '23 02:10 clement911

Thanks @clement911, I'll look into this.

roji avatar Oct 24 '23 09:10 roji

Will the be fixed for ef8 rtm?

clement911 avatar Nov 08 '23 12:11 clement911

@clement911 RTM bits are baked long time ago. Fixes will come in Feb 2024 or later

ErikEJ avatar Nov 08 '23 12:11 ErikEJ

@ErikEJ I'm not sure whether the 8.0.1 will come in Feb or later at this point, there may be an earlier release.

roji avatar Nov 08 '23 14:11 roji

Hmm. That is a big regression to let slip to ef8. That means we won't be able to upgrade to Net 8 for ages. Aren't you concerned people will upgrade only to encounter this regression bug?

clement911 avatar Nov 08 '23 18:11 clement911

@clement911 The team has no control over this. We can only release when management says we can.

ajcvickers avatar Nov 08 '23 18:11 ajcvickers

I understand but you control what's in the release, no? I'm just surprised that such a regression is allowed to be included.

clement911 avatar Nov 08 '23 18:11 clement911

@clement911 We have to hand off the bits for the release around a month before the release goes out. Again, we have no control over this.

ajcvickers avatar Nov 08 '23 18:11 ajcvickers

Btw, this is why we direct people to use the daily builds. It's the only way to get the latest bug fixes.

ajcvickers avatar Nov 08 '23 18:11 ajcvickers

OK. Hopefully a fix can be released not too far away.

It might be a good idea to add more unit tests that use various char collations for sql server.

clement911 avatar Nov 08 '23 18:11 clement911

@clement911 Thanks for the tip. We are always adding to our tests, but the scope is large. We do run over 120,000 tests, the majority of which are functional tests that use the database, on every build.

ajcvickers avatar Nov 08 '23 18:11 ajcvickers

Would a workaround here be to set the sql compatibility level lower so that it reverts to the old Contains translation? https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-8.0/whatsnew#queries-with-primitive-collections

stevendarby avatar Nov 08 '23 23:11 stevendarby

Thanks @stevendarby , that seems like a good idea!

At least we might be able to upgrade to .NET 8 without having to wait for the proper fix.

Is the full impact/behavior of UseCompatibilityLevel documented anywhere?

clement911 avatar Nov 09 '23 01:11 clement911

A targeted workaround here would be to tell EF to add a COLLATE clause to the column passed to IN, using EF.Functions.Collate():

 var query = db.Blogs
    .Where(b => ids.Contains(EF.Functions.Collate(b.Id, "Latin1_General_100_BIN2_UTF8")))
    .ToList();

This results in the following SQL, which disambiguates the collation:

SELECT [b].[Id]
FROM [Blogs] AS [b]
WHERE [b].[Id] COLLATE Latin1_General_100_BIN2_UTF8 IN (
    SELECT [i].[value]
    FROM OPENJSON(@__ids_0) WITH ([value] nvarchar(450) '$') AS [i]
)

roji avatar Nov 11 '23 17:11 roji

The proper fix here would be for us to infer the collation from [b].[Id] and apply that to @__ids_0 inside the OPENJSON call; this would follow what we already do in order to infer the type mapping itself - this is how we know to put nvarchar(450) inside the WITH clause above. However, the inference logic is currently coded to only discover and track the type mapping, and not the collation; making this more general - capable of inferring and applying the collation as well - is non-trivial and probably wouldn't qualify as a patch for 8.0.

Note to self: unlike with the type mapping, we'd want to infer a collation only when one has been explicitly set on the column.

roji avatar Nov 11 '23 17:11 roji

Removing servicing-consider as we won't be patching a fix for now - a fix is likely to be large/risky, and there's a pretty good workaround with EF.Functions.Collate. We can revisit this if more user feedback is provided that this is a problem.

roji avatar Nov 13 '23 14:11 roji

Thanks for the workaround with EF.Functions.Collate @roji This definitely makes this bug less severe for us.

clement911 avatar Nov 13 '23 20:11 clement911

Great, that's good to hear. I'll work on properly fixing this for 9.0, and if we see lots of users run into it and the fix is low-risk, we can consider patching 8.0 at that point.

roji avatar Nov 13 '23 20:11 roji

After updating EF Core, we got the same problem: [2023-11-16 15:46:36] [S0001][102] Line 7: Incorrect syntax near '$'. Code:

    public async Task<IReadOnlyCollection<TypeWork>> GetAsync(int[] ids, CancellationToken cancellationToken)
    {
        await using var context = await _contextFactory.CreateDbContextAsync(cancellationToken);
        return await context.TypeWorks
            .Where(twe => ids.Contains(twe.Id))
            .Project()
            .ToArrayAsync(cancellationToken);
    }
    file static class TypeWorkServiceExtensions
{
    public static IQueryable<TypeWork> Project(this IQueryable<TypeWorkEntity> source)
    {
        return source
            .Select(twe => new TypeWork
            {
                Id = twe.Id,
                Name = twe.Name,
                SmalList = twe.SmalList,
                OrderNum = twe.OrderNum,
                InProject = twe.InProject,
                ForNonbasicDepartment = twe.ForNonbasicDepartment
            });
    }
}

Sql-script generate:

DECLARE @__ids_0 nvarchar(4000) = N'[3,7,14]';

SELECT [t].[ID_TypeWork] AS [Id], [t].[Name], [t].[SmalList], [t].[OrderNum], [t].[InProject], [t].[ForNonbasicDepartment]
FROM [Time].[TypeWork] AS [t]
WHERE [t].[ID_TypeWork] IN (
    SELECT [i].[value]
    FROM OPENJSON(@__ids_0) WITH ([value] int '$') AS [i]
)

We are waiting for a patch for this :)

masterworgen avatar Nov 16 '23 11:11 masterworgen

@masterworgen Did you try the workaround? https://github.com/dotnet/efcore/issues/32147#issuecomment-1806868857

ErikEJ avatar Nov 16 '23 11:11 ErikEJ

@ErikEJ I just lowered the compatibility

            options.UseSqlServer(configuration.GetConnectionString("MyDBConnect"),
                sqlServerOptionsBuilder => sqlServerOptionsBuilder.UseCompatibilityLevel(120)));

I tried this solution now, but still have the same error

Sql-script generate:

DECLARE @__ids_0 nvarchar(4000) = N'[3,7,14]';

SELECT [t].[ID_TypeWork] AS [Id], [t].[Name], [t].[SmalList], [t].[OrderNum], [t].[InProject], [t].[ForNonbasicDepartment]
FROM [Time].[TypeWork] AS [t]
WHERE [t].[ID_TypeWork] COLLATE Latin1_General_100_BIN2_UTF8 IN (
    SELECT [i].[value]
    FROM OPENJSON(@__ids_0) WITH ([value] int '$') AS [i]
)

masterworgen avatar Nov 16 '23 11:11 masterworgen

@masterworgen what compatibility level is your database??

ErikEJ avatar Nov 16 '23 11:11 ErikEJ

@masterworgen if openjson is used you are not setting a lower compat level properly

ErikEJ avatar Nov 16 '23 11:11 ErikEJ

@masterworgen what compatibility level is your database??

100

@masterworgen if openjson is used you are not setting a lower compat level properly

Yes, indeed, we use an old version of MSSQL, which is why there is no Openjson support

masterworgen avatar Nov 16 '23 11:11 masterworgen

@masterworgen Then you must use:

options.UseSqlServer(configuration.GetConnectionString("MyDBConnect"),
                sqlServerOptionsBuilder => sqlServerOptionsBuilder.UseCompatibilityLevel(120)));

ErikEJ avatar Nov 16 '23 11:11 ErikEJ

@masterworgen your problem isn't related to this issue, which is about a problem with non-default collations - please try to avoid hijacking existing issues, and open new ones instaed.

As @ErikEJ wrote, if you're using an old MSSQL, you must tell EF that via UseCompatibilityLevel. All this is explained in the breaking change note.

roji avatar Nov 16 '23 12:11 roji

We are also affected.

We have a non-default collation on a particular string column (called "ExternalId", it is important) that needs to be case-sensitive, as opposed to the default case-insensitive collation.

The work-around that @roji mentioned worked for us (thank you!), but we are looking forward to the fix.

Is it known why the problem only shows up in Azure SQL Database? There are no problems on our on-premise test Sql Servers.

dettner avatar Nov 23 '23 08:11 dettner

I'm not sure why it happens only on contained databases. My hunch is that it has something to do with how the collation is inherited from the server level collation, but I'm not really clear on that.

I found that there is a big problem with the work around suggested by @roji . Applying the EF.Functions.Collate function will result in the query not being able to leverage any index on the column. That is, it will prevent SARGability and result in an index scan instead of an index seek. So there is a massive performance downside.

For that reason, maybe using UseCompatibilityLevel is more appropriate. However, is there any documentation explaining exactly what the impact of using it is?

clement911 avatar Nov 23 '23 21:11 clement911

Applying the EF.Functions.Collate function will result in the query not being able to leverage any index on the column. That is, it will prevent SARGability and result in an index scan instead of an index seek. So there is a massive performance downside.

Yeah. This goes against the goal of the change that caused this issue actually and makes the fix a lot less suitable.

Meligy avatar Nov 24 '23 01:11 Meligy

Considering the potential impact this could have on those upgrading to EF Core 8, I would like to kindly suggest prioritizing a fix to ensure a smooth transition for users. Additionally, highlighting this in the Release Notes or Known Issues would greatly benefit the community by providing visibility and aiding in mitigation planning. Thank you for your dedication to enhancing EF Core.

dettner avatar Nov 24 '23 08:11 dettner