Sql Server - LINQ Contains returns false positive results for string values larger than column size
We've upgraded to EF Core 8.0.0 (from 7.0.8) and have started experiencing an issue with Contains in LINQ queries. This seems to be related to the breaking change described here. Currently it looks like the query ends up casting string values within OPENJSON to nvarchar(COLUMN-SIZE) which can result in false positive results for values larger than the column size.
For example, suppose we have a table the following record where Code is an nvarchar(2) column:
| Id (int) | Code (nvarchar(2)) |
|---|---|
| 1 | IN |
class SampleItem
{
int Id { get; set; }
string Code { get; set; }
}
var valuesToCheck = new [] { "INVALID" };
var items = dbContext.Set<SampleItem>()
.Where(i => valuesToCheck.Contains(i.Code))
.ToList();
The query generates the following SQL:
exec sp_executesql N'SELECT [s].[Id], [s].[Code]
FROM [SampleItems] AS [s]
WHERE [s].[Code] IN (
SELECT [c].[value]
FROM OPENJSON(@__codes_0) WITH ([value] nvarchar(2) ''$'') AS [c]
)',N'@__codes_0 nvarchar(4000)',@__codes_0=N'["INVALID"]'
Since [value] is truncated to 2 characters, INVALID is becomes IN which is in the table, resulting in a false positive result.
EF Core version: 8.0.0 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 8.0 Operating system: Windows 11 Pro 22H2 IDE: Visual Studio 2022 17.8.3
Confirmed the behavior change: the type mapping of the Contains item (nvarchar(2)) is inferred and applied to elements coming out of the parameter collection.
The same thing occurs for a simple comparison (Where(i => i.Code == singleCodeParam)); this is important in order to make sure the parameter's store type is set correctly; for example, for a DateTime compared to a SQL Server datetime column, we must infer to send the parameter as datetime and not the default datetime2. However, the SQL Server's provider parameter logic does not set the length on parameters, but rather sets a length based on the actual value only. On the other hand, with OPENJSON the type is set based on the inference (nvarchar(2)), truncating the parameter string values. A parameter sending logic taking into account inferred length (e.g. in other database providers) would show the same behavior for simple comparison.
We should probably not be applying the inferred length on the OPENJSON column definition, but that raises the question of which length we should set... Unlike with a regular parameter's SqlDbType, the determination of the column store type is made during translation, where we cannot inspect actual strings in the parameterized list.
A similar issue may also exist with precision/scale.
The only valid type for strings if we have no additional data would be "nvarchar(max)". It's worse for precession and scale. There is no combination of precision and scale that can cover all .NET decimal values (we looked into this extensively). This is why we warn if you don't tell us--we need to know!
Some investigation:
The underlying problem is not dissimilar to the situation with any parameter that doesn't "fit" in the column it is being used against. For example, consider the following, with again a max length of 2 on the column:
var code = "12";
db.SampleItems.Where(x => code == x.Code).ToListAsync();
We generate this:
info: 1/8/2024 18:37:32.540 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
Executed DbCommand (4ms) [Parameters=[@__code_0='12' (Size = 2)], CommandType='Text', CommandTimeout='30']
SELECT [s].[Id], [s].[Code], [s].[Value]
FROM [SampleItems] AS [s]
WHERE @__code_0 = [s].[Code]
However, if I write this instead:
var code = "INVALID";
db.SampleItems.Where(x => code == x.Code).ToListAsync();
Then I get:
info: 1/8/2024 18:38:45.672 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
Executed DbCommand (4ms) [Parameters=[@__code_0='INVALID' (Size = 4000)], CommandType='Text', CommandTimeout='30']
SELECT [s].[Id], [s].[Code], [s].[Value]
FROM [SampleItems] AS [s]
WHERE @__code_0 = [s].[Code]
This works only because we manipulated the parameter size when we found the given value doesn't fit.
The same things happens for decimals. Consider a column decimal(10, 0). When the parameter is 11, then the parameter is [Parameters=[@__value_0='11' (Precision = 10)], but if the parameter value is 10.2, then the parameter generated is [Parameters=[@__value_0='10.2' (Precision = 10) (Scale = 1)].
@roji Using nvarchar(max) is significantly slower. (Number 100, 1000, etc. is the number of items in the parameter array. Always a million rows in the database.)
| Method | Mean | Error | StdDev |
|---|---|---|---|
| MaxQuery100 | 493.0 us | 4.82 us | 4.51 us |
| MaxQuery1000 | 3,477.7 us | 38.43 us | 35.94 us |
| MaxQuery10000 | 33,975.4 us | 676.54 us | 1,073.06 us |
| MaxQuery100000 | 498,473.9 us | 9,672.50 us | 16,424.64 us |
| MaxQuery1000000 | 4,194,399.0 us | 27,227.54 us | 25,468.66 us |
| NonMaxQuery100 | 392.4 us | 4.08 us | 3.82 us |
| NonMaxQuery1000 | 2,478.6 us | 26.17 us | 24.48 us |
| NonMaxQuery10000 | 22,539.4 us | 447.72 us | 497.64 us |
| NonMaxQuery100000 | 333,372.9 us | 6,267.83 us | 5,862.93 us |
| NonMaxQuery1000000 | 3,121,266.2 us | 12,923.79 us | 11,456.61 us |
BenchmarkRunner.Run<Tests>();
public class Tests
{
private static readonly Dictionary<int, string> ParameterValues = new()
{
{ 100, CreateParameterValue(100) },
{ 1000, CreateParameterValue(1000) },
{ 10000, CreateParameterValue(10000) },
{ 100000, CreateParameterValue(100000) },
{ 1000000, CreateParameterValue(1000000) },
};
[Benchmark]
public void MaxQuery100()
=> Query("nvarchar(max)", ParameterValues[100]);
[Benchmark]
public void MaxQuery1000()
=> Query("nvarchar(max)", ParameterValues[1000]);
[Benchmark]
public void MaxQuery10000()
=> Query("nvarchar(max)", ParameterValues[10000]);
[Benchmark]
public void MaxQuery100000()
=> Query("nvarchar(max)", ParameterValues[100000]);
[Benchmark]
public void MaxQuery1000000()
=> Query("nvarchar(max)", ParameterValues[1000000]);
[Benchmark]
public void NonMaxQuery100()
=> Query("nvarchar(10)", ParameterValues[100]);
[Benchmark]
public void NonMaxQuery1000()
=> Query("nvarchar(10)", ParameterValues[1000]);
[Benchmark]
public void NonMaxQuery10000()
=> Query("nvarchar(10)", ParameterValues[10000]);
[Benchmark]
public void NonMaxQuery100000()
=> Query("nvarchar(10)", ParameterValues[100000]);
[Benchmark]
public void NonMaxQuery1000000()
=> Query("nvarchar(10)", ParameterValues[1000000]);
public void Query(string type, string parameterValue)
{
using var connection = new SqlConnection(@"Data Source=localhost;Database=MaxQuery;Integrated Security=True;TrustServerCertificate=true");
var command = CreateCommand(connection, type, parameterValue);
connection.Open();
using var reader = command.ExecuteReader();
while (reader.Read())
{
}
connection.Close();
}
private static SqlCommand CreateCommand(SqlConnection connection, string type, string parameterValue)
{
var command = connection.CreateCommand();
command.CommandText =
$"""
SELECT [s].[Id], [s].[Name]
FROM [BigData] AS [s]
WHERE [s].[Name] IN (
SELECT [c].[value]
FROM OPENJSON(@__codes_0) WITH ([value] {type} '$') AS [c])
""";
var parameter = command.Parameters.Add("@__codes_0", SqlDbType.NText, parameterValue.Length > 4000 ? -1 : 4000);
parameter.Value = parameterValue;
return command;
}
private static string CreateParameterValue(int count)
{
var builder = new StringBuilder();
builder.Append('[');
var i = 1;
for (; i < count; i++)
{
builder.Append(i.ToString(CultureInfo.InvariantCulture)).Append(',');
}
builder.Append(i);
builder.Append(']');
return builder.ToString();
}
}
I had an example where we believe this bug was causing data to be missing from queries. It was mentioned and partially documented in #32852. The daily as of (and perhaps before) January 17, 2024 resolves the issue.
I'm only commenting because it seems the running belief is that this issue only causes extra data but I have two examples where it is causing missing data.
Now that I know it's char(1) specific, I'll try to update my repro attempt to share with y'all just for further test cases and regression tests. Won't be today, though.