efcore EF should not produce differing newlines in SQL based on the OS it happens to run on

Ask a question

The same query statement code, slow query on linux, but normal on windows, I use SQL Server Profiler to monitor the query statement, found that the statement is exactly the same, the only difference is the newline character is different, and the execution plan is also different.

Linux

Windows

Include provider and version information

EF Core version: 6.0.12 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 6.0 Operating system: win11 IDE: e.g. Visual Studio 2022 17.4

Apr 27 '23 07:04 Varorbc

@Varorbc as always, we need a code sample - two screenshot fragments are not enough to understand what's going on.

I'm also not sure this is related to EF; SQL Server should generally be able to accept either newlines. One thought that comes to mind is that EF should maybe not produce differing newlines in SQL based on the OS it happens to run on.

Apr 27 '23 07:04 roji

@roji I’m sorry, I don’t know what caused this issue, so I can’t provide an example code. The only thing I know now is the difference in newline characters

Apr 27 '23 07:04 Varorbc

@roji I also asked New Bing and it gave me the following answer. Does this have anything to do with it?

According to the information I found, the line breaks in the statement executed by sp_executesql are linux line breaks, which do not affect performance. However, if you use NVARCHAR type parameters, you may encounter implicit conversion and parameter sniffing issues that can reduce performance . You can try adding OPTION (RECOMPILE) at the end of the statement to avoid this problem. Also, you can consider using stored procedures instead of sp_executesql to improve maintainability and security .

Apr 27 '23 08:04 Varorbc

That sounds like it's maybe discussing line breaks inside parameter values, as opposed to the SQL itself. For the SQL itself, I'm guessing there's going to be a problem if you send the same query with different line breaks (triggering plan cache misses at the very least). But beyond that, as always, we need some code/repro...

Apr 27 '23 08:04 roji

Finally, I cleaned up the fragments. The first query was slow, but the rest were normal, and the execution plans were consistent.

Apr 27 '23 09:04 Varorbc

If extecution plans are different for the same query it can often be due to parameter sniffing causing a sub-optimal plan to be chosen.

Apr 27 '23 09:04 ErikEJ

The first query was slow, but the rest were normal, and the execution plans were consistent.

Yeah, that sounds like the above - the same SQL was possibly already cached in SQL Server with another line break, and so you had a cache miss. But then the SQL with the new line break got cached and you got normal perf again.

Apr 27 '23 09:04 roji

If extecution plans are different for the same query it can often be due to parameter sniffing causing a sub-optimal plan to be chosen.

One thought that comes to mind is that EF should maybe not produce differing newlines in SQL based on the OS it happens to run on.

I suspect the root cause is parameter sniffing. Is it possible to keep newlines consistent across operating systems?

Apr 27 '23 11:04 Varorbc

Can you confirm that you're running the same EF application on different client operation systems?

Apr 27 '23 12:04 roji

yeah.

Apr 27 '23 13:04 Varorbc

The query plan hash is a binary hash value calculated on the execution plan for a given query, and used to uniquely identify similar execution plans.

The query hash is a binary hash value calculated on the Transact-SQL text of a query, and is used to uniquely identify queries.

https://learn.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver16#cache-multiple-plans-for-the-same-query

@roji EF should use consistent line breaks across different operating systems

Apr 28 '23 07:04 Varorbc

Note from triage: consider always using Unix line endings, or possibly allowing it to be configured if always using Unix endings has too much of a negative impact when using Windows for viewing queries while debugging or in logs.

May 04 '23 08:05 ajcvickers

Does this apply to IQueryable.ToQueryString?

I'm having some issues with snapshot tests, and it would be nice if the newline sequence could be configured.

-   "executable": "SELECT \"a\".\"Id\", \"a\".\"Name\"\nFROM \"Authors\" AS \"a\""
+   "executable": "SELECT \"a\".\"Id\", \"a\".\"Name\"\r\nFROM \"Authors\" AS \"a\""
                                                      ^^^^

Jun 17 '24 11:06 glen-84

@ajcvickers @roji @ErikEJ

I'd like to take on this change if it's not already being worked by the EF team. I have a small proof-of-concept with some proposed design changes on its own fork here, but I definitely want input from the core team before proceeding any further than this. The changes in this branch do seem to work against the MSSQL provider for queries as well as ExecuteSqlRawAsync, although I haven't tested any further.

A few notes on these changes:

Line endings (at least for relational providers) are ultimately set within the IndentedStringBuilder helper class, which wraps a StringBuilder. Relational providers use this helper class via the RelationalCommandBuilder to output the commands ultimately sent to the data provider.
I added a new constructor to the IndentedStringBuilder that accepts a LineEnding enum. As IndentedStringBuilder may be used by many different providers, it probably makes sense to keep the current behavior of using Environment.NewLine to avoid potential downstream impacts.
I exposed the option to specify line endings to relational providers, adding a new configuration to the RelationalOptionsExtension. I'm not sure this scoping is correct: should all relational providers be expected to deal with this change, or should it be more narrowly scoped to specific providers. Alternatively, does this change make sense for non-relational providers and thus should be scoped higher, as part of the core options?
This POC covers at least the basic case of relational commands, which is probably the most important issue due to plan caches. Should this feature also address other command generation scenarios, such as migrations, logging, ToQueryString(), etc.?

Let me know if I should continue on this and any design considerations that must be addressed. Thank you!

Aug 13 '24 13:08 folkcoder

@folkcoder is there any particular reason why you think that the newline used should be a provider-specific decision, as opposed to something that's just implemented by EF without provider configurability?

I exposed the option to specify line endings to relational providers, adding a new configuration to the RelationalOptionsExtension.

Adding an option to RelationalOptionsExtension means that users can control that option, not providers.

Aug 13 '24 17:08 roji

@folkcoder is there any particular reason why you think that the newline used should be a provider-specific decision, as opposed to something that's just implemented by EF without provider configurability?

I exposed the option to specify line endings to relational providers, adding a new configuration to the RelationalOptionsExtension.

Adding an option to RelationalOptionsExtension means that users can control that option, not providers.

@roji I don't think that the newline used should be a provider-specific decision, but I do wonder whether exposing this option for users to control should be exposed for all providers. For example, let's say there's a provider for some theoretical relational database that automatically normalizes line endings before caching the execution plan, mitigating the need for consistency in line endings, or the provider doesn't cache the execution plans in the first place. Forcing that provider to expose this option doesn't really add a lot of value. I don't have a good enough understanding of the landscape of EF providers to speak intelligently on whether exposing this option makes sense universally across all relational providers.

I do think there is a case to be made to allow users to control this option rather than just setting a consistent default (e.g., Unix line endings everywhere rather than Environment.NewLine). My proposed changes are opt-in and default to the current behavior, avoiding breaking changes that consumers have may have come to rely on, on integration tests if nothing else. That said, as far as breaking changes go this one is fairly minor and having EF just choose an environment-agnostic default would certainly simplify any implementation.

I appreciate your questions on this issue! Thanks for engaging.

Aug 13 '24 17:08 folkcoder

For example, let's say there's a provider for some theoretical relational database that automatically normalizes line endings before caching the execution plan, mitigating the need for consistency in line endings, or the provider doesn't cache the execution plans in the first place. Forcing that provider to expose this option doesn't really add a lot of value.

I'm still not sure why just e.g. applying Unix line endings (or whatever) everywhere isn't good enough, or why a provider would want/need something else - even if they plans aren't cached or line endings are normalized at the database, EF normalizing to e.g. Unix doesn't hurt.

And in any case, even if there's some reason where a provider would want to change the newline behavior, they can always do that themselves by overriding the QuerySqlGenerator applying whatever transformation they want to the SQL produced by EF; I don't see a need to build in special extensibility for this, especially given this is all entirely theoretical at this point.

I do think there is a case to be made to allow users to control this option rather than just setting a consistent default (e.g., Unix line endings everywhere rather than Environment.NewLine).

What's the case for that? I'm not against allowing users to control things where there's a need, but I don't think it's a good idea to expose user knobs without a clear reason, "just in case".

Aug 14 '24 11:08 roji

What's the case for that? I'm not against allowing users to control things where there's a need, but I don't think it's a good idea to expose user knobs without a clear reason, "just in case".

The only use case was to avoid introduce a breaking change against current behavior, but if that's not a concern then I completely agree that it would reduce a lot of complexity for EF to just pick a consistent line ending type.

Aug 15 '24 19:08 folkcoder

The only use case was to avoid introduce a breaking change against current behavior

How would this be a breaking change?

Aug 17 '24 16:08 roji

How would this be a breaking change?

If you forced LF on all platforms, then systems relying on the presence of CRLF would fail (f.e. snapshot tests).

I disagree with using the same sequence on all platforms. If I'm working on a Windows project, then I don't want Unix LFs in my files.

If I'm working on a project that uses LFs (f.e. as set in an EditorConfig file), then I want to be able to tell EF to use LFs instead of CRLFs on my Windows system.

Aug 18 '24 11:08 glen-84

How would this be a breaking change?

If you forced LF on all platforms, then systems relying on the presence of CRLF would fail (f.e. snapshot tests).

That doesn't constitute a breaking change, as EF makes no guarantees on the stability of the SQL it generates. We make regular SQL changes in every release, producing more efficient even though that potentially breaks people asserting on those SQLs. I generally don't think that asserting on SQL in user projects is a good idea (similar to how it doesn't make much sense to assert on the machine code that the compiler produces), but If someone chooses to do so, it's their responsibility to update their tests when they upgrade an EF version. I certainly don't think we should refrain from improving our SQL just because it might break someone's snapshot tests.

If I'm working on a Windows project, then I don't want Unix LFs in my files.

That's a valid argument, though you always have the option of normalizing the SQL you get from EF before performing the assertion; that would allow you to keep using CRLF in your tests even if EF produces LF. That needs to be weighed against the plan cache misses that the platform-varying newlines currently create.

Aug 19 '24 12:08 roji

That's a valid argument, though you always have the option of normalizing the SQL you get from EF before performing the assertion

I was also thinking about generated migration files, although that's not part of this issue. Perhaps that would be handled differently.

Aug 19 '24 18:08 glen-84

For generated migration files, I can see even less how that could be a breaking change for someone?

Aug 19 '24 21:08 roji

I wasn't suggesting that it was a breaking change. The comment was regarding the use of the same EOL sequence on all platforms (I wouldn't want my migration files to use LF on Windows, unless I configured it that way).

Aug 20 '24 07:08 glen-84

@roji I think the key point here is not that it is a breaking change, but that it will be unwanted by a significant number people on Windows either legitimately (e.g. their editor doesn't support it) or not. I'm falling off the fence on the side of doing it, but I know we will piss off a whole load of Windows users, so I still have one hand holding onto that fence...

Aug 20 '24 08:08 ajcvickers

@glen-84 @ajcvickers understood; but I think in the context of this issue we should consider specifically SQL queries, and not discuss other places (migration files). SQL queries have the very specific problem that they are cached at the server, so the newline variations require replanning etc; they are also not (generally) stored as a file (unlike e.g. scaffolded migrations) or need to be viewed/edited by the user (the case of snapshot tests where the user stores them seems like a corner case exception to this, which can be addressed and shouldn't really affect our decision here.

One note: the plan caching problem hereis possibly SQL Server-specific; most other databases don't have an automatic SQL plan cache at the server. In the PG case, for example, since the caching happens at the client, varying newlines aren't a problem.

Another point is that if we do this, we may want to still log the SQL with the machine's newline (so the SQL we log and the SQL we actually send to the DB would differ slightly).

Aug 20 '24 15:08 roji

It's quite surprising that SQL Server doesn't normalize or ignore whitespace differences when generating a hash. I'm assuming that this has been tested/confirmed? If that's the case, it almost sounds like a bug.

To work around that, I agree with sending a fixed EOL sequence (or maybe even just replacing them with a space?) to SQL Server specifically.

Aug 20 '24 19:08 glen-84

It's quite surprising that SQL Server doesn't normalize or ignore whitespace differences when generating a hash. I'm assuming that this has been tested/confirmed? If that's the case, it almost sounds like a bug.

It makes sense to me - I would assume that the idea is to has SQL as directly as possible and avoid any preprocessing (such as newline normalization), since the point is precisely to use the cache to be as fast possible. Normalizing whitespace would add extra processing for all incoming SQL queries, just so that SQL pairs that differ only in newlines would share a cache entry. I definitely wouldn't expect SQL Server to make that particular tradeoff.

Aug 20 '24 23:08 roji

It has to generate a hash anyway, I wouldn't expect skipping or replacing one or two bytes while processing to have much of a performance impact, but I could be wrong.

It looks like the (now deprecated) MySQL query cache may also have been whitespace-sensitive.

For that reason, perhaps it makes sense to do the same thing for all providers, and perform normalization just before sending the SQL to the database server.

Aug 21 '24 19:08 glen-84

efcore efcore copied to clipboard

EF should not produce differing newlines in SQL based on the OS it happens to run on

Ask a question

Linux

Windows

Include provider and version information

efcore
efcore copied to clipboard