Stop escaping Unicode characters unnecessarily in relational JSON
The JSON stored in a SQL Server column always has its Unicode characters escaped. For example, this code:
context.Add(new Customer { Json = new MyJson { Be = "Füzér Castle in the Zemplén Mountains" } } );
await context.SaveChangesAsync();
Results in the following JSON in the database:
{"Be":"F\u00FCz\u00E9r Castle in the Zempl\u00E9n Mountains"}
However, there is nothing preventing the JSON being stored as:
{"Be":"Füzér Castle in the Zemplén Mountains"}
EF reads the correct string back in both cases, but other tools may not be expecting escaped JSON for all Unicode characters.
Test code:
using (var context = new SomeDbContext())
{
await context.Database.EnsureDeletedAsync();
await context.Database.EnsureCreatedAsync();
context.Add(new Customer { Json = new MyJson { Be = "Füzér Castle in the Zemplén Mountains" } });
await context.SaveChangesAsync();
}
using (var context = new SomeDbContext())
{
foreach (var c in context.Set<Customer>().ToList())
{
Console.WriteLine(c.Json.Be);
}
}
public class SomeDbContext : DbContext
{
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
=> optionsBuilder
.UseSqlServer(@"Data Source=(LocalDb)\MSSQLLocalDB;Database=AllTogetherNow")
.LogTo(Console.WriteLine, LogLevel.Information)
.EnableSensitiveDataLogging();
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<Customer>().OwnsOne(e => e.Json).ToJson();
}
}
public class Customer
{
public int Id { get; set; }
public MyJson Json { get; set; }
}
public class MyJson
{
public string? Be { get; set; }
}
related: #30315
Here's the relevant docs page on encoding/escaping in System.Text.Json; we may want to specify JavaScriptEncoder.UnsafeRelaxedJsonEscaping.
However, for people using non-Unicode encodings in the database, or wanting to interact with the JSON documents from other tools which may not support special chars, we should probably have some sort of mechanism for specifying the encoding/escaping policy; this basically boils down to allowing users to specify the JsonSerializerOptions (which has this one it).
Maybe it makes sense to keep the current behavior as the default (as that's the S.T.Json default), and just tell people to specify UnsafeRelaxedJsonEscaping via JsonSerializerOptions if they want unescaped chars.
related: https://github.com/dotnet/efcore/issues/33443
We also escape all unicode chars in JSON property names, however when building a JSON path for query or update we don't escape those chars. This leads to us not fetching/updating the requested JSON fragments
Note that we'd need to not escape - and deal with non-escaped JSON data - everywhere in our stack (SaveChanges, JSON path generation, shaper - see #33771.
As in https://github.com/dotnet/efcore/pull/33771#pullrequestreview-2092452772, we should consider doing this for 9.0 as well, since #33771 is a breaking change for people querying externally-inputted JSON data that isn't escaped.
Note for the implementer: (as pointed out by @Charlieface here: https://github.com/dotnet/efcore/issues/33443#issuecomment-2140160687) when we allow un-escaped JSON strings, we need to escape single and double quotes and also pre-pend N to the JSON path string.
Design discussion:
- We'll indeed try to get this in for 9.0, along with #33771 (but not critical).
- We won't make any effort to support both escaped and non-escaped JSON content in the same database. For example, a single context option is probably sufficient as an application-wide setting, as opposed to e.g. a column-by-column setting.
Sorry to chime in here. Currently in EF Core 8, there is no way we can persist Unicode characters in non-escaped form for JSON columns? There is no easy workaround, correct? If this feature makes it to EF9, any plans to backport it to EF8 too? Or we'll be forced to update to .NET 9 and use EF9?