efcore icon indicating copy to clipboard operation
efcore copied to clipboard

Consider using OPENJSON for our JSON scalar access

Open roji opened this issue 2 years ago • 3 comments

When using owned JSON entities, we currently use JSON_VALUE to extract scalars out of them. Since JSON_VALUE always returns nvarchar(4000), we apply a cast to the results based on the model type. For example:

_ = await ctx.Blogs.Where(b => b.Details.Foo == 8).ToArrayAsync();

... translates to:

SELECT [b].[Id], [b].[Name], JSON_QUERY([b].[Details],'$')
FROM [Blogs] AS [b]
WHERE CAST(JSON_VALUE([b].[Details],'$.Foo') AS int) = 8

There are issues with this translation:

  • Binary data (varbinary) cannot be extracted in this way: the standard JSON representation for binary data is base64, but a regular relational CAST doesn't do that (see comment). OPENJSON with WITH does decode base64 data, since it applies a JSON-specific conversion. Binary data specifically is covered by #33435.
    • On the other hand, spatial data cannot be converted with OPENJSON with WITH, only with a regular cast. The same may be true of hierarchyid.
    • We need to go through all supported SQL Server types
  • JSON_VALUE returns null for strings larger than 4000 (or can throw in strict mode) (see #29477).
  • There's good reason to believe that this translation is inefficient compared to OPENJSON with WITH, where the query supplies more information to SQL Server.
    • See this and this. Both these comments are about OPENJSON without WITH (and not about JSON_VALUE), and are in the context of primitive collections; but the same principles should hold here as well - needs to be confirmed.

The alternative translation would be to use OPENJSON with WITH instead:

SELECT [b].[Id], [b].[Name], JSON_QUERY([b].[Details],'$')
FROM [Blogs] AS [b]
CROSS APPLY OPENJSON([b].[Details]) WITH ([Foo] int '$.Foo') AS [d]
WHERE [d].[Foo] = 8

In other words, every scalar access adds a CROSS APPLY to an OPENJSON invocation, applying the JSON conversion with WITH.

When the property being extracted is known to be a short string in the model (fitting in nvarchar(4000)), we could keep the current JSON_VALUE translation. When the type being extracted is incompatible with OPENJSON with WITH (e.g. geometry), we could do OPENJSON without WITH and then apply a relational cast (we should avoid JSON_VALUE still it truncates).

Note: consider the indexability of this technique, compared to JSON_VALUE (with computed columns)

roji avatar May 27 '23 09:05 roji

@roji To check sargability.

ajcvickers avatar Jun 07 '23 20:06 ajcvickers

Consider only using openjson for types that are going to be truncated (nvarchar(max) and varbinary(max) or where we need the conversion to/from base64 (binarty and varbinary(x)).

ajcvickers avatar Jun 07 '23 20:06 ajcvickers

An improvement to SQL Server's JSON_VALUE() may be coming, where a RETURNING clause would allow specifying the returned type, with the conversion happening inside JSON_VALUE(). This would be both more efficient and resolve the above issues.

roji avatar Apr 27 '24 13:04 roji