azure-sdk-for-net icon indicating copy to clipboard operation
azure-sdk-for-net copied to clipboard

[BUG] Async Deserialization of SearchDocument Fails

Open jeburge-msft opened this issue 1 year ago • 5 comments
trafficstars

Library name and version

Azure.Search.Documents 11.5.1

Describe the bug

I am experiencing an issue causing SearchClient.SearchAsync<T>() to fail when T is SearchDocument.

I am consistently receiving the following exception call stack when making certain requests to the SearchClient:

System.Text.Json.JsonException: The JSON value could not be converted to Azure.Search.Documents.Models.SearchDocument. Path: $ | LineNumber: 0 | BytePositionInLine: 20.
 ---> System.InvalidOperationException: Cannot skip tokens on partial JSON. Either get the whole payload and create a Utf8JsonReader instance where isFinalBlock is true or call TrySkip.
   at System.Text.Json.Utf8JsonReader.Skip()
   at Azure.Search.Documents.JsonSerialization.ReadSearchDocument(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, Nullable`1 recursionDepth)
   at Azure.Search.Documents.Models.SearchDocumentConverter.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   --- End of inner exception stack trace ---
   at System.Text.Json.ThrowHelper.ReThrowWithPath(ReadStack& state, Utf8JsonReader& reader, Exception ex)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadCore[TValue](JsonConverter jsonConverter, Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadCore[TValue](JsonReaderState& readerState, Boolean isFinalBlock, ReadOnlySpan`1 buffer, JsonSerializerOptions options, ReadStack& state, JsonConverter converterBase)
   at System.Text.Json.JsonSerializer.ContinueDeserialize[TValue](ReadBufferState& bufferState, JsonReaderState& jsonReaderState, ReadStack& readStack, JsonConverter converter, JsonSerializerOptions options)
   at System.Text.Json.JsonSerializer.ReadAllAsync[TValue](Stream utf8Json, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken)
   at Azure.Search.Documents.Models.SearchResult`1.DeserializeAsync(JsonElement element, ObjectSerializer serializer, JsonSerializerOptions options, Boolean async, CancellationToken cancellationToken)
   at Azure.Search.Documents.Models.SearchResults`1.DeserializeAsync(Stream json, ObjectSerializer serializer, Boolean async, CancellationToken cancellationToken)
   at Azure.Search.Documents.SearchClient.SearchInternal[T](SearchOptions options, String operationName, Boolean async, CancellationToken cancellationToken)
   at Azure.Search.Documents.SearchClient.SearchInternal[T](String searchText, SearchOptions options, Boolean async, CancellationToken cancellationToken)
   at Azure.Search.Documents.SearchClient.SearchAsync[T](String searchText, SearchOptions options, CancellationToken cancellationToken)

I have determined the issue is in the custom JsonConverter SearchDocumentConverter that is added to the default JsonSerializerOptions in Azure.Search.Documents.JsonSerialization.

Specifically in ReadSearchDocument, there is the following code attempting to "skip" OData properties present in the Json being read (https://github.com/Azure/azure-sdk-for-net/blob/15094d080596ad61b8ce95f0f5235d9aacfd1a77/sdk/search/Azure.Search.Documents/src/Serialization/JsonSerialization.cs#L224):

// Ignore OData properties - we don't expose those on custom
// user schemas
if (!propertyName.StartsWith(Constants.ODataKeyPrefix, StringComparison.OrdinalIgnoreCase))
{
 object value = ReadSearchDocObject(ref reader, recursionDepth.Value - 1);
 doc[propertyName] = value;
}
else
{
 reader.Skip();
}

I believe that reader.Skip() is not the correct behavior to use in an async context, and instead we should use the following logic:

if (!reader.TrySkip())
    throw new SomeJsonException("TrySkip failed on OData property.");

Although these implementations seem somewhat equivalent, in reality the behavior is quite different as discussed in: https://stackoverflow.com/questions/63038334/how-do-i-handle-partial-json-in-a-jsonconverter-while-using-deserializeasync-on.

Looking into the System.Text.Json.Utf8JsonReader code we see the following (https://github.com/dotnet/runtime/blob/486142a4b87ed6e3134bd1a8726156fb3f55157a/src/libraries/System.Text.Json/src/System/Text/Json/Reader/Utf8JsonReader.cs):

        //
        // Summary:
        //     Skips the children of the current JSON token.
        //
        // Exceptions:
        //   T:System.InvalidOperationException:
        //     The reader was given partial data with more data to follow (that is, System.Text.Json.Utf8JsonReader.IsFinalBlock
        //     is false).
        //
        //   T:System.Text.Json.JsonException:
        //     An invalid JSON token was encountered while skipping, according to the JSON RFC.
        //     -or- The current depth exceeds the recursive limit set by the maximum depth.
        public void Skip()
        {
            if (!_isFinalBlock)
            {
                throw ThrowHelper.GetInvalidOperationException_CannotSkipOnPartial();
            }

            SkipHelper();
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private void SkipHelper()
        {
            if (TokenType == JsonTokenType.PropertyName)
            {
                bool flag = Read();
            }

            if (TokenType == JsonTokenType.StartObject || TokenType == JsonTokenType.StartArray)
            {
                int currentDepth = CurrentDepth;
                do
                {
                    bool flag2 = Read();
                }
                while (currentDepth < CurrentDepth);
            }
        }
        //
        // Summary:
        //     Tries to skip the children of the current JSON token.
        //
        // Returns:
        //     true if there was enough data for the children to be skipped successfully; otherwise,
        //     false.
        //
        // Exceptions:
        //   T:System.Text.Json.JsonException:
        //     An invalid JSON token was encountered while skipping, according to the JSON RFC.
        //     -or - The current depth exceeds the recursive limit set by the maximum depth.
        public bool TrySkip()
        {
            if (_isFinalBlock)
            {
                SkipHelper();
                return true;
            }

            return TrySkipHelper();
        }

        private bool TrySkipHelper()
        {
            Utf8JsonReader utf8JsonReader = this;
            if (TokenType != JsonTokenType.PropertyName || Read())
            {
                if (TokenType != JsonTokenType.StartObject && TokenType != JsonTokenType.StartArray)
                {
                    goto IL_0042;
                }

                int currentDepth = CurrentDepth;
                while (Read())
                {
                    if (currentDepth < CurrentDepth)
                    {
                        continue;
                    }

                    goto IL_0042;
                }
            }

            this = utf8JsonReader;
            return false;
IL_0042:
            return true;
        }

Interestingly, the Skip() method does not even attempt the "skip" action if we are not at the final block. In an async context, it is likely that we have not finalized the block at the time we are reading and skipping some OData property in the returned SearchDocument JSON. There should be plenty of times that the OData property is "skippable" even when the block is not finalized. There could be some discussion on whether the bug truly lies in the Azure.Search.Documents code or the System.Text.Json code, but the interaction between the two at the moment certainly seems incorrect.

Expected behavior

Correctly deserialize and return the SearchDocuments when making a call to searchClient.SearchAsync<SearchDocument>(options, cancellationToken) (or in fact any async code that invokes ReadSearchDocument on the returned HTTP response from the search service that contains OData properties like @search.score).

Actual behavior

Receive the following exception:

System.Text.Json.JsonException: The JSON value could not be converted to Azure.Search.Documents.Models.SearchDocument. Path: $ | LineNumber: 0 | BytePositionInLine: 20.
 ---> System.InvalidOperationException: Cannot skip tokens on partial JSON. Either get the whole payload and create a Utf8JsonReader instance where isFinalBlock is true or call TrySkip.
   at System.Text.Json.Utf8JsonReader.Skip()
   at Azure.Search.Documents.JsonSerialization.ReadSearchDocument(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, Nullable`1 recursionDepth)
   at Azure.Search.Documents.Models.SearchDocumentConverter.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   --- End of inner exception stack trace ---
   at System.Text.Json.ThrowHelper.ReThrowWithPath(ReadStack& state, Utf8JsonReader& reader, Exception ex)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadCore[TValue](JsonConverter jsonConverter, Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadCore[TValue](JsonReaderState& readerState, Boolean isFinalBlock, ReadOnlySpan`1 buffer, JsonSerializerOptions options, ReadStack& state, JsonConverter converterBase)
   at System.Text.Json.JsonSerializer.ContinueDeserialize[TValue](ReadBufferState& bufferState, JsonReaderState& jsonReaderState, ReadStack& readStack, JsonConverter converter, JsonSerializerOptions options)
   at System.Text.Json.JsonSerializer.ReadAllAsync[TValue](Stream utf8Json, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken)
   at Azure.Search.Documents.Models.SearchResult`1.DeserializeAsync(JsonElement element, ObjectSerializer serializer, JsonSerializerOptions options, Boolean async, CancellationToken cancellationToken)
   at Azure.Search.Documents.Models.SearchResults`1.DeserializeAsync(Stream json, ObjectSerializer serializer, Boolean async, CancellationToken cancellationToken)
   at Azure.Search.Documents.SearchClient.SearchInternal[T](SearchOptions options, String operationName, Boolean async, CancellationToken cancellationToken)
   at Azure.Search.Documents.SearchClient.SearchInternal[T](String searchText, SearchOptions options, Boolean async, CancellationToken cancellationToken)
   at Azure.Search.Documents.SearchClient.SearchAsync[T](String searchText, SearchOptions options, CancellationToken cancellationToken)

Reproduction Steps

Invoke searchClient.SearchAsync<SearchDocument>(options, cancellationToken) on a search index that contains data with non-trivial document length. This exception will be encountered when the racing of block finalization vs the skipping of OData properties results in a non-finalized block when reader.Skip() is invoked. Note that because this is a racing issue that it does not repro reliably.

Environment

No response

jeburge-msft avatar Dec 13 '23 19:12 jeburge-msft

Thank you for your feedback. Tagging and routing to the team member best able to assist.

jsquire avatar Dec 13 '23 21:12 jsquire

I've mitigated the issue temporarily for my use case and confirmed this problem can be avoided by using synchronous calls for whenever deserializing to SearchDocument is needed instead of async calls, but it would be great to unblock the async method calls to prevent blocking.

jeburge-msft avatar Dec 14 '23 22:12 jeburge-msft

We are experience the same issue - from a batch of reading e.g. 33.000 documents around 16 of those could not be read async - so mitigating currently by doing sync call aswell, but would be nice to avoid this.

Looking forward for a bug fix.

perkops avatar Feb 16 '24 13:02 perkops

Pinging to follow up on this thread as it has been over 6 months.

jeburge-msft avatar Aug 01 '24 22:08 jeburge-msft