Fix unpredictable behaviour with pagination resolver
Why make this change?
- Closes #2281
- Unpredictable behavior was noticed with pagination fields.
What's causing this issue?
-
ArrayPoolWriteruses pooled memory, which can lead to issues if not managed correctly. If the buffer is not properly sized or if there is a memory leak, it can cause errors when handling larger datasets as we are seeing in our case when we try to get more number of rows in a single query after a certain value, it starts to fail due to loss of "hasNext" property. - Also, if the buffer is not correctly advanced or if there are issues with the span management, it can cause the JSON parsing to fail.
- While
ArrayPoolWritermight offer some performance benefits, it introduces more complexities as well.JsonDocument.Parseinherently handles some edge cases that might be missed when usingArrayPoolWriterand other low-level operations likeadvancedandspan.
What is this change?
- Updating the resolver to use JsonDocument to handle memory management internally than using ArrayPoolWriter.
- The process of encoding and decoding the JSON string using
Encoding.UTF8.GetBytesandUtf8JsonReaderwas causing errors, especially with larger or more complex JSON structures. Any discrepancy in byte count or encoding was leading to discrepancies. - So updated to use a simpler and more straightforward approach, leveraging
JsonDocument.
How was this tested?
- [x] Integration Tests
Sample Request(s)
Before
After
/azp run
/azp run
/azp run
/azp run
/azp run
/azp run
/azp run
/azp run
Reviewing will be easier if you can also provide details about:
ArrayPoolWriter uses pooled memory, which can lead to issues if not managed correctly. If the buffer is not properly sized or if there is a memory leak, it can cause errors when handling larger datasets as we are seeing in our case when we try to get more number of rows in a single query after a certain value, it starts to fail due to loss of "hasNext" property.
Can you provide examples of the issues are you experiencing in your PR description? By reverting back to JsonDocuments for all the results, you're allocating memory for each of those even when using "using." Michael's intent here was to avoid all those allocations.
- Are there uncaught/unhandled silent errors occurring in DAB's
ArrayPoolWriter.csclass? - Perhaps breakpoints there can show us where there might be capacity issues when renting a buffer from the array pool? e.g.
ArrayPool<byte>.Shared.Rent(size);
In your test repro, stepping through debugger,
ExecutionHelper::ExecuteLeafField(IPureResolverContext context) L140
- place conditional breakpoint :
((NamePathSegment)context.Path).Name.Value == "hasNextPage"- Issue is that hasNextPage resolves to null. However, stepping through another iteration, "endCursor" resolves fine. I'm not sure why the arraypoolwriter would result in just "hasNextPage" being null. Perhaps the ExecutionHelper/ResolverTypeInterceptor has a bug somewhere.
Here:
For whatever reason, all the JSON from the result set is captured without issue... however, hasNextPage is not persisted to the json object. Perhaps there is a missing function call or a function is operating on the wrong json document.
In your test repro, stepping through debugger,
ExecutionHelper::ExecuteLeafField(IPureResolverContext context)L140
place conditional breakpoint :
((NamePathSegment)context.Path).Name.Value == "hasNextPage"
- Issue is that hasNextPage resolves to null. However, stepping through another iteration, "endCursor" resolves fine. I'm not sure why the arraypoolwriter would result in just "hasNextPage" being null. Perhaps the ExecutionHelper/ResolverTypeInterceptor has a bug somewhere.
Here:
For whatever reason, all the JSON from the result set is captured without issue... however, hasNextPage is not persisted to the json object. Perhaps there is a missing function call or a function is operating on the wrong json document.
the problem is the bug is inconsistent, i.e, it happens for certain page size only which is also not fixed, because of which it's hard to debug. The cause that i mentioned is the best guess i could make as ArrayPoolWriter uses low level handling, as it introduces more complexity. while JsonDocument is taking care of whatever that is missing from the ArrayPoolWriter Implementation.
More notes from investigating. Seems there might be issue with how resolver is written.
Given the following request:
query myQuery {
supportedTypes(first: 4) {
items {
typeid
boolean_types
byte_types
bytearray_types
date_types
datetime2_types
datetime_types
datetimeoffset_types
decimal_types
}
hasNextPage
}
}
Pure resolvers are executed for:
-
items-> ResolverTypeInterceptor::_listFieldResolver => ExecutionHelper.ExecuteListField(ctx)- fields for each item ResolverTypeInterceptor::_leafFieldResolver => ExecutionHelper.ExecuteLeafField(ctx);
-
hasNextPage-> ResolverTypeInterceptor::_leafFieldResolver => ExecutionHelper.ExecuteLeafField(ctx);
Depending on order of fields written in query:
-
itemsbeforehasNextPage-> items result overwrites pureresolvercontext -
itemsafterhasNextPage-> result as expected.
HotChocolate.Execution.Processing.MiddlewareContext.IsResultModified = true HotChocolate.Execution.Processing.MiddlewareContext.Result -> only contains resolved value of "items" field and "endCursor", if included in query.
Consequentially, for order bullet point 1, the PureResolverContext passed to the leafFieldResolver no longer contains the key/value pair for "hasNextPage:true/false"
thus, the value is set to null in the ResultMap built by HC::ResolverTaskFactory::EnqueueOrInlineResolverTasks(...)
I acknowledge in HC docs: "Because of this it is important that resolvers, with the exception of top level mutation field resolvers, do not contain side-effects, since their execution order may vary." However, our pure resolvers shouldn't have any side effects as they are just parsing the json result from the database.
after more investigation: I think I found another fix.. which only involves ensuring we "JsonElement.Clone()" in at least 1 more place.
Here where we write the context result, this is what I was observing to be overwritten in the cases where we see "repro"
At least a few exercises of trying to repro result in the error no longer showing. IF this is the solution, then the error was due to timing of when "jsondocument" disposal occurred, which ironically is what line 257 is supposed to be doing for us. Perhaps the disposal occurred before we were finished using the value from the jsondocument, thus needing to ensure copying the root element
/azp run
/azp run
/azp run
Test pipelines failed for at least 2 iterations with the following error messages:
at Azure.DataApiBuilder.Service.Tests.Configuration.ConfigurationTests.ValidateMutationSucceededAtDbLayer(TestServer server, HttpClient client, String query, String queryName, String auth***, String clientRoleHeader) in D:\a\1\s\src\Service.Tests\Configuration\ConfigurationTests.cs:line 2750
Unhandled exception. System.InvalidOperationException: The requested operation requires an element of type 'Object', but the target element has type 'Null'. at System.Text.Json.ThrowHelper.ThrowJsonElementWrongTypeException(JsonType expectedType, JsonType actualType) at System.Text.Json.JsonDocument.TryGetNamedPropertyValue(Int32 index, ReadOnlySpan
1 propertyName, JsonElement& value) at System.Text.Json.JsonElement.TryGetProperty(String propertyName, JsonElement& value) at Azure.DataApiBuilder.Service.Tests.Configuration.ConfigurationTests.ValidateMutationSucceededAtDbLayer(TestServer server, HttpClient client, String query, String queryName, String auth***, String clientRoleHeader) in D:\a\1\s\src\Service.Tests\Configuration\ConfigurationTests.cs:line 2750 at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state) at System.Threading.QueueUserWorkItemCallback.Execute() at System.Threading.ThreadPoolWorkQueue.Dispatch() at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() The active test run was aborted. Reason: Test host process crashed : Unhandled exception. System.InvalidOperationException: The requested operation requires an element of type 'Object', but the target element has type 'Null'. at System.Text.Json.ThrowHelper.ThrowJsonElementWrongTypeException(Json***Type expectedType, Json***Type actualType) at System.Text.Json.JsonDocument.TryGetNamedPropertyValue(Int32 index, ReadOnlySpan1 propertyName, JsonElement& value) at System.Text.Json.JsonElement.TryGetProperty(String propertyName, JsonElement& value) at Azure.DataApiBuilder.Service.Tests.Configuration.ConfigurationTests.ValidateMutationSucceededAtDbLayer(TestServer server, HttpClient client, String query, String queryName, String auth***, String clientRoleHeader) in D:\a\1\s\src\Service.Tests\Configuration\ConfigurationTests.cs:line 2750 at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state) at System.Threading.QueueUserWorkItemCallback.Execute() at System.Threading.ThreadPoolWorkQueue.Dispatch() at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
For whatever reason, all the JSON from the result set is captured without issue... however, hasNextPage is not persisted to the json object. Perhaps there is a missing function call or a function is operating on the wrong json document.