Javascript.NodeJS icon indicating copy to clipboard operation
Javascript.NodeJS copied to clipboard

Infinite Loop when using ProcessRetries.

Open johnrom opened this issue 2 years ago • 7 comments

6.3.0 was the first version I used, and the previous infinite loop was supposed to fixed in 6.2.0. I would guess this has to do with a specific sequence of operations across concurrent processes.

Infinite retries should ideally be moved from a value of -1 to a different configuration item like RetryType.Infinite or something, because this is an infinite loop, and the possibility of subtracting an int and resulting in an infinite loop is very dangerous. What appeared to happen in my case is that the entire site slowed to a crawl because I had a configuration of 1 process retry for each request. I think the issue is that when a process is changed by another concurrent thread, there is a TaskCanceledException, and that causes the while loop to continue for a given request indefinitely instead of failing after 1 retry.

Here was my configuration:

services.Configure<OutOfProcessNodeJSServiceOptions>(options => {
    options.TimeoutMS = 3000;
    options.NumRetries = 0;
    options.NumProcessRetries = 1;
});

Requests came in something like this. We use Server Side Rendering (SSR) on a React app once per page per ~1 hour to build the initial view.

- Request1 SSR attempt 1 - fail
 - R2 SSR attempt 1 - fail
  - R1 SSR attempt 2 (new process)
   - R2 SSR attempt 2 (new process)
    - R1 SSR canceled due to new process, SSR attempt 3 (retry infinite)
     - R3 SSR attempt 1 - fail
      - R3 SSR attempt 2 (new process)
       - R2 SSR canceled due to new process, SSR attempt 3 (retry infinite)
        - R1 SSR attempt 4 success on R3's new process
         - R2 fails multiple times before succeeding because of R4's new process
          - R3 fails multiple times before succeeding because of R5's new process
           - and so on

This particular process should run an SSR once, then cache the result until data is updated hourly-ish, so in general the SSR is called a few thousand times per hour. I worked around this issue by removing retries and combining Polly's CircuitBreaker w/ MoveToNewProcess.

In my case, it's OK for an SSR to fail, because we only need it for SEO and from the client point of view the site just takes a little longer to build without SSR.

Here is a log of a single request if it helps. This particular request ended up retrying 6 times before eventually succeeding (note configuration above, there are 0 retries for the existing process configured and 1 process retry. However, the entire site was unresponsive because every request was looping indefinitely. Notice how the traces only have one entry for "Process Retries Remaining: 1", meaning this particular request only actually performed one MoveToNewProcess, then retried infinitely until success on new processes which were generated by other requests.

REQUEST
Response code: 200
Response time: 2.0 mins
Id: ec883f48b673eddf
-
DEPENDENCY
127.0.0.1:49414
Name: POST /
Duration: 3.8 s
Call status: False
Parent Id: ec883f48b673eddf
-
TRACE
An invocation attempt failed. Retries using existing process remaining: 0. Exception: System.Threading.Tasks.TaskCanceledException: The operation was canceled. at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts) at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) at Jering.Javascript.NodeJS.HttpNodeJSService.TryInvokeAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken) at Jering.Javascript.NodeJS.OutOfProcessNodeJSService.TryInvokeCoreAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken)
Severity level: Warning
-
TRACE
Retries in existing process exhausted. Process retries remaining: 1.
Severity level: Warning
-
DEPENDENCY
127.0.0.1:49511
Name: POST /
Duration: 10.1 s
Call status: False
Parent Id: ec883f48b673eddf
-
TRACE
An invocation attempt failed. Retries using existing process remaining: infinity. Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts) at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) at Jering.Javascript.NodeJS.HttpNodeJSService.TryInvokeAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken) at Jering.Javascript.NodeJS.OutOfProcessNodeJSService.TryInvokeCoreAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken)
Severity level: Warning
-
DEPENDENCY
127.0.0.1:49608
Name: POST /
Duration: 11.3 s
Call status: False
Parent Id: ec883f48b673eddf
-
TRACE
An invocation attempt failed. Retries using existing process remaining: infinity. Exception: System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request.. ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request. --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token) at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async) at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts) at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) at Jering.Javascript.NodeJS.HttpNodeJSService.TryInvokeAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken) at Jering.Javascript.NodeJS.OutOfProcessNodeJSService.TryInvokeCoreAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken)
Severity level: Warning
-
DEPENDENCY
127.0.0.1:49739
Name: POST /
Duration: 10.7 s
Call status: False
Parent Id: ec883f48b673eddf
-
TRACE
An invocation attempt failed. Retries using existing process remaining: infinity. Exception: System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request.. ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request. --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token) at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async) at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts) at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) at Jering.Javascript.NodeJS.HttpNodeJSService.TryInvokeAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken) at Jering.Javascript.NodeJS.OutOfProcessNodeJSService.TryInvokeCoreAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken)
Severity level: Warning
-
DEPENDENCY
127.0.0.1:49825
Name: POST /
Duration: 3.0 s
Call status: False
-
TRACE
An invocation attempt failed. Retries using existing process remaining: infinity. Exception: System.Net.Http.HttpRequestException: Error while copying content to a stream. ---> System.IO.IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.. ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host. at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.CreateException(SocketError error, Boolean forAsyncThrow) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.SendAsyncForNetworkStream(Socket socket, CancellationToken cancellationToken) at System.Net.Sockets.NetworkStream.WriteAsync(ReadOnlyMemory`1 buffer, CancellationToken cancellationToken) at System.Net.Http.HttpConnection.WriteToStreamAsync(ReadOnlyMemory`1 source, Boolean async) at System.Net.Http.HttpConnection.FlushAsync(Boolean async) at System.Net.Http.HttpConnection.WriteAsync(ReadOnlyMemory`1 source, Boolean async) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine) at System.Net.Http.HttpConnection.WriteAsync(ReadOnlyMemory`1 source, Boolean async) at System.Net.Http.HttpConnection.ChunkedEncodingWriteStream.<WriteAsync>g__WriteChunkAsync|3_0(HttpConnection connection, ReadOnlyMemory`1 buffer) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine) at System.Net.Http.HttpConnection.ChunkedEncodingWriteStream.<WriteAsync>g__WriteChunkAsync|3_0(HttpConnection connection, ReadOnlyMemory`1 buffer) at System.Net.Http.HttpConnection.ChunkedEncodingWriteStream.WriteAsync(ReadOnlyMemory`1 buffer, CancellationToken ignored) at System.Text.Json.JsonSerializer.WriteStreamAsync[TValue](Stream utf8Json, TValue value, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine) at System.Text.Json.JsonSerializer.WriteStreamAsync[TValue](Stream utf8Json, TValue value, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken) at System.Text.Json.JsonSerializer.SerializeAsync[TValue](Stream utf8Json, TValue value, JsonSerializerOptions options, CancellationToken cancellationToken) at Jering.Javascript.NodeJS.JsonService.SerializeAsync[T](Stream stream, T value, CancellationToken cancellationToken) at Jering.Javascript.NodeJS.InvocationContent.SerializeToStreamAsync(Stream stream, TransportContext context) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine) at Jering.Javascript.NodeJS.InvocationContent.SerializeToStreamAsync(Stream stream, TransportContext context) at System.Net.Http.HttpContent.SerializeToStreamAsync(Stream stream, TransportContext context, CancellationToken cancellationToken) at System.Net.Http.HttpContent.InternalCopyToAsync(Stream stream, TransportContext context, CancellationToken cancellationToken) at System.Net.Http.HttpContent.CopyToAsync(Stream stream, TransportContext context, CancellationToken cancellationToken) at System.Net.Http.HttpConnection.SendRequestContentAsync(HttpRequestMessage request, HttpContentWriteStream stream, Boolean async, CancellationToken cancellationToken) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine) at System.Net.Http.HttpConnection.SendRequestContentAsync(HttpRequestMessage request, HttpContentWriteStream stream, Boolean async, CancellationToken cancellationToken) at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine) at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext() at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining) at System.Threading.Tasks.Task.RunContinuations(Object continuationObject) at System.Threading.Tasks.Task.FinishContinuations() at System.Threading.Tasks.Task`1.TrySetResult(TResult result) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1 task, TResult result) at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s) at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext() at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox box, Boolean allowInlining) at System.Threading.Tasks.Task.RunContinuations(Object continuationObject) at System.Threading.Tasks.Task.FinishContinuations() at System.Threading.Tasks.Task`1.TrySetResult(TResult result) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1 task, TResult result) at System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1.SetResult(TResult result) at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s) at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread) at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecuteFromThreadPool(Thread threadPoolThread) at System.Threading.ThreadPoolWorkQueue.Dispatch() at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() at System.Threading.Thread.StartCallback() --- End of stack trace from previous location --- --- End of inner exception stack trace --- at System.Net.Http.HttpConnection.WriteAsync(ReadOnlyMemory`1 source, Boolean async) at System.Net.Http.HttpConnection.ChunkedEncodingWriteStream.<WriteAsync>g__WriteChunkAsync|3_0(HttpConnection connection, ReadOnlyMemory`1 buffer) at System.Text.Json.JsonSerializer.WriteStreamAsync[TValue](Stream utf8Json, TValue value, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken) at System.Text.Json.JsonSerializer.WriteStreamAsync[TValue](Stream utf8Json, TValue value, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken) at System.Text.Json.JsonSerializer.WriteStreamAsync[TValue](Stream utf8Json, TValue value, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken) at Jering.Javascript.NodeJS.InvocationContent.SerializeToStreamAsync(Stream stream, TransportContext context) at System.Net.Http.HttpContent.<CopyToAsync>g__WaitAsync|56_0(ValueTask copyTask) --- End of inner exception stack trace --- at System.Net.Http.HttpContent.<CopyToAsync>g__WaitAsync|56_0(ValueTask copyTask) at System.Net.Http.HttpConnection.SendRequestContentAsync(HttpRequestMessage request, HttpContentWriteStream stream, Boolean async, CancellationToken cancellationToken) at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken) at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) at Microsoft.Extensions.Http.Logging.LoggingHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) at Jering.Javascript.NodeJS.HttpNodeJSService.TryInvokeAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken) at Jering.Javascript.NodeJS.OutOfProcessNodeJSService.TryInvokeCoreAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken)
-
DEPENDENCY
127.0.0.1:49888
Name: POST /
Duration: 4.4 s
Call status: False
Parent Id: ec883f48b673eddf
-
TRACE
An invocation attempt failed. Retries using existing process remaining: infinity. Exception: System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request.. ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request. --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token) at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async) at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken) --- End of inner exception stack trace --- at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts) at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken) at Jering.Javascript.NodeJS.HttpNodeJSService.TryInvokeAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken) at Jering.Javascript.NodeJS.OutOfProcessNodeJSService.TryInvokeCoreAsync[T](InvocationRequest invocationRequest, CancellationToken cancellationToken)
-
DEPENDENCY
127.0.0.1:50094
Name: POST /
Duration: 641 ms
Call status: True

johnrom avatar Apr 19 '22 15:04 johnrom

Actually, I think it's just because I didn't want any retries on the existing process -- I only wanted 1 retry with a new process. However, when using a new process the service actually sets the NumRetries to config.NumRetries - 1 which, in my case, 0 - 1 = -1, an infinite loop.

https://github.com/JeringTech/Javascript.NodeJS/blob/9a050cd4444eeaf3be264e994d8644405c0ff41d/src/NodeJS/NodeJSServiceImplementations/OutOfProcess/OutOfProcessNodeJSService.cs#L330-L331

So:

- numRetries = _numRetries - 1; 
+ numRetries = _numRetries > 0 ? _numRetries - 1 : _numRetries; 

johnrom avatar Apr 19 '22 16:04 johnrom

Ah you're right, the _numRetries = 0 case is not handled correctly.

I'm thinking of making the following updates, together with your fix. Current:

https://github.com/JeringTech/Javascript.NodeJS/blob/4c44c07a018e62d7c0c8fa4c9bfda60a0ad77afa/src/NodeJS/NodeJSServiceImplementations/OutOfProcess/OutOfProcessNodeJSServiceOptions.cs#L27-L40

Updated:

+        /// <summary>Renamed to NumRetryProcesses to improve clarity.</summary>
+        [Obsolete("Renamed to NumRetryProcesses to improve clarity.", true)]
+        public int NumProcessRetries { get; set; } = 1;
+
-        /// <summary>The number of new NodeJS processes created to retry an invocation.</summary>
+        /// <summary>The number of NodeJS processes created to retry an invocation.</summary>
         /// <remarks>
         /// <para>A NodeJS process retries invocations <see cref="NumRetries"/> times. Once a process's retries are exhausted,
-        /// if any <b>process retries</b> remain, the library creates a new process that then retries invocations <see cref="NumRetries"/> times.</para>
+        /// if any <b>retry-processes</b> remain, the library creates a new process and retries invocations <see cref="NumRetries"/> times.</para>
         /// <para>For example, consider the situation where <see cref="NumRetries"/> and this value are both 1. The existing process first attempts the invocation.
         /// If it fails, it retries the invocation once. If it fails again, the library creates a new process that retries the invocation once. In total, the library
-        /// attempt the invocation 3 times.</para>
+        /// attempts the invocation 3 times.</para>
         /// <para>If this value is negative, the library creates new NodeJS processes indefinitely.</para>
+        /// <para>If this value is larger than 0 and <see cref="NumRetries"/> is 0, the invocation is retried once in each new process.</para>
         /// <para>By default, process retries are disabled for invocation failures caused by javascript errors. See <see cref="EnableProcessRetriesForJavascriptErrors"/> for more information.</para>
         /// <para>If the module source of an invocation is an unseekable stream, the invocation is not retried.
         /// If you require retries for such streams, copy their contents to a <see cref="MemoryStream"/>.</para>
         /// <para>Defaults to 1.</para>
         /// </remarks>
-        public int NumProcessRetries { get; set; } = 1;
+        public int NumRetryProcesses { get; set; } = 1;

I think NumRetryProcesses wasn't designed well. Thoughts on whether these updates will suffice design-wise?

If you're interested, feel free to open a pull request for the fix you've suggested.

JeremyTCD avatar Apr 20 '22 01:04 JeremyTCD

Honestly, I think that Process retries are happening in the wrong place and it makes more sense to manage the lifecycle outside of the invocation attempt, like at Policy or ServiceFactory level. For example, in HttpClient, client/connection management would happen at the HttpClientFactory level. I would look into the HttpClient code for .net 6 and see how they manage connections, retries, etc, and try and adopt a similar method.

INodeJsServiceFactory NodeJsServiceFactory based on HttpClientFactory AddPolicyHandler based on Polly+HttpClient NodeJsClient would be a transient which just requests the policy to invoke the action.

Then the API would be something like:

// pseudo-code
AddNodeJs = () => {
   AddSingleton<INodeJsServiceFactory, NodeJsServiceFactory>()
     .AddPolicyHandler(policyBuilder => {
       policyBuilder.AddRetries(new NodeJsServicePolicyConfig {
          // dangerous infinite loop goes here
          // this infinite loop doesn't spawn new processes on its own, 
          // but NewProcessAfterFailedInvocationCount would in the background
          RetryIndefinitely = false,
          // retries do not manage new process, never negative
          RetryCount = 5,

          // if every a new process should be spawned as a last resort for every attempt 
          // (unless NewProcessOnFailedAttemptTimeout is not met)
          // this will never happen for RetryIndefinitely, instead relying on NewProcessAfterFailedInvocationCount
          TryNewProcessOnInvocationFailure = true,

          // how many retries AFTER last ditch effort to spawn new process
          RetryCountAfterNewProcess = 1,

          // the service will detect failures from all requests and count them (including indefinite retries)
          // it would MoveToNewProcess after X failed invocations
          NewProcessAfterFailedInvocationCount = 5,

          // make sure a certain timeout has passed before spawning new process, so it doesn't happen infinitely
          NewProcessOnFailedAttemptTimeout = 30s
       });
     });
  AddTransient<INodeJsClient, NodeJsClient>();
};

NodeJsClient.InvokeAsync = (policies, args) => {
  policies.InvokeAsync(args);
}

Policies.InvokeAsync = (config, args) => {
     var retryCount = config.RetryCount;

     while (!success & (config.RetryIndefinitely || retryCount > 0)) {
       // factory will automatically try creating a new service after NewProcessOnFailedAttemptCount failed invocations
       factory.GetService().InvokeAsync(args);
       
       if (!retryIndefinitely) {
         retryCount--;
       }
     }

     // it will never make it here if RetryIndefinitely
     // TryMoveToNewProcess will return false if NewProcessOnFailedAttemptTimeout criteria is not met
     if (!success && TryNewProcessOnInvocationFailure && factory.TryMoveToNewProcess()) {
       // the first try is technically a retry
       var retryCount = 1 + RetryCountAfterNewProcess;

       while (retryCount > 0) {
         factory.GetService().InvokeAsync(args);
       }
     }
}

In order to support concurrency one would need to also implement something like the following from DefaultHttpClientFactory:

_activeHandlers = new ConcurrentDictionary<string, Lazy<ActiveHandlerTrackingEntry>>(StringComparer.Ordinal);

johnrom avatar Apr 20 '22 15:04 johnrom

Yeah makes sense, appreciate the suggestion. Agree that retry logic should be extracted from OutOfProcessNodeJSService, and that it should be customizable. Also agree with conforming to framework norms.

The main issue is development time. Replicating HttpClient's policy system, complete with tests and docs, looks like a 2 week task. Any interest in taking it on?

As an alternative, perhaps we can use HttpClient's existing system. Creating a HttpClient policy that replicates current behavior (with fixes) seems doable. Users can customize this policy using DI. Minimal API changes (maybe addition of some options), minimal extra logic to test. If that works out, when there is more time we could add a new NodeJSClient transient type for different policies per NodeJS client. Thoughts?

JeremyTCD avatar Apr 25 '22 04:04 JeremyTCD

I don't really have the time to do much more than this at the moment. Hoping that the Polly integration will solve our issue for now. I may be able to circle back in a few weeks.

I have an example of integrating IOptions + Polly + HttpClient here, (policies), (usage), if you add a Polly.Context and onRetry callbacks, someone might be able to implement it pretty easily, similar to the refresh-authorization example here:

https://www.jerriepelser.com/blog/refresh-google-access-token-with-polly/

johnrom avatar Apr 25 '22 20:04 johnrom

Okay thanks for all the tips. I took a look at your examples and tried using a HttpClient policy. Ran into a couple of issues:

  • Interactions with OutOfProcessNodeJSService._connectingLock complicate things.
  • We'd need separate retry logic for other protocols (if we decide to add any, like web sockets). Currently OutOfProcessNodeJSService's retries are shared by all protocols.

So your original suggestion is probably the best solution. If anyone can find time they should attempt that.

For now let's leave this issue open. I will merge a fix for the _numRetries = 0 case in the next few weeks. If you'd like to be a contributor feel free to open a PR with your fix. I'll handle the tests etc.

JeremyTCD avatar Apr 28 '22 13:04 JeremyTCD

Sure, fix is here:

https://github.com/JeringTech/Javascript.NodeJS/pull/135

johnrom avatar Apr 28 '22 14:04 johnrom