azure-functions-host icon indicating copy to clipboard operation
azure-functions-host copied to clipboard

Investigate if the function dispatcher state is getting properly reset

Open liliankasem opened this issue 3 years ago • 8 comments

Investigate if the function dispatcher state is getting properly reset; there is potentially a bug here and we should look into which potential scenarios/conditions are happening where we do not reset the state (i.e. restart use case)

liliankasem avatar Feb 23 '22 23:02 liliankasem

@liliankasem will be following up with additional context

fabiocav avatar Mar 09 '22 21:03 fabiocav

@liliankasem do we need more information for this? Is this about the channel state? Something else?

fabiocav avatar Mar 23 '22 20:03 fabiocav

@liliankasem do we need more information for this? Is this about the channel state? Something else?

Yup, still need to follow up on why I opened this issue; will prioritize getting the info before the next triage meeting!

liliankasem avatar Mar 23 '22 21:03 liliankasem

@fabiocav I did some digging into this and I think I have more context!

This issue is related to a couple of issues regarding the dispatcher state, such as:

  • https://github.com/Azure/azure-functions-host/issues/7590
  • https://github.com/Azure/azure-functions-host/issues/7463

Right now, the function dispatcher buffers invocation requests until a language worker process starts and initializes. As I was investigating CRIs around the Did not find any initialized language workers error, it seemed like we might not be properly resetting the state of the dispatcher when worker issues arise.

We need to investigate to ensure that the dispatcher state is not set to initialized in case of language worker process startup errors, and that the dispatcher state is being reset to initialized again only after at least one worker channel is ready to accept invocations. This sort of ties into https://github.com/Azure/azure-functions-host/issues/7462

By dispatcher I am referring to RpcFunctionInvocationDispatcher

liliankasem avatar Mar 25 '22 17:03 liliankasem

I also took a look at @yojagad's PR for #7590, and there was mention of:

"the Dispatcher can still be set to initialized state even if WebhostLanguageWorkerChannel does not exist thus leading to requests attempting to be routed to non-existing language workers. This means there will still be at least one request routed before shutting down the host."

Which explains the behaviour I described in this comment https://github.com/Azure/azure-functions-host/issues/7462#issuecomment-1078466160

I wonder if this investigation's purpose was to look into that and see if we can do anything about it to avoid any invocations being processed if there isn't a worker process.

liliankasem avatar Mar 25 '22 17:03 liliankasem

Investigation for this is still in progress. Moving to sprint 124.

fabiocav avatar Jun 08 '22 20:06 fabiocav

Investigation is still in progress. Moving to 128

fabiocav avatar Aug 03 '22 20:08 fabiocav

Deprioritizing this work.

fabiocav avatar Sep 14 '22 20:09 fabiocav

As I was investigating CRIs around the Did not find any initialized language workers error, it seemed like we might not be properly resetting the state of the dispatcher when worker issues arise.

I believe I am having an issue related to this when a dotnet-isolated worker function times out. The function worker is shutdown but any subsequent function timer triggers fail to run because the host is not initializing new function workers.

In this screenshot, we see the red function get terminated due to a timeout and the worker is shutdown. The red function runs at the top of the hour, it was not triggered again at 11:00 pm. I'm guessing because there was a singleton lock within the host to indicate another function was running. And a secondary function at 11:05 pm, 12:05 am, and so on, failed to run because of the worker error. A restart of the function app clears the issue until a timeout occurs again.

I am working to resolve the timeout issue internally, but the stuck state of the worker is an issue for reliability.

image

image

Stack traces from GetLanguageWorker:

[
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 0,
        "line": 0
    },
    {
        "assembly": "Microsoft.Azure.WebJobs.Host, Version=3.0.34.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
        "method": "Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithLoggingAsync>d__26.MoveNext",
        "level": 1,
        "line": 352,
        "fileName": "C:\\projects\\azure-webjobs-sdk-rqm4t\\src\\Microsoft.Azure.WebJobs.Host\\Executors\\FunctionExecutor.cs"
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 2,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
        "level": 3,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 4,
        "line": 0
    },
    {
        "assembly": "Microsoft.Azure.WebJobs.Host, Version=3.0.34.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
        "method": "Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<TryExecuteAsync>d__18.MoveNext",
        "level": 5,
        "line": 108,
        "fileName": "C:\\projects\\azure-webjobs-sdk-rqm4t\\src\\Microsoft.Azure.WebJobs.Host\\Executors\\FunctionExecutor.cs"
    }
]
[
    {
        "assembly": "Microsoft.Azure.WebJobs.Script, Version=4.14.0.0, Culture=neutral, PublicKeyToken=null",
        "method": "Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcherLoadBalancer.GetLanguageWorkerChannel",
        "level": 0,
        "line": 27,
        "fileName": "/src/azure-functions-host/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcherLoadBalancer.cs"
    },
    {
        "assembly": "Microsoft.Azure.WebJobs.Script, Version=4.14.0.0, Culture=neutral, PublicKeyToken=null",
        "method": "Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher+<InvokeAsync>d__62.MoveNext",
        "level": 1,
        "line": 466,
        "fileName": "/src/azure-functions-host/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs"
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 2,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
        "level": 3,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 4,
        "line": 0
    },
    {
        "assembly": "Microsoft.Azure.WebJobs.Script, Version=4.14.0.0, Culture=neutral, PublicKeyToken=null",
        "method": "Microsoft.Azure.WebJobs.Script.Description.WorkerFunctionInvoker+<InvokeCore>d__9.MoveNext",
        "level": 5,
        "line": 95,
        "fileName": "/src/azure-functions-host/src/WebJobs.Script/Description/Workers/WorkerFunctionInvoker.cs"
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 6,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
        "level": 7,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 8,
        "line": 0
    },
    {
        "assembly": "Microsoft.Azure.WebJobs.Script, Version=4.14.0.0, Culture=neutral, PublicKeyToken=null",
        "method": "Microsoft.Azure.WebJobs.Script.Description.FunctionInvokerBase+<Invoke>d__24.MoveNext",
        "level": 9,
        "line": 82,
        "fileName": "/src/azure-functions-host/src/WebJobs.Script/Description/FunctionInvokerBase.cs"
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 10,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
        "level": 11,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 12,
        "line": 0
    },
    {
        "assembly": "Microsoft.Azure.WebJobs.Host, Version=3.0.34.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
        "method": "Microsoft.Azure.WebJobs.Host.Executors.VoidTaskMethodInvoker`2+<InvokeAsync>d__2.MoveNext",
        "level": 13,
        "line": 20,
        "fileName": "C:\\projects\\azure-webjobs-sdk-rqm4t\\src\\Microsoft.Azure.WebJobs.Host\\Executors\\VoidTaskMethodInvoker.cs"
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 14,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
        "level": 15,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 16,
        "line": 0
    },
    {
        "assembly": "Microsoft.Azure.WebJobs.Host, Version=3.0.34.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
        "method": "Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`2+<InvokeAsync>d__10.MoveNext",
        "level": 17,
        "line": 52,
        "fileName": "C:\\projects\\azure-webjobs-sdk-rqm4t\\src\\Microsoft.Azure.WebJobs.Host\\Executors\\FunctionInvoker.cs"
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 18,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
        "level": 19,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 20,
        "line": 0
    },
    {
        "assembly": "Microsoft.Azure.WebJobs.Host, Version=3.0.34.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
        "method": "Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<InvokeWithTimeoutAsync>d__33.MoveNext",
        "level": 21,
        "line": 581,
        "fileName": "C:\\projects\\azure-webjobs-sdk-rqm4t\\src\\Microsoft.Azure.WebJobs.Host\\Executors\\FunctionExecutor.cs"
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 22,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
        "level": 23,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 24,
        "line": 0
    },
    {
        "assembly": "Microsoft.Azure.WebJobs.Host, Version=3.0.34.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
        "method": "Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithWatchersAsync>d__32.MoveNext",
        "level": 25,
        "line": 527,
        "fileName": "C:\\projects\\azure-webjobs-sdk-rqm4t\\src\\Microsoft.Azure.WebJobs.Host\\Executors\\FunctionExecutor.cs"
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 26,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
        "level": 27,
        "line": 0
    },
    {
        "assembly": "System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 28,
        "line": 0
    },
    {
        "assembly": "Microsoft.Azure.WebJobs.Host, Version=3.0.34.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
        "method": "Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithLoggingAsync>d__26.MoveNext",
        "level": 29,
        "line": 306,
        "fileName": "C:\\projects\\azure-webjobs-sdk-rqm4t\\src\\Microsoft.Azure.WebJobs.Host\\Executors\\FunctionExecutor.cs"
    }
]

nerddtvg avatar Dec 02 '22 14:12 nerddtvg