azure-functions-host

PowerShell worker crashes with intermittent Grpc.Core.RpcException: Status(StatusCode=Unknown, Detail="Stream removed") exception

Open michaelpeng36 opened this issue 2 years ago • 5 comments

Investigative information

Please provide the following:

  • Timestamp: 2023-01-10, 00:56:36.005 UTC
  • Function App version: 4.14.0.19631
  • Function App name: atsushin20220927func
  • Function name(s) (as appropriate): N/A (the sole function on the app is a QueueTrigger function called ZoomMeetingReport, but invocations themselves do not fail).
  • Invocation ID: N/A. The exception occurs between function invocations
  • Region: Japan East

Repro steps

No specific repro steps. Queue items are added by a basic ConsoleApp. The issue occurs intermittently, over the span of days.

Expected behavior

The gRPC stream between the worker and the host should not be removed abruptly, at least not without a graceful worker shutdown. Kusto logs suggest that the worker is not the side removing the stream:

   at Grpc.Core.Internal.ClientResponseStream`2.MoveNext(CancellationToken token)
   at Microsoft.Azure.Functions.PowerShellWorker.Messaging.MessagingStream.MoveNext() in /mnt/vss/_work/1/s/src/Messaging/MessagingStream.cs:line 42
   at Microsoft.Azure.Functions.PowerShellWorker.RequestProcessor.ProcessRequestLoop() in /mnt/vss/_work/1/s/src/RequestProcessor.cs:line 75
   at Microsoft.Azure.Functions.PowerShellWorker.Worker.Main(String[] args) in /mnt/vss/_work/1/s/src/Worker.cs:line 57
   at Microsoft.Azure.Functions.PowerShellWorker.Worker.<Main>(String[] args)

Actual behavior

The worker crashes intermittently in the middle of its request-processing loop, between function invocations, with the Grpc.Core.RpcException shown in the title.
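
For illustration only, here is a minimal sketch of the failure mode the stack trace points at: a loop that reads messages from a stream and hits a "stream removed" error partway through. This is not the PowerShell worker's actual code. `StreamRemovedError`, `process_request_loop`, and `flaky_stream` are hypothetical names standing in for the RpcException, `RequestProcessor.ProcessRequestLoop()`, and the host-to-worker gRPC stream. The sketch assumes the loop can tell a clean end of stream apart from an abrupt removal and report which one occurred instead of letting the exception terminate the process.

```python
class StreamRemovedError(Exception):
    """Hypothetical stand-in for an RpcException with Detail="Stream removed"."""


def process_request_loop(stream, handle):
    """Read messages until the stream ends and report how the loop terminated.

    Returns a (reason, processed_count) tuple, where reason is "completed"
    for a clean end of stream and "stream_removed" for an abrupt removal.
    """
    processed = 0
    iterator = iter(stream)
    while True:
        try:
            message = next(iterator)
        except StopIteration:
            # The stream ended normally (graceful shutdown).
            return ("completed", processed)
        except StreamRemovedError:
            # The stream was removed underneath the reader; uncaught, this
            # exception would propagate out of the loop and end the process.
            return ("stream_removed", processed)
        handle(message)
        processed += 1


def flaky_stream():
    """Hypothetical stream that delivers two messages and is then removed."""
    yield "invocation-1"
    yield "invocation-2"
    raise StreamRemovedError('Status(StatusCode=Unknown, Detail="Stream removed")')
```

Calling `process_request_loop(flaky_stream(), print)` prints both messages and then reports the removal rather than raising, which mirrors the report: invocations themselves succeed and the failure occurs between them.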

Known workarounds

The issue resolves itself after some time.

Related information

Provide any related information

  • Programming language used: PowerShell 7.2
  • Links to source: source code attached below
  • Bindings used: QueueTrigger
run.ps1
param([string] $QueueItem, $TriggerMetadata)
$ErrorActionPreference = "Stop"

Write-Host "Succeeded"
function.json
{
  "bindings": [
    {
      "name": "QueueItem",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "ps-zoom-meetings-queue-items",
      "connection": "AzureWebJobsStorage"
    }
  ],
  "retry": {
    "strategy": "fixedDelay",
    "maxRetryCount": 0,
    "delayInterval": "00:00:10"
  }
}

michaelpeng36, Jan 13 '23 23:01

@michaelpeng36 assigning this for initial investigation, but curious to know if you have an app you can use to repro this. If so, we can have some additional instrumentation added to see if that helps us identify the root cause.

fabiocav, Feb 08 '23 21:02

Thanks for the response, @fabiocav. Yes, we have a repro app set up for this. I will share the details privately.

michaelpeng36, Feb 08 '23 22:02

@brettsam / @michaelpeng36 have you had a chance to sync on this? I'll move this to sprint 141, but please update/close if there is more information about this issue.

fabiocav, Mar 01 '23 21:03

Pushing to sprint 143, but @brettsam and @michaelpeng36 are actively looking at this.

fabiocav, Mar 15 '23 20:03

Leaving this open for validation. Let's run some queries to see if this issue still persists.

fabiocav, Apr 30 '25 20:04