HTTP resiliency features don't work with the .NET gRPC client

Open DamianEdwards opened this issue 1 year ago • 7 comments

The HTTP resiliency features, including those added by the IHttpClientBuilder.AddStandardResilienceHandler method, don't apply to gRPC calls despite them going through configured HttpClient instances. This is due to the gRPC stack not exposing error details at the HTTP request level in the way that the resiliency features expect (e.g. using HTTP status codes).

The following code example, typical of setting up a gRPC client in a .NET server application, will not actually result in the standard resiliency features being applied to gRPC calls:

builder.Services.AddGrpcClient<Basket.BasketClient>(o => o.Address = new("http://basket-api"))
    .AddStandardResilienceHandler();

Consider adding support for the standard resiliency patterns to the .NET gRPC client stack in a similar fashion to those added to the HttpClient stack so that resiliency features like Circuit Breaker can be easily added by default.

/Cc @JamesNK @davidfowl

DamianEdwards avatar Feb 06 '24 06:02 DamianEdwards

cc: @martintmk @martincostello @geeknoid

joperezr avatar Feb 06 '24 21:02 joperezr

An easy enhancement would be to extend HttpClientResiliencePredicates so that it also detects gRPC calls and handles retriable gRPC status codes:

https://github.com/dotnet/extensions/blob/80abb8ddf7a2454930ae2378b121f044fe3df848/src/Libraries/Microsoft.Extensions.Http.Resilience/Polly/HttpClientResiliencePredicates.cs#L46

This should make both the retry and circuit breaker strategies work for gRPC. The other issue is the handling of streamed calls, which I am not sure how to address.

martintmk avatar Feb 07 '24 08:02 martintmk

gRPC always returns a 200 status code. Failure is communicated in the grpc-status trailer.

I haven't looked at how resilience works, but I'm guessing the retry happens inside an HTTP handler's SendAsync. gRPC supports streaming, so an error can occur long after the response status is returned and SendAsync has run.

I think a known limitation will be that streaming gRPC calls won't be retried. However, failing unary calls should be detectable. Look for a 200 status code and also check the response headers for grpc-status. They will both be available in SendAsync.
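
A minimal sketch of the check described above, using only System.Net.Http. GrpcHeaderStatus and its members are hypothetical names, not part of Microsoft.Extensions.Http.Resilience or Grpc.Net.Client, and treating only UNAVAILABLE (14) as retriable is an assumption made for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;

// Hypothetical helper: not an API of Microsoft.Extensions.Http.Resilience or Grpc.Net.Client.
public static class GrpcHeaderStatus
{
    // Assumption: only UNAVAILABLE (14) counts as retriable in this sketch.
    private static readonly HashSet<int> RetriableCodes = new() { 14 };

    // Reads grpc-status from the response *headers*, which is where it appears
    // when the server fails before returning any content.
    public static bool TryGetStatus(HttpResponseMessage response, out int status)
    {
        status = 0;
        return response.StatusCode == HttpStatusCode.OK
            && response.Headers.TryGetValues("grpc-status", out var values)
            && int.TryParse(values.FirstOrDefault(), out status);
    }

    // True only for a 200 response whose headers carry a retriable grpc-status.
    public static bool IsRetriableFailure(HttpResponseMessage response) =>
        TryGetStatus(response, out var status) && RetriableCodes.Contains(status);
}
```

A predicate like IsRetriableFailure could in principle be evaluated on the HttpResponseMessage returned from SendAsync. It reports nothing for streaming calls or for any call whose status only arrives after the body.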

JamesNK avatar Feb 07 '24 08:02 JamesNK

> Failure is communicated in grpc-status trailer.

The trailer is available only after the response body has been fully read, is that correct? I am wondering how we can ensure that the trailer is available for gRPC calls. Otherwise, the retries won't work.

Will buffering the content work?

martintmk avatar Feb 07 '24 08:02 martintmk

> Will buffering the content work?

No.

If an error happens before any content is returned by the server, then grpc-status is in the headers. That is the scenario that will work. It's confusingly named Trailers-Only in the spec - https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#responses
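
To illustrate the distinction, here is a small sketch that reports where grpc-status appears on an HttpResponseMessage. GrpcStatusLocation and GrpcStatusLocator are hypothetical names. The sketch assumes the response is inspected through the standard HttpResponseMessage.Headers and HttpResponseMessage.TrailingHeaders collections (the latter exists on .NET Core 3.0 and later):

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical types for illustration only.
public enum GrpcStatusLocation { NotPresent, Headers, Trailers }

public static class GrpcStatusLocator
{
    public static GrpcStatusLocation Locate(HttpResponseMessage response)
    {
        // Trailers-Only response: the status is already in the headers,
        // so it is visible as soon as the response headers arrive.
        if (response.Headers.Contains("grpc-status"))
        {
            return GrpcStatusLocation.Headers;
        }

        // Regular response: the status is sent as a trailer after the body.
        if (response.TrailingHeaders.Contains("grpc-status"))
        {
            return GrpcStatusLocation.Trailers;
        }

        return GrpcStatusLocation.NotPresent;
    }
}
```

Only the Headers case corresponds to the scenario described above as workable.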

JamesNK avatar Feb 07 '24 08:02 JamesNK

ITNOA

Is there any plan to implement specific extensions to support gRPC in Microsoft.Extensions.Resilience?

thanks

soroshsabz avatar Apr 02 '24 18:04 soroshsabz

> Any plan to implement specific extensions for support gRPC in Microsoft.Extensions.Resilience?

That is what this issue is tracking. No committed timelines yet, so for now we just want to continue this discussion.

joperezr avatar Apr 02 '24 18:04 joperezr

I have some questions:

  1. Does this mean that Microsoft.Extensions.Http.Resilience is not compatible with .NET 8 Azure Functions running in isolated worker mode?
     Running on Functions currently requires gRPC services.
  2. If so, is this a limitation in Polly.Extensions or just in Microsoft.Extensions.Http.Resilience?
  3. Are there any workarounds for people building .NET 8 Azure Functions running in isolated worker mode?

Thanks!

acatuttle avatar Jun 04 '25 06:06 acatuttle