grpc-spring icon indicating copy to clipboard operation
grpc-spring copied to clipboard

Server metrics are tagged with the wrong status codes

Open harti2006 opened this issue 2 years ago • 13 comments

The context

I'd like to get the metric values for grpc.server.processing.duration and distinguish different status codes, e.g. OK, NOT_FOUND, INVALID_ARGUMENT, UNKNOWN

The bug

If my gRPC service is called three times, and it responds twice with an OK status and once with with an NOT_FOUND status, I'd expect the tags that are captured for the metrics grpc.server.processing.duration to be "OK" and "NOT_FOUND". But instead, the "NOT_FOUND" error seems to be reported as an "UNKNOWN" error.

Stacktrace and logs

I don't know

Steps to Reproduce

I created a very simple example project, with one gRPC service and one GrpcAdvice that translates my "business" exception into a NOT_FOUND status.

Please check out the integration test, that shows how to reproduce the issue: https://github.com/harti2006/grpc-metrics-demo/blob/trunk/src/test/java/com/github/harti2006/grpc/metrics/GrpcMetricsDemoApplicationTests.java

The application's environment

Which versions do you use?

  • Spring (boot): 2.7.5
  • grpc-java: 1.51.0
  • grpc-spring-boot-starter: 2.14.0.RELEASE
  • java: 11

Additional context

  • It used to work fine with the previous release 2.13.1.RELEASE. When downgrading to this version, the above mentioned integration test works fine

harti2006 avatar Nov 24 '22 15:11 harti2006

Which version of micrometer are you using? Because we removed our implementation if favor of the official/upstream one.

ST-DDT avatar Nov 24 '22 18:11 ST-DDT

Hey, I'm using the version of micrometer, that comes with Spring Boot 2.7.5. It must be related to the recent changes, that you mentioned, because it works as expected in the previous version of your library. I could really use a little help with debugging this, because it could either be a bug in the upstream lib, or maybe the way how the GrpcAdvice integrates and translates errors to status codes. But this is currently just a guess.

ST-DDT @.***> schrieb am Do., 24. Nov. 2022, 19:27:

Which version of micrometer are you using? Because we removed our implementation if favor of the official/upstream one.

— Reply to this email directly, view it on GitHub https://github.com/yidongnan/grpc-spring-boot-starter/issues/789#issuecomment-1326759656, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABU3NBDEAIVR4BQ4Z72OTBDWJ6XQZANCNFSM6AAAAAASKRDJRU . You are receiving this because you authored the thread.Message ID: @.***>

harti2006 avatar Nov 24 '22 18:11 harti2006

Have a look here: I assume these should be the locations you are looking for:

ST-DDT avatar Nov 24 '22 18:11 ST-DDT

Yeah, I found these places while debugging the problem. And I saw that funny behavior: The responseCode is only updated in the close method, but that method was only called when the grpc handler method completed successfully. On errors, the close method wasn't called at all, so the default value "UNKNOWN" was reported. Tomorrow, I will be able to add another case to my example to verify if this happens only when using the GrpcAdvice approach of sending errors, or if it's a general problem.

harti2006 avatar Nov 24 '22 20:11 harti2006

I don't have much time to look into this right now, I will try to find some this weekend.

Could you try to change the order of the interceptors:

https://github.com/yidongnan/grpc-spring-boot-starter/blob/3f8f27506542a954a05be71a904cab779118005b/grpc-server-spring-boot-autoconfigure/src/main/java/net/devh/boot/grpc/server/autoconfigure/GrpcServerMetricAutoConfiguration.java#L63-L75

Copy this to your code and change the interceptor order to -1000 (smaller than the grpc advice one)

https://github.com/yidongnan/grpc-spring-boot-starter/blob/3f8f27506542a954a05be71a904cab779118005b/grpc-common-spring-boot/src/main/java/net/devh/boot/grpc/common/util/InterceptorOrder.java#L46-L53

ST-DDT avatar Nov 24 '22 23:11 ST-DDT

😭 The current grpc advice implementation seems to be broken and might have to be reimplemented. For unary calls it seems to work to a certain extend, but causes an undefined state in the call itself. For unary calls it doesn't work (anymore) as it tries to close the call twice, and logs errors.

ST-DDT avatar Nov 25 '22 01:11 ST-DDT

Thanks for investigating. As promised I added another case to the example project where I send the exception directly via streamObserver:

responseObserver.onError(Status.NOT_FOUND.withDescription("goodbye message not found").asException());

I did the same test as before and with this approach, the metrics were reported correctly. See https://github.com/harti2006/grpc-metrics-demo/blob/933c29641a3c813db649cdd275ba022e08637820/src/test/java/com/github/harti2006/grpc/metrics/GrpcMetricsDemoApplicationTests.java#L72

I also tried your suggestion with changing the order of the MetricCollectingServerInterceptor bean, but it made things even worse: No metrics were reported at all.

harti2006 avatar Nov 25 '22 06:11 harti2006

Hey, i have the very same issue. Did u managed to fix that problem? I am afraid i will need to downgrade this lib to 2.13.1. Do u need any help in investigation?

pawelKapl avatar Jul 17 '23 11:07 pawelKapl

A up/downgrade of this library should not have an impact on that as the official metric implementation is the exact same as this one's AFAIK, and I PRed it there.

ST-DDT avatar Jul 17 '23 11:07 ST-DDT

Hmm OP mentioned that downgrade to 2.13.1 is resolving this issue. What do u suggest then?

pawelKapl avatar Jul 17 '23 11:07 pawelKapl

I think it doesnt hurt trying. If it does solve the problem, it would be nice if you could debug into it and compare what is different.

ST-DDT avatar Jul 17 '23 12:07 ST-DDT

@ST-DDT going back to 2.13.1 helped (SpringBoot 2.7.12 here). In a free time will take a look at both versions and try to find issue there.

pawelKapl avatar Jul 18 '23 11:07 pawelKapl

Any way of using @GrpcAdvice with Spring Boot 3.X? responseObserver.onError( is not working for us.

joaomper-TE avatar Aug 06 '24 15:08 joaomper-TE