grpc-kotlin
Expected one request for MethodDescriptor{DESCRIPTION_OF_THE_ENDPOINT} but received none
Hi,
We currently have a very strange issue in a microservice with grpc-kotlin.
Issue description:
- After several days of running without issues, some gRPC endpoints of the service start to show a low but significant error rate (~6%).
- The error we get:
Expected one request for MethodDescriptor{DESCRIPTION_OF_THE_ENDPOINT} but received none
(coming from stub/src/main/java/io/grpc/kotlin/Helpers.kt:74 in grpc-kotlin)
- restart of all service pods "fixes" the issue… until it occurs again after a couple of days
- memory/cpu usage don't look suspicious
- We are using unary calls, i.e. we use the
io.grpc.kotlin.ServerCalls#unaryServerMethodDefinition
to implement our endpoints
Currently we don't have a clear idea what might be causing these errors. Have you seen a similar issue before? Or do you have an idea what might be causing this?
Thanks, Marvin
I wonder if we could reproduce this by setting up a load test. It'd be nice to have some automated tests like this.
We are currently extending our load test to see if we can trigger the issue. Will post here if I get further insights into the issue.
We are seeing something similar, using 1.2.1
We have tried playing with memory and CPU as well, no difference.
We finally did some load tests and could narrow it down a bit further:
- the specific error message is related to a timeout on the grpc client (!)
stub.withDeadlineAfter(...
that we use to make the requests. If we increase that timeout, this specific error message (on the server side!) disappears.
- However, request cancellation by the client alone is not enough for the error to occur; we also need high resource usage on the server side, e.g. high CPU load. We are currently trying to figure out what exactly the limiting factor is.
- (edit:) It seems like the error occurs when the client has already cancelled the request before the server even tries to call
singleOrStatusFlow
from Helpers.kt. However, we sometimes get this error after only very short times (very short trace durations), so not too sure about it...
Do you have an idea what exactly might be going on and what we could test to verify?
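For anyone following along, here is a minimal, dependency-free sketch of the failure mode hypothesised above. It is a simplified model and not the actual Helpers.kt code: `singleOrError`, `RequestCountException` and the `Echo/Say` method name are made up for illustration. The assumption is that a unary handler expects exactly one request message, and that a call cancelled by the client deadline before any message is delivered looks like an empty request stream to that check.

```kotlin
// Simplified, dependency-free model of an "exactly one request" check.
// This is NOT the grpc-kotlin source; all names here are hypothetical.

class RequestCountException(message: String) : RuntimeException(message)

/**
 * Returns the only element of [requests], or throws if there are zero or
 * more than one. [method] is only used to build the error message.
 */
fun <T> singleOrError(requests: Iterator<T>, method: String): T {
    if (!requests.hasNext()) {
        throw RequestCountException("Expected one request for $method but received none")
    }
    val first = requests.next()
    if (requests.hasNext()) {
        throw RequestCountException("Expected one request for $method but received two")
    }
    return first
}

fun main() {
    // Normal unary call: exactly one request message arrives.
    println(singleOrError(listOf("ping").iterator(), "Echo/Say"))

    // Call cancelled (for example by a client deadline) before any message
    // was delivered: the request stream ends empty and the check fails.
    try {
        singleOrError(emptyList<String>().iterator(), "Echo/Say")
    } catch (e: RequestCountException) {
        println(e.message)
    }
}
```

If this model matches what happens in the library, the server-side error would be a symptom of the client having given up early rather than of a malformed request, which would fit the observation that raising the client deadline makes it disappear.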
We ended up finding the cause in our case: our service had accidentally been moved onto a Bottlerocket host instance on AWS. Not sure of the details of why that caused problems, but moving the service off that host certainly fixed it.