
[Bug] The metric `grpc_open` is sometimes incorrect

Open xianjingfeng opened this issue 3 years ago • 4 comments

We found that the value of grpc_open is sometimes very large (>1000) even when no application is running in our cluster.

xianjingfeng avatar Jul 26 '22 10:07 xianjingfeng

Could you provide more detailed information? Could you add some logs to help us solve this problem?

jerqi avatar Jul 27 '22 04:07 jerqi

No logs, we just observed this phenomenon. Maybe org.apache.uniffle.common.rpc.MonitoringServerCall#close is not called in some cases. I tried calling decCounter in MonitoringServerCallListener#onComplete/onCancel and it works, but I don't know the real reason.

xianjingfeng avatar Jul 27 '22 12:07 xianjingfeng

> No logs, we just observed this phenomenon. Maybe org.apache.uniffle.common.rpc.MonitoringServerCall#close is not called in some cases. I tried calling decCounter in MonitoringServerCallListener#onComplete/onCancel and it works, but I don't know the real reason.

I guess that the close method may not be called in some cases if an exception occurs.
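
To illustrate the workaround discussed here, below is a minimal, self-contained sketch of decrementing an open-call gauge from any terminal callback, guarded so that it decrements at most once per call. It does not use gRPC or the real Uniffle classes: `OpenCallGaugeSketch`, `TrackedCall`, and `GRPC_OPEN` are hypothetical names standing in for the monitoring wrapper and the `grpc_open` metric, and the assumption that `close` can be skipped on some paths is the guess from this thread, not a confirmed root cause.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

public class OpenCallGaugeSketch {
  // Hypothetical stand-in for the grpc_open gauge.
  static final AtomicLong GRPC_OPEN = new AtomicLong();

  // Hypothetical stand-in for a monitored server call plus its listener.
  static final class TrackedCall {
    private final AtomicBoolean released = new AtomicBoolean(false);

    TrackedCall() {
      // The gauge goes up once when the call is opened.
      GRPC_OPEN.incrementAndGet();
    }

    // Decrement at most once, no matter how many terminal callbacks fire.
    private void release() {
      if (released.compareAndSet(false, true)) {
        GRPC_OPEN.decrementAndGet();
      }
    }

    void close() { release(); }       // normal completion path
    void onCancel() { release(); }    // path where close() may never run
    void onComplete() { release(); }  // may fire in addition to close()
  }

  public static void main(String[] args) {
    TrackedCall normal = new TrackedCall();
    normal.close();
    normal.onComplete(); // second terminal event must not decrement again

    TrackedCall cancelled = new TrackedCall();
    cancelled.onCancel(); // close() is never called for this call

    System.out.println(GRPC_OPEN.get());
  }
}
```

The guard matters because, if the decrement is simply added to the listener callbacks while `close` keeps decrementing too, a call that reaches both would be counted down twice and the gauge could go negative.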

jerqi avatar Jul 27 '22 12:07 jerqi

cc @colinmjj, do you remember our flaky metric test? I guess it's caused by this issue.

jerqi avatar Jul 27 '22 15:07 jerqi