[Improvement] Suggestion for Handling Stale Metrics in Gravitino
What would you like to be improved?
Issue Description
There is a limitation in the gravitino_server_http_request_duration_seconds{operation="xxx", quantile="0.xx"} metric in Gravitino. If an operation is not called for a long time after its last execution, the metric keeps reporting the duration of that last call. This can lead to misinterpretation of the system's performance, because the metric does not reflect that the operation has been inactive.
Suggested Improvement
It would be beneficial to change the metric so that it either resets to zero or is dropped when an operation receives no calls within a predefined timeout. This would give a more accurate picture of the system's current state and activity.
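As a minimal sketch of the "drop" option, assuming the server keeps one duration value per operation and the exporter reads it at scrape time, the class below forgets an operation's value once no call has been recorded for `idleTimeoutMillis`. `IdleExpiringGauge`, `record`, `read`, and the injectable clock are hypothetical names for illustration, not existing Gravitino APIs (requires Java 16+ for `record`).

```java
import java.util.Map;
import java.util.OptionalDouble;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch: stop reporting a per-operation duration once the
// operation has been idle for longer than idleTimeoutMillis.
class IdleExpiringGauge {
  // Last recorded duration and the time it was recorded.
  private record Sample(double seconds, long recordedAtMillis) {}

  private final Map<String, Sample> samples = new ConcurrentHashMap<>();
  private final long idleTimeoutMillis;
  private final LongSupplier clockMillis; // e.g. System::currentTimeMillis

  IdleExpiringGauge(long idleTimeoutMillis, LongSupplier clockMillis) {
    this.idleTimeoutMillis = idleTimeoutMillis;
    this.clockMillis = clockMillis;
  }

  // Called on every request for the given operation.
  public void record(String operation, double seconds) {
    samples.put(operation, new Sample(seconds, clockMillis.getAsLong()));
  }

  // Called at scrape time. Once the operation has been idle too long, the
  // stale value is removed and an empty result is returned instead of it.
  public OptionalDouble read(String operation) {
    Sample s = samples.get(operation);
    if (s == null) {
      return OptionalDouble.empty();
    }
    if (clockMillis.getAsLong() - s.recordedAtMillis() > idleTimeoutMillis) {
      samples.remove(operation, s); // only removes if no newer sample arrived
      return OptionalDouble.empty();
    }
    return OptionalDouble.of(s.seconds());
  }
}
```

Returning `OptionalDouble.of(0.0)` in the idle branch instead would give the "reset to zero" variant; dropping the series is usually less misleading, since a zero duration looks like a very fast call rather than no call.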
@FANNG1 +cc
How should we improve?
No response
Thanks for reporting this issue.
@TEOTEO520, #3341 proposes a way to compute Pxx over a sliding time window; you could take a look when you are free. I'm not sure yet whether to make it configurable.
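For illustration only, here is a minimal sketch of computing a quantile over a sliding time window; it is not the implementation from #3341. It keeps timestamped samples, evicts those older than `windowMillis`, and uses the nearest-rank method, so an operation with no calls inside the window reports no value at all. `SlidingWindowQuantile` and its methods are hypothetical names, and the sketch assumes the injected clock never goes backwards so the deque stays ordered by time.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalDouble;
import java.util.function.LongSupplier;

// Hypothetical sketch: quantiles over samples recorded in the last windowMillis.
class SlidingWindowQuantile {
  private record Sample(long atMillis, double seconds) {}

  private final Deque<Sample> samples = new ArrayDeque<>(); // oldest first
  private final long windowMillis;
  private final LongSupplier clockMillis; // assumed non-decreasing

  SlidingWindowQuantile(long windowMillis, LongSupplier clockMillis) {
    this.windowMillis = windowMillis;
    this.clockMillis = clockMillis;
  }

  public synchronized void record(double seconds) {
    samples.addLast(new Sample(clockMillis.getAsLong(), seconds));
    evictExpired();
  }

  // Nearest-rank quantile over samples still inside the window;
  // empty when the operation had no calls in the window.
  public synchronized OptionalDouble quantile(double q) {
    if (q <= 0.0 || q > 1.0) {
      throw new IllegalArgumentException("q must be in (0, 1]");
    }
    evictExpired();
    if (samples.isEmpty()) {
      return OptionalDouble.empty();
    }
    double[] values = samples.stream().mapToDouble(Sample::seconds).sorted().toArray();
    int rank = (int) Math.ceil(q * values.length); // 1-based rank
    return OptionalDouble.of(values[rank - 1]);
  }

  // Drop samples recorded at or before (now - windowMillis).
  private void evictExpired() {
    long cutoff = clockMillis.getAsLong() - windowMillis;
    while (!samples.isEmpty() && samples.peekFirst().atMillis() <= cutoff) {
      samples.pollFirst();
    }
  }
}
```

Sorting every sample on each read is fine for a sketch but costly under heavy traffic; metrics libraries often provide bounded time-window reservoirs for this purpose (for example, Dropwizard Metrics offers `SlidingTimeWindowArrayReservoir`). A configurable window length would map naturally onto the `windowMillis` parameter.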