
gRPC metrics middleware implemented as a Tower Layer

Open • hdevalence opened this issue 11 months ago • 1 comment

Is your feature request related to a problem? Please describe.

Recently, the public RPC fell into a degraded state because someone made too many requests to it. In this case, we identified the cause via an accidental backchannel. Had that not happened, we would have been unable to determine the cause, what kinds of requests we were receiving, or what was happening to them, because we only have metrics on the requests we already had a reason to care about.

Instead, we need generic gRPC metrics that work with any gRPC method.

Describe the solution you'd like

Scope and implement a Tower Layer that we can apply to our gRPC services. The trace middleware is probably a good reference implementation to study.

We should identify a few relevant metrics and then emit them with the gRPC method name as a metrics key. That would allow us to take cross-sections of each metric by RPC method and identify performance culprits.
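As a rough illustration of the "method name as a metrics key" idea: gRPC requests arrive over HTTP/2 with a path of the form `/package.Service/Method`, so the key can be derived from the path alone. The helper names below (`split_grpc_path`, `method_label`) and the example path are hypothetical, and this is only a sketch, not code from the repo.

```rust
/// Split a gRPC request path of the form `/package.Service/Method`
/// into `(service, method)`. Returns `None` for anything else.
fn split_grpc_path(path: &str) -> Option<(&str, &str)> {
    let rest = path.strip_prefix('/')?;
    let (service, method) = rest.split_once('/')?;
    if service.is_empty() || method.is_empty() || method.contains('/') {
        return None;
    }
    Some((service, method))
}

/// Build a label value for metrics. Paths that do not look like gRPC
/// share a single "unknown" bucket, so malformed requests cannot
/// create an unbounded number of distinct keys.
fn method_label(path: &str) -> String {
    match split_grpc_path(path) {
        Some((service, method)) => format!("{service}/{method}"),
        None => "unknown".to_string(),
    }
}

fn main() {
    // Hypothetical path, for illustration only.
    println!("{}", method_label("/example.v1.QueryService/GetThing"));
}
```

Collapsing unrecognized paths into one bucket is a design choice worth keeping in a real implementation, since the public RPC accepts arbitrary paths from clients.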

Suggestions to get started:

  • Request count (allows computing rates)
  • Request latency (will require inspecting the request, this is ~easy using Tower)
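To make the shape of the two starter metrics concrete, here is a minimal, std-only sketch of the wrap-and-measure pattern. The `Service` and `Layer` traits below are simplified synchronous stand-ins, not the real `tower` traits (the real `tower::Service` also has `poll_ready`, and `call` returns a future). `Request`, `MetricsLayer`, `MetricsService`, `MethodStats`, and the in-memory `Registry` are all hypothetical names, with the registry standing in for whatever metrics backend we end up using.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// Simplified, synchronous stand-ins for tower's `Service` and `Layer`.
trait Service<Req> {
    type Response;
    fn call(&mut self, req: Request) -> Self::Response
    where
        Req: Sized;
}

trait Layer<S> {
    type Service;
    fn layer(&self, inner: S) -> Self::Service;
}

/// Hypothetical request type: only the path matters for this sketch.
struct Request {
    path: String,
}

#[derive(Default, Debug, Clone, PartialEq)]
struct MethodStats {
    count: u64,
    total_latency: Duration,
}

/// Shared in-memory registry, standing in for a real metrics backend.
type Registry = Arc<Mutex<HashMap<String, MethodStats>>>;

#[derive(Clone, Default)]
struct MetricsLayer {
    registry: Registry,
}

struct MetricsService<S> {
    inner: S,
    registry: Registry,
}

impl<S> Layer<S> for MetricsLayer {
    type Service = MetricsService<S>;
    fn layer(&self, inner: S) -> Self::Service {
        MetricsService { inner, registry: self.registry.clone() }
    }
}

impl<S: Service<Request>> Service<Request> for MetricsService<S> {
    type Response = S::Response;
    fn call(&mut self, req: Request) -> Self::Response {
        // Copy the key before the request is moved into the inner service.
        let key = req.path.clone();
        let start = Instant::now();
        let response = self.inner.call(req);
        let elapsed = start.elapsed();

        // Request count and cumulative latency, keyed by method path.
        let mut registry = self.registry.lock().unwrap();
        let stats = registry.entry(key).or_default();
        stats.count += 1;
        stats.total_latency += elapsed;
        response
    }
}

/// Trivial inner service used to exercise the layer.
struct Echo;
impl Service<Request> for Echo {
    type Response = String;
    fn call(&mut self, req: Request) -> String {
        req.path
    }
}

fn main() {
    let layer = MetricsLayer::default();
    let mut svc = layer.layer(Echo);
    svc.call(Request { path: "/example.Service/Ping".into() });
    svc.call(Request { path: "/example.Service/Ping".into() });
    println!("{:?}", layer.registry.lock().unwrap());
}
```

With the real Tower traits, the timing would have to be recorded when the future returned by `call` resolves rather than when `call` returns, so the wrapper would also need to wrap the response future.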

Ideas for later:

  • Some way to attribute load to the request
  • Bandwidth (related to above)

We should implement this specifically as a Tower Layer rather than spending time adding additional specific metrics; it will be a bit more work upfront but will have much better results long term.

hdevalence, Mar 04 '24 06:03

Haven't made progress on this lately, but I did sit down with @cratelyn a few weeks ago for a pairing session and documented the state of play in a branch (https://github.com/penumbra-zone/penumbra/tree/tonic-metrics-spike). I still view this work as must-have, but in the near term I'm going to prioritize testing (https://github.com/penumbra-zone/penumbra/issues/4323) and release (https://github.com/penumbra-zone/penumbra/issues/4325).

conorsch, May 06 '24 20:05