Monitoring API
Provide an API that exposes monitoring information and metrics.
We'll need to think about whether this should be part of the gRPC API or a separate HTTP/REST-based API. My inclination is that HTTP is nicer for implementing integrations and can be hit directly from a web browser, curl, etc. for debugging purposes. The downside is that it requires running an HTTP server on an additional port.
Just want to add that gRPC-Web can be used, so the API can still be hit from a web browser. You can also avoid taking on the Envoy proxy as an external runtime dependency by embedding Envoy instead.
https://www.getenvoy.io/
Example: https://github.com/pomerium/pomerium/blob/master/scripts/embed-envoy.bash
Please export the metrics at /metrics in Prometheus format. That would be the best.
Hi! Just doing a PoC in a big mesh, and I'm suffering from the lack of a /metrics endpoint.
looking forward to this one too :)
definitely a requirement for us to use in production
I plan to tackle this once consumer groups are completed. The plan at this time is to implement a /metrics endpoint in Prometheus format.
I suggest we make propositions here on which metrics should be exposed. I think it would be nice to have an exhaustive v0 list of the metrics judged to be critical. Any ideas?
I also think there are two kinds of metrics:
- Per-server metrics: things like CPU, RAM, etc. of each server
- Network/mesh metrics: things related to the cluster mesh itself, e.g. Raft, metadata, etc.
For a start, and maybe it is somewhat relevant here, here is the list of metrics exposed by HashiCorp's Nomad
There are probably 3 categories of metrics:
- Low-level server metrics (CPU, RAM, etc.)
- Control-plane metrics (Raft information, low-level partition and clustering metrics such as leader information, follower last contact, etc.)
- Higher-level control plane and data plane metrics (consumer group information, partition message rates, etc.)
There may be others that I am missing, but this is what comes to mind initially. To your point, the first step should probably be determining what the minimal critical set of metrics is, then adding more once there is an identified need. I would prefer to start small and then build on it.
I suggest starting even smaller.
We can already define the code pattern that will be used to collect and export metrics.
Every metric added later is then basically an adapter to plug in, and adapters can be added progressively and independently.
This can proceed in parallel with the discussion on which metrics to expose, or we can pick one or two metrics fairly arbitrarily to begin with.
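The adapter pattern suggested above could be sketched roughly as follows. The `Collector` interface, the `Registry`, and the `raftCollector` example are hypothetical names invented for illustration; they are not part of Liftbridge's codebase.

```go
package main

import "fmt"

// Collector is a hypothetical adapter interface: each metric source
// implements it and registers itself, so new metrics can be added
// progressively and independently of the export machinery.
type Collector interface {
	// Collect returns metric name -> current value.
	Collect() map[string]float64
}

// Registry holds all registered collectors.
type Registry struct {
	collectors []Collector
}

func (r *Registry) Register(c Collector) {
	r.collectors = append(r.collectors, c)
}

// Gather merges the samples from every registered collector; an
// exporter (e.g. a /metrics handler) would render this map.
func (r *Registry) Gather() map[string]float64 {
	out := make(map[string]float64)
	for _, c := range r.collectors {
		for name, v := range c.Collect() {
			out[name] = v
		}
	}
	return out
}

// raftCollector is an illustrative adapter for control-plane metrics.
type raftCollector struct{ term float64 }

func (c raftCollector) Collect() map[string]float64 {
	return map[string]float64{"liftbridge_raft_term": c.term}
}

func main() {
	var r Registry
	r.Register(raftCollector{term: 3})
	fmt.Println(r.Gather())
}
```

With a shape like this, the choice of which metrics to expose and the export mechanism itself can indeed evolve in parallel.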
I would really appreciate ways to calculate produce/consume rates, and even more so consumer lag (in time).
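Offset-based lag, at least, is straightforward to derive from values the server already tracks. A minimal sketch, assuming the partition's latest offset and the consumer's committed offset are available (function and parameter names are illustrative, and Liftbridge's actual cursor semantics may differ):

```go
package main

import "fmt"

// offsetLag returns how many messages a consumer is behind, given the
// partition's latest offset and the consumer's last committed offset.
// A stale read can make committedOffset appear ahead of latestOffset,
// in which case the lag is clamped to zero.
func offsetLag(latestOffset, committedOffset int64) int64 {
	lag := latestOffset - committedOffset
	if lag < 0 {
		return 0
	}
	return lag
}

func main() {
	fmt.Println(offsetLag(100, 90)) // prints 10
}
```

Time-based lag is harder: it additionally needs the append timestamp of the oldest unconsumed message, so it depends on what per-message metadata the metrics layer can see.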
Hi! Consumer groups are good, so.. :)
I just want to add some values to export as metrics, such as the HW (high watermark), last offset, and cursor counts. They really help when investigating some processes.