
First attempt at monitoring metrics using prometheus_client

AdolfoVillalobos opened this pull request 5 months ago • 15 comments

Before submitting
  • [x] Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • [ ] Did you read the contributor guidelines, Pull Request section?
  • [ ] Did you make sure to update the docs?
  • [ ] Did you write any new necessary tests?

⚠️ How does this PR impact the user? ⚠️

Describe (in plain English, not technical jargon) how this improves the user experience. If you can't tie it back to a real, tangible user goal or describe it in plain English, that's a hint that the change is likely not needed and is probably an "engineering nit".

GOOD:

"As a user, I need to serve models faster. This PR focuses on enabling speed gains by using GPUs"

BAD:

"This PR enables GPUs". This is bad because the user problem is not clear; it jumps straight to the solution without any context.

PRs without this will not be merged.


What does this PR do?

First attempt at solving #146

PR review

This implementation introduces basic monitoring metrics using prometheus_client to collect API metrics. The goal is to provide a foundation for further enhancements and discussions.

Implementing monitoring involves some decision-making and tradeoffs, so I start by:

  1. Proposing the changes.
  2. Highlighting some limitations I deliberately left unsolved pending feedback.
  3. Summarizing the key issues to discuss.

I hope this PR proves to be a useful starting point for discussion :)

Proposed Changes

1. Prometheus Integration:

  1. Utilizes prometheus_client to collect metrics.
  2. Integrates seamlessly with FastAPI, automatically creating a /metrics endpoint.
  3. Compatible with multithreading environments.

2. Implementation Approach:

  1. The current approach involves some repetition. A more elegant solution would be to use decorators for methods like predict, encode, and decode.
  2. Care must be taken with callback annotations and signatures to maintain API flexibility, especially in a multithreading context.
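To make the decorator idea concrete, here is one possible shape: a single timing decorator parameterized by the hook name, recording into a labelled Histogram. The metric name `litserve_processing_seconds` and the `timed` helper are hypothetical, chosen only for illustration:

```python
import time
from functools import wraps

from prometheus_client import Histogram

# Hypothetical histogram; the "method" label distinguishes predict/encode/decode.
PROCESSING_SECONDS = Histogram(
    "litserve_processing_seconds",
    "Time spent in LitAPI hooks",
    ["method"],
)

def timed(method_name):
    """Decorator that records a hook's elapsed time into the histogram."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                PROCESSING_SECONDS.labels(method=method_name).observe(
                    time.perf_counter() - start
                )
        return wrapper
    return decorator

@timed("predict")
def predict(x):
    # Stand-in for a real model call.
    return x * 2
```

Because `@wraps` preserves the wrapped function's signature and metadata, this approach should not interfere with callback introspection, though that is exactly the kind of detail worth verifying here.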

3. Memory Management:

Given the multithreading environment, memory management for the Prometheus client may be necessary. This is an area that requires further discussion to determine if it's something the team wants to address now.
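One relevant option here is prometheus_client's built-in multiprocess mode: with the `PROMETHEUS_MULTIPROC_DIR` environment variable set before metrics are created, each worker writes samples to mmap-backed files in that directory, and a `MultiProcessCollector` aggregates them at scrape time. A rough sketch (the counter name is hypothetical):

```python
import os
import tempfile

# Must be set before prometheus_client chooses its value backend,
# i.e. before the import below.
os.environ["PROMETHEUS_MULTIPROC_DIR"] = tempfile.mkdtemp()

from prometheus_client import (
    CollectorRegistry,
    Counter,
    generate_latest,
    multiprocess,
)

# Hypothetical metric; each worker process increments its own mmap file.
REQUESTS = Counter("litserve_requests_total", "Total requests served")
REQUESTS.inc()

# At scrape time, a fresh registry aggregates samples across all workers.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
print(generate_latest(registry).decode())
```

The tradeoff is operational: the directory must be cleaned between server restarts, and gauges need an explicit aggregation mode, which may be more machinery than a first version needs.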

Limitations

1. Metric Scope:

  1. Only prediction elapsed time is measured currently.
  2. Generalization to other metrics and custom metrics is straightforward but not yet implemented.
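As a sketch of how straightforward the generalization could be, adding a custom metric is just declaring it and incrementing it around the hook; the `litserve_predictions` counter and `predict_and_count` wrapper below are illustrative names, not part of this PR:

```python
from prometheus_client import Counter

# Hypothetical custom metric: counts predictions by outcome.
PREDICTIONS = Counter(
    "litserve_predictions",
    "Predictions served, labelled by outcome",
    ["outcome"],
)

def predict_and_count(fn, x):
    """Wrap a user predict function, recording success/error counts."""
    try:
        result = fn(x)
        PREDICTIONS.labels(outcome="success").inc()
        return result
    except Exception:
        PREDICTIONS.labels(outcome="error").inc()
        raise

print(predict_and_count(lambda v: v + 1, 41))  # → 42
```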

2. Code Repetition:

  1. The current implementation is somewhat repetitive.
  2. Future work should focus on reducing this through the use of decorators.

3. Memory Management:

  1. Potential memory management issues with Prometheus in a multithreading environment need to be considered.

Next Steps (Summary)

  1. Discuss and decide on the best approach for extending metrics. Should we use Prometheus?
  2. Evaluate the use of decorators to reduce code repetition. This might involve some refactoring.
  3. Consider the implications of memory management for Prometheus in our multithreading setup. Maybe a simpler solution would be best?

Did you have fun?

Yess. I've been playing with this issue for a couple of hours and I think I have a better grasp of the project.

AdolfoVillalobos avatar Sep 02 '24 07:09 AdolfoVillalobos