feature: move model name & bentoml from prefix to label in metrics

Open creativedutchmen opened this issue 1 year ago • 5 comments

Feature request

Currently, the metrics exposed at the /metrics endpoint look like this, with BENTOML and the model name as a prefix:

BENTOML_iris_classifier_request_duration_seconds_sum{endpoint="/",http_response_code="200",service_version="xwr3c7rjzgmb2lrj"}

I would like the structure to look like this instead, with both moved into labels:

request_duration_seconds_sum{endpoint="/",http_response_code="200",app_name="bentoml",model_name="iris_classifier",service_version="xwr3c7rjzgmb2lrj"}

Motivation

This would allow two things in Grafana:

  1. Reuse of dashboards - a simple variable for the model name would be enough to create a dashboard for a new model, without changing any query or panel.
  2. Create an "all models" dashboard, highlighting the total predictions/s, latencies, etc. Very helpful when you need to see the health of your ML system at a glance, without looking at many different dashboards.
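
To illustrate why labels make both points possible, here is a minimal Python sketch (not BentoML or Grafana code). It assumes the proposed `model_name` label; the second model name `fraud_detector`, the `/predict` endpoint, and all sample values are made up for illustration only.

```python
from collections import defaultdict

# Hypothetical samples in the proposed shape: (metric name, labels, value).
# Model names other than iris_classifier and all values are illustrative.
samples = [
    ("request_total", {"model_name": "iris_classifier", "endpoint": "/"}, 120.0),
    ("request_total", {"model_name": "fraud_detector", "endpoint": "/"}, 80.0),
    ("request_total", {"model_name": "iris_classifier", "endpoint": "/predict"}, 30.0),
]


def total_by(samples, metric, label=None):
    """Sum one metric over all samples, optionally grouped by a label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name != metric:
            continue
        # With no grouping label, everything lands in a single "all" bucket.
        key = labels.get(label, "") if label else "all"
        totals[key] += value
    return dict(totals)


# One query shape serves both dashboards: group by the label for a
# per-model view, or drop the grouping for the "all models" view.
per_model = total_by(samples, "request_total", "model_name")
all_models = total_by(samples, "request_total")
```

With the model name baked into the metric name, neither aggregation can be written once and reused, because each model exposes a differently named metric.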

Other

I could send a PR

creativedutchmen avatar Sep 01 '22 20:09 creativedutchmen

Thanks for the suggestion @creativedutchmen, I really like this approach.

@ssheng @bojiang, what do you think? I believe this will simplify how users can create Grafana dashboards for Yatai deployment as well. My only concern is that this will be a breaking change and may break users' existing dashboards. We may want to figure out a way to be backward compatible or provide documentation on how to migrate.

parano avatar Sep 01 '22 20:09 parano

@creativedutchmen thanks for the suggestion. This came at the perfect time. We encountered the same problem when creating dashboards for Yatai. The team discussed it today and decided to emit two sets of metrics, which solves the dashboard problem while maintaining backward compatibility.

Legacy metrics:

BENTOML_iris_classifier_request_total
BENTOML_iris_classifier_request_duration_seconds_sum
BENTOML_iris_classifier_request_duration_seconds_bucket
BENTOML_iris_classifier_request_duration_seconds_count

New metrics:

bentoml_request_total
bentoml_request_duration_seconds_sum
bentoml_request_duration_seconds_bucket
bentoml_request_duration_seconds_count

Both sets of metrics will be supported until we can phase out the legacy set.
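
For anyone migrating dashboard queries, the correspondence between the two lists can be sketched as a small Python helper. This is an illustrative sketch, not part of BentoML: the function name `split_legacy_name` is hypothetical, and it only recognizes the four metric suffixes listed above.

```python
# Metric suffixes taken from the legacy and new lists in this thread.
KNOWN_SUFFIXES = (
    "request_duration_seconds_bucket",
    "request_duration_seconds_count",
    "request_duration_seconds_sum",
    "request_total",
)

LEGACY_PREFIX = "BENTOML_"
NEW_PREFIX = "bentoml_"


def split_legacy_name(legacy_name):
    """Return (new_metric_name, model_name) for a legacy metric name.

    Hypothetical helper: the model name is whatever sits between the
    BENTOML_ prefix and one of the known metric suffixes.
    """
    if not legacy_name.startswith(LEGACY_PREFIX):
        raise ValueError(f"not a legacy metric name: {legacy_name!r}")
    rest = legacy_name[len(LEGACY_PREFIX):]
    for suffix in KNOWN_SUFFIXES:
        if rest.endswith("_" + suffix):
            model_name = rest[: -(len(suffix) + 1)]
            if model_name:
                return NEW_PREFIX + suffix, model_name
    raise ValueError(f"unrecognized legacy metric name: {legacy_name!r}")


new_name, model = split_legacy_name("BENTOML_iris_classifier_request_total")
```

Here `new_name` is `bentoml_request_total` and `model` is `iris_classifier`. Which label the new metrics use to carry the model name is not stated in this comment, so check the /metrics output before rewriting queries.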

ssheng avatar Sep 02 '22 04:09 ssheng

Wow that is quick, thanks! If I can do anything to help let me know :)

creativedutchmen avatar Sep 02 '22 06:09 creativedutchmen

@creativedutchmen Thanks for offering to help. Would be great to have your review on https://github.com/bentoml/BentoML/pull/2969.

ssheng avatar Sep 06 '22 07:09 ssheng

The naming looks great, exactly what I would have expected :)

creativedutchmen avatar Sep 07 '22 14:09 creativedutchmen