
Add public, read-only access to label names, label values of any metric

Open Tomasz-Kluczkowski opened this issue 2 years ago • 3 comments

Hi,

I am currently working on Prometheus metrics for Celery workers in the Flower project, and we use this wonderful library :) - thx for your work on it!

One issue I came across is the lack of direct access to a metric's labelnames and labelvalues.

I think the easiest way to explain is to give you an example of how it is used and why it is needed.

We have multiple Celery workers in a Kubernetes cluster; they are monitored by Flower, which generates Prometheus metrics. Pods in k8s can die or be replaced when a new release of our software is deployed.

The problem is that the final reading of each metric for such a pod is retained forever and keeps showing up in Grafana.

We want to remove any tuple of label values that contains the name of a Celery worker deemed offline/dead.

I used a semi-hack (https://github.com/mher/flower/pull/1135): I get those label values through metric.collect(), iterate over the samples and their labels, and if they contain an offline worker I remove the whole tuple from the metric by calling metric.remove(*labelvalues_containing_dead_worker).
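
Roughly, that workaround looks like the sketch below (a simplified, hypothetical version of the linked PR, assuming the worker name lives in a label called "worker"; note it still has to reach into the private _labelnames attribute to get the value order that remove() expects):

def remove_series_for_offline_workers(metric, offline_workers):
    # Collect the currently known series and note every label-value tuple
    # that belongs to a dead worker.
    to_remove = set()
    for metric_family in metric.collect():
        for sample in metric_family.samples:
            labels = sample.labels  # dict of label name -> label value
            if labels.get("worker") in offline_workers:
                # remove() expects values in labelname order, which is only
                # available via the private _labelnames attribute.
                to_remove.add(tuple(labels[name] for name in metric._labelnames))
    for labelvalues in to_remove:
        metric.remove(*labelvalues)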

There could be a better way: access metric._metrics.keys() in a thread-safe and read-only way.

So I propose adding a read-only property to MetricWrapperBase, say all_label_values (name to be improved, suggestions welcome :))

@property
def all_label_values(self) -> List[Tuple[str, ...]]:
    # Snapshot the child keys under the lock so callers get a stable,
    # read-only copy (the internal dict is self._metrics).
    with self._lock:
        return list(self._metrics.keys())

I think I am also badly missing access to the actual labelnames set for a given metric, so that I know at which position a label with a given name appears in the labelvalues.

We could kill two birds with one stone and also add:

@property
def labelnames(self) -> Tuple[str, ...]:
    return self._labelnames
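
With both properties in place, the cleanup sketched above could shrink to something like this (hypothetical usage; metric and offline_workers stand in for Flower's own objects, and "worker" is the assumed name of the label carrying the worker name):

worker_idx = metric.labelnames.index("worker")
for labelvalues in metric.all_label_values:
    if labelvalues[worker_idx] in offline_workers:
        metric.remove(*labelvalues)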

Please let me know if such a change is acceptable and if I am missing any nuances.

If you think you would be OK with merging it, I will make a PR with tests soon.

Cheers,

Tom

Tomasz-Kluczkowski avatar Aug 19 '21 20:08 Tomasz-Kluczkowski

Hmm, I will think about this some. Typically, the best practice for exposing metrics like this, which may disappear or come from some other system (Celery in this case), is to use a custom collector and generate the metrics on each scrape.

csmarchbanks avatar Aug 30 '21 14:08 csmarchbanks

Ok I think I understand but want to confirm.

The logic for deciding whether a metric should be produced is then moved inside the custom collector, correct?

So in our Celery example it would mean we check whether a worker is offline and, if it is, we do not produce certain metrics for it, rather than removing them afterwards.

I assume then that each of the multiple metrics we have has to be turned into a custom collector that follows the rule of not producing a metric for an offline worker?

Cheers

Tomasz-Kluczkowski avatar Aug 30 '21 16:08 Tomasz-Kluczkowski

Correct, you can see the example in the README: https://github.com/prometheus/client_python/#custom-collectors and add a loop over workers inside collect to only call add_metric(...) for the active ones.
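
For concreteness, a minimal sketch of that pattern (the metric name and the get_active_workers()/prefetched_task_counts() helpers are hypothetical stand-ins for Flower's own worker state):

from prometheus_client.core import GaugeMetricFamily, REGISTRY

class WorkerCollector:
    def collect(self):
        gauge = GaugeMetricFamily(
            "flower_worker_prefetched_tasks",
            "Number of tasks of a given type prefetched at a worker",
            labels=["worker", "task"],
        )
        # Only emit samples for workers that are currently online, so series
        # for dead workers simply disappear from the next scrape.
        # get_active_workers() and prefetched_task_counts() are hypothetical
        # helpers representing Flower's own state.
        for worker in get_active_workers():
            for task, count in prefetched_task_counts(worker).items():
                gauge.add_metric([worker, task], count)
        yield gauge

REGISTRY.register(WorkerCollector())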

csmarchbanks avatar Aug 31 '21 03:08 csmarchbanks