Trino Gateway metrics from /metrics endpoint fail parsing by openmetrics client

Open raj-manvar opened this issue 11 months ago • 0 comments

The metrics emitted by Trino Gateway from the /metrics endpoint, as described at https://trinodb.github.io/trino-gateway/operation/#monitoring, fail to be parsed by the OpenMetrics client. To reproduce, run Trino Gateway locally and execute the snippet below.

>>> import requests
>>> from prometheus_client import generate_latest, CollectorRegistry
>>> from prometheus_client.openmetrics import parser
>>> 
>>> # Fetch metrics from the /metrics endpoint
>>> response = requests.get("http://localhost:8080/metrics")
>>> response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
>>> metrics_data = response.text
>>> 
>>> # Parse the fetched metrics data
>>> parsed_data = parser.text_string_to_metric_families(metrics_data)
>>> 
>>> # Process the parsed data
>>> for family in parsed_data:
...     print(f"Metric Family: {family.name}")
...     print(f"  Type: {family.type}")
...     for sample in family.samples:
...         print(f"  Sample: {sample.name}{{{sample.labels}}}, Value: {sample.value}")
... 
Metric Family: io_trino_gateway_ha_handler_name_ProxyHandlerStats_RequestCount
  Type: counter
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rmanvar/indeed/datalake-dfr-scripts/python/dfrvirtualenv/lib/python3.10/site-packages/prometheus_client/openmetrics/parser.py", line 18, in text_string_to_metric_families
    yield from text_fd_to_metric_families(StringIO.StringIO(text))
  File "/home/rmanvar/indeed/datalake-dfr-scripts/python/dfrvirtualenv/lib/python3.10/site-packages/prometheus_client/openmetrics/parser.py", line 509, in text_fd_to_metric_families
    yield build_metric(name, documentation, typ, unit, samples)
  File "/home/rmanvar/indeed/datalake-dfr-scripts/python/dfrvirtualenv/lib/python3.10/site-packages/prometheus_client/openmetrics/parser.py", line 472, in build_metric
    raise ValueError("Clashing name: " + name + suffix)
ValueError: Clashing name: io_trino_gateway_ha_handler_name_ProxyHandlerStats_RequestCount
>>> 

The returned metrics fail with ValueError: Clashing name.

As a result, the metrics returned from the /metrics endpoint cannot be consumed by other tools such as Datadog.

From the OpenMetrics specification, https://github.com/prometheus/OpenMetrics/blob/main/specification/OpenMetrics.md#counter-1:

The MetricPoint's Total Value Sample MetricName MUST have the suffix "_total". If present the MetricPoint's Created Value Sample MetricName MUST have the suffix "_created".

The /metrics content appears to be produced at https://github.com/airlift/airlift/blob/4414684a9324adb2515fe9d708c6f4df7d2fe808/openmetrics/src/main/java/io/airlift/openmetrics/types/Counter.java#L38-L42, which does not seem to follow the OpenMetrics format for counters.
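
To make the suffix rule concrete, here is a minimal sketch of a check for it, using only the Python standard library. It is not the prometheus_client parser and not a full OpenMetrics validator: it only flags samples whose name equals a family declared as `counter`, that is, samples missing the `_total` or `_created` suffix. The function name `find_noncompliant_counters` and the metric name `requests` are hypothetical and chosen for illustration.

```python
import re


def find_noncompliant_counters(text):
    """Return sample names that belong to a counter family but lack a suffix.

    Simplified illustration: a sample whose name is exactly the counter
    family name cannot end in the required "_total" or "_created" suffix.
    """
    counters = set()
    offenders = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            parts = line.split()
            # Expected shape: "# TYPE <family_name> <type>"
            if len(parts) == 4 and parts[3] == "counter":
                counters.add(parts[2])
            continue
        if line.startswith("#"):
            # Skip HELP, UNIT, EOF and other comment lines.
            continue
        # The sample name is everything before the first '{' or space.
        sample_name = re.split(r"[{ ]", line, maxsplit=1)[0]
        if sample_name in counters:
            offenders.append(sample_name)
    return offenders


# Hypothetical payloads: the first omits the suffix, the second includes it.
bad = "# TYPE requests counter\nrequests 5.0\n# EOF\n"
good = "# TYPE requests counter\nrequests_total 5.0\n# EOF\n"

print(find_noncompliant_counters(bad))   # ['requests']
print(find_noncompliant_counters(good))  # []
```

Running the same function on `response.text` from the reproduction above should list the counter names that trip the parser, assuming the endpoint declares them with `# TYPE ... counter` lines.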

raj-manvar, May 13 '25 21:05