bosh
`/metrics` and `/api_metrics` endpoints do not show the generic API metrics for the director's endpoints
Describe the bug
According to the documentation, the `/metrics` and `/api_metrics` endpoints expose the generic API metrics for the director's endpoints, including number of requests and response time. But when we call those endpoints:
- `/metrics` only exposes the metrics for the `/metrics` and `/api_metrics` endpoints
- `/api_metrics` only returns OK
To Reproduce
- Deploy a bosh director
- curl some endpoints of bosh, e.g. `vms`, `deployments`, etc.
- `curl http://<bosh_ip>:9092/metrics`
- `curl http://<bosh_ip>:9092/api_metrics`
Expected behavior
The endpoints should return metrics of bosh endpoints.
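One way to check this symptom programmatically is to look at which `path` labels appear on the request counters in the scraped output. The following is a minimal Python sketch, not part of bosh. It assumes the counters are named `http_server_requests_total` and carry a `path` label (to my knowledge the defaults of the Rack collector in the Ruby prometheus-client); the helper names are hypothetical.

```python
from __future__ import annotations

import re
import urllib.request

# Assumption: request counters look like
#   http_server_requests_total{code="200",method="get",path="/metrics"} 3.0
# Adjust the metric and label names if the director exposes different ones.
PATH_LABEL = re.compile(
    r'^http_server_requests_total\{[^}]*\bpath="([^"]*)"', re.MULTILINE
)


def fetch(url: str, timeout: float = 5.0) -> str:
    """Download the Prometheus text exposition from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def observed_paths(exposition_text: str) -> set[str]:
    """Return the distinct `path` label values found on request counters."""
    return set(PATH_LABEL.findall(exposition_text))


def only_self_metrics(paths: set[str]) -> bool:
    """True when the only recorded paths are the metrics endpoints themselves."""
    return bool(paths) and paths <= {"/metrics", "/api_metrics"}
```

For example, `only_self_metrics(observed_paths(fetch("http://<bosh_ip>:9092/metrics")))` returning `True` after curling `vms` and `deployments` would match the behavior reported here.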
Hi, as a showcase I implemented a tiny puma and prometheus-client integration which serves the expected webserver access metrics: https://github.com/mvach/PumaMetricsExample.
Sadly, I don't yet see how it differs from the current director implementation.
I'm not sure how the generic API metrics are supposed to work at all because:
- As defined in the director job, the metrics-server is started as a different process. You can also check this on a director VM with `netstat -tulpn | grep 9092` and `ps -aux | grep <pid-from-previous-command>`.
- bosh-director-metrics-server starts and registers the Prometheus collector for itself
- the metrics_collector collects only the bosh metrics and no generic API metrics.
You see only metrics for the `/metrics` endpoint because this is the only endpoint you call on the metrics server. Maybe I'm missing something here, but this is my current understanding.
:-) @beyhan, I just noticed that right now and wanted to update the issue.
I was able to get API metrics from `/api_metrics`. I did have to enable the metrics, and I think I got the `OK` response when the metrics were NOT enabled, so that might have been part of the problem.
The `/api_metrics` endpoint does map to the director web process, so in theory it would have access to this data. However, I'm not sure how accurate the data is. The transition from thin to puma might have introduced separate datasets for each of the forked processes puma creates. So it does return data, but I didn't have a chance yet to verify that the data is actually correct.
Good catch @jpalermo. This is the commit which introduced the change. It looks like the `/api_metrics` endpoint was initially called `/director_metrics`, but yes, it redirects to the director process, which I missed.
> The transition from thin to puma might have introduced separate datasets for each of the forked processes puma creates. So it does return data, but I didn't have a chance yet to verify that the data is actually correct.
This is a good question. It looks to me like they should be accurate, because the director's internal metrics are gathered from the DB data, but you "never know" :-)