metrics: Database-Level Performance
Database Performance Metrics
Right now, we serve a good number of metrics for understanding writes and queries, but a few pieces are missing. Primarily, we need to add the database as a parameter (label) on the outputs, which will help us see whether specific databases are being hammered more than others.
Update the /metrics endpoint to serve the following metrics:
New Metrics:
- [ ] http_request_active: Total number of currently active requests, per database.
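A minimal sketch of how a per-database active-request gauge could behave, written in Python purely for illustration (the server itself is not Python). The class name `ActiveRequestGauge`, the `track` helper, and the `database` label name are assumptions, not existing code:

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class ActiveRequestGauge:
    """Tracks in-flight requests per database (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = defaultdict(int)

    @contextmanager
    def track(self, database):
        # Increment on entry and decrement on exit, even if the handler raises.
        with self._lock:
            self._active[database] += 1
        try:
            yield
        finally:
            with self._lock:
                self._active[database] -= 1

    def render(self):
        # One line per database, in Prometheus text exposition format.
        with self._lock:
            items = sorted(self._active.items())
        return [f'http_request_active{{database="{db}"}} {n}' for db, n in items]


gauge = ActiveRequestGauge()
with gauge.track("sensors"):  # "sensors" is a made-up database name
    during = gauge.render()
after = gauge.render()
```

Because the value goes down as well as up, this would be a gauge rather than a counter.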
Update Metrics:
- [ ] http_request_duration_seconds_[sum,bucket,count]: Add database as parameter
- [ ] http_response_body_size_bytes_[sum,bucket,count]: Add database as parameter
- [ ] http_requests_total: Add database as parameter
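To make the requested change concrete, here is a rough Python sketch of a counter keyed by the existing labels plus a `database` label. The label name `database`, its position at the end of the label set, and the database names are assumptions for illustration only:

```python
from collections import Counter

# Existing labels, with "database" appended (assumed name and placement).
LABELS = ("method", "method_path", "path", "status", "database")

requests_total = Counter()


def record(method, path, status, database):
    # method_path mirrors the existing "<METHOD> <path>" convention.
    key = (method, f"{method} {path}", path, status, database)
    requests_total[key] += 1


def render(name, counter):
    # Emit one exposition-format line per unique label combination.
    lines = []
    for key, value in sorted(counter.items()):
        labels = ",".join(f'{k}="{v}"' for k, v in zip(LABELS, key))
        lines.append(f"{name}{{{labels}}} {value}")
    return lines


record("POST", "/api/v2/write", "ok", "sensors")
record("POST", "/api/v2/write", "ok", "sensors")
record("POST", "/api/v2/write", "ok", "logs")
lines = render("http_requests_total", requests_total)
```

Each distinct database produces its own series, so two writes to `sensors` and one to `logs` yield two lines rather than one.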
New Endpoint Parameters:
Right now we're only tracking the v2 and v3 endpoints for writes and queries. We should include v1 endpoints to ensure we're capturing all info. We don't have to worry about ping or debug endpoints for v1.
For example, we currently have:
http_request_duration_seconds_sum{method="POST",method_path="POST /api/v2/write",path="/api/v2/write",status="ok"}
Let's add one for the v1 write endpoint as well.
http_request_duration_seconds_count{method="POST",method_path="POST /write",path="/write",status="ok"}
Updated Endpoints
- [ ] All versions of http_request_duration_seconds_[sum,bucket,count]: Add /write, /query endpoints
- [ ] All versions of http_response_body_size_bytes_[sum,bucket,count]: Add /write, /query endpoints
- [ ] All versions of http_requests_total: Add /write, /query endpoints
The /query and /write endpoints are already tracked by the existing metrics; the series appear once a request has been made to them.
If I hit /query with
curl "localhost:8181/query"
I see all the http_* metrics for that endpoint, e.g.,
http_requests_total{method="GET",method_path="GET /query",path="/query",status="aborted"} 0
http_requests_total{method="GET",method_path="GET /query",path="/query",status="client_error"} 1
http_requests_total{method="GET",method_path="GET /query",path="/query",status="ok"} 0
http_requests_total{method="GET",method_path="GET /query",path="/query",status="server_error"} 0
http_requests_total{method="GET",method_path="GET /query",path="/query",status="unexpected_response"} 0
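For anyone who wants to check this against their own /metrics output, a small Python sketch that parses lines like the ones above and reports which statuses have a nonzero count. The sample text is copied from the output above; the regex is a simplification and does not handle escaped quotes inside label values:

```python
import re

SAMPLE = '''\
http_requests_total{method="GET",method_path="GET /query",path="/query",status="aborted"} 0
http_requests_total{method="GET",method_path="GET /query",path="/query",status="client_error"} 1
http_requests_total{method="GET",method_path="GET /query",path="/query",status="ok"} 0
http_requests_total{method="GET",method_path="GET /query",path="/query",status="server_error"} 0
http_requests_total{method="GET",method_path="GET /query",path="/query",status="unexpected_response"} 0
'''

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    # Return (metric name, label dict, value) for each labeled sample line.
    samples = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip comments, blank lines, and unlabeled samples
        labels = dict(LABEL_RE.findall(m.group("labels")))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


samples = parse(SAMPLE)
nonzero = {labels["status"]: value for _, labels, value in samples if value}
```

Here the bare `curl` with no query parameters shows up only under `client_error`.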
Making the http_* metrics track per database is possible, but should be done in IOx.
The Prometheus docs do caution against using too many labels:
CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.
In Core, this is less of an issue, because there is a 5-database limit. But if we enable many databases in Enterprise, this could become a problem.
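A back-of-the-envelope Python sketch of the growth, assuming every label combination is registered (as the zero-valued series above suggest). The five statuses come from the output above; the endpoint count of 4 and the 1000-database figure are made-up numbers for illustration:

```python
# aborted, client_error, ok, server_error, unexpected_response
STATUSES = 5


def series_count(endpoints, databases, statuses=STATUSES):
    # Every unique label combination is its own time series.
    return endpoints * statuses * databases


core = series_count(endpoints=4, databases=5)        # Core's 5-database limit
many = series_count(endpoints=4, databases=1000)     # hypothetical Enterprise deployment
```

That is for http_requests_total alone; the histogram metrics would multiply this further by their bucket count.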