Expose metrics about disk usage
It would be nice to have built-in metrics about Promscale's disk usage. I had an issue where my database was using more disk than expected because one metric was left uncompressed, and it was not trivial to find the source. I ended up using the hypertable functions (something like the sketch below the list), but they take a few seconds to complete and do not give historical data.
Metrics ideas:
- disk size by metric + trace/link/event
- compression info, like chunk size before/after
- exact or approximate row count by metric + trace/link/event
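For context, this is roughly the kind of query I ran by hand. It is only a sketch: it assumes Promscale keeps one TimescaleDB hypertable per metric in the prom_data schema, which may differ between versions.

```sql
-- Rough sketch: per-metric disk usage and approximate row counts.
-- Assumes one hypertable per metric in the prom_data schema.
SELECT h.hypertable_name AS metric,
       pg_size_pretty(hypertable_size(
           format('%I.%I', h.hypertable_schema, h.hypertable_name)::regclass)) AS total_size,
       approximate_row_count(
           format('%I.%I', h.hypertable_schema, h.hypertable_name)::regclass)  AS approx_rows
FROM timescaledb_information.hypertables h
WHERE h.hypertable_schema = 'prom_data'
ORDER BY hypertable_size(
           format('%I.%I', h.hypertable_schema, h.hypertable_name)::regclass) DESC
LIMIT 20;
```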
@Yamakaky we looked into exposing disk consumption for all metrics and for all traces, but it was very expensive to compute.
Some comments:
- It looks like in this particular case, what would have helped you is disk usage by metric name to understand which metric was causing the issue.
- I am not fully understanding the compression info you're after and what problem it would solve. Is this to identify that compression isn't working properly?
- What would you use row counts for?
- Also, why was the metric not being compressed?
- Yes
- You can compute a compression ratio using the various Timescale views and functions for chunk and hypertable info (see the sketch after this list). It would be nice to have built-in metrics for it, or at least the last value. Maybe not per metric, but at least for all traces and for all metrics combined, so that it's easier to forecast disk usage.
- For this problem, seeing which metrics are the most frequent would help, but with `metric rate * compression ratio * bytes per metric` you can forecast disk usage for the compressed part.
- I don't know, and I didn't think to keep the metric's data. One particular metric had a few chunks compressed and all the recent chunks uncompressed, way more than the compression policy should allow. I couldn't find a difference between this metric and the others that would explain it, except that for an unrelated reason it had a much higher ingestion rate than the others.
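For reference, this is roughly the compression-ratio query I had in mind, using TimescaleDB's `hypertable_compression_stats()`. It is only a sketch; the assumption that Promscale's per-metric hypertables live in the prom_data schema carries over from above.

```sql
-- Hedged sketch: before/after compression size and ratio per metric,
-- using TimescaleDB's built-in stats. prom_data is assumed to be the
-- per-metric schema.
SELECT h.hypertable_name AS metric,
       pg_size_pretty(s.before_compression_total_bytes) AS before_compression,
       pg_size_pretty(s.after_compression_total_bytes)  AS after_compression,
       round(s.before_compression_total_bytes::numeric
             / nullif(s.after_compression_total_bytes, 0), 2) AS ratio
FROM timescaledb_information.hypertables h,
     LATERAL hypertable_compression_stats(
         format('%I.%I', h.hypertable_schema, h.hypertable_name)::regclass) s
WHERE h.hypertable_schema = 'prom_data'
  AND s.after_compression_total_bytes IS NOT NULL
ORDER BY ratio DESC;
```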
@Yamakaky have you seen the `prom_info.metric` view (`select * from prom_info.metric;`)? It has a bunch of the info you want from your first two points.
Oh, I didn't see that, nice. However, the query is very slow (about 60s on my side with a small dataset); is that expected?
Yes, that view is expected to be slow, as it does a good amount of work under the hood.
Maybe then automatically create metrics from that view? That way you also get historical data.
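In the meantime, one low-tech way to keep historical data is to snapshot the view yourself on a schedule. This is just a sketch: it assumes the view's columns can be copied as-is into a plain table, and that you schedule the insert with cron, pg_cron, or a TimescaleDB job.

```sql
-- Sketch: keep history by snapshotting prom_info.metric periodically.
-- Create an empty table with the view's columns plus a timestamp.
CREATE TABLE IF NOT EXISTS metric_info_history AS
SELECT now() AS sampled_at, m.*
FROM prom_info.metric m
LIMIT 0;

-- Run this on a schedule to append a new sample of the view:
INSERT INTO metric_info_history
SELECT now(), m.*
FROM prom_info.metric m;
```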