Query result should contains the timeseries information
When we query from a metrics table written by prometheus remote write:
select * from prometheus_tsdb_head_min_time order by greptime_timestamp;
+---------------------+---------------------+----------------+------------+
| greptime_timestamp | greptime_value | instance | job |
+---------------------+---------------------+----------------+------------+
| 2022-10-21 08:51:01 | 1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:51:16 | 1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:51:31 | 1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:51:46 | 1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:01 | 1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:16 | 1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:31 | 1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:46 | 1666340534420 | localhost:9090 | prometheus |
....
We don't known the timeseries information at all( for example instance, job here), so when we want to supports remote read ,we have to find out the timeseries by ourself such as:
let mut timeseries = HashMap<String, TimeSeries>();
for row in rows {
let job = ..;
let instance = ...;
if let Some(ts) = timeseries.get(format!("{}{}", job, instance)) {
// append samples
} else {
// create a timeseries and insert it into timeseries map.
}
}
The timeseries protobuf message:
// TimeSeries represents samples and labels for a single time series.
message TimeSeries {
// For a timeseries to be valid, and for the samples and exemplars
// to be ingested by the remote system properly, the labels field is required.
repeated Label labels = 1;
repeated Sample samples = 2;
repeated Exemplar exemplars = 3;
}
It's really ugly and not good for performance. I think we should improve it in future.
@killme2008 this seems still relevant. What is our starting point? Read the PromQL remote read code path and investigate how to add the time series info and whether it helps?
cc @waynexia
We have a plan to support it, and we already have tsid in the storage format, it can be used to improve the performance.
For prom remote read API, maybe we can also select the tsid column. However, we already have native PromQL support so the priority of this issue should be low.
it can be used to improve the performance.
We can use it to improve performance and that is what I'm going to do, but it is out of this issue's topic.