greptimedb icon indicating copy to clipboard operation
greptimedb copied to clipboard

Query result should contains the timeseries information

Open killme2008 opened this issue 3 years ago • 3 comments

When we query from a metrics table written by prometheus remote write:

select * from prometheus_tsdb_head_min_time order by greptime_timestamp;
+---------------------+---------------------+----------------+------------+
| greptime_timestamp  | greptime_value      | instance       | job        |
+---------------------+---------------------+----------------+------------+
| 2022-10-21 08:51:01 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:51:16 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:51:31 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:51:46 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:01 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:16 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:31 |       1666340534420 | localhost:9090 | prometheus |
| 2022-10-21 08:52:46 |       1666340534420 | localhost:9090 | prometheus |

....

We don't known the timeseries information at all( for example instance, job here), so when we want to supports remote read ,we have to find out the timeseries by ourself such as:

let  mut timeseries = HashMap<String, TimeSeries>();

for row in rows {
  let job = ..;
  let instance = ...;

  if let Some(ts) = timeseries.get(format!("{}{}", job, instance)) {
     // append samples
  } else {
     // create a timeseries and insert it into timeseries map.
  }

}

The timeseries protobuf message:

// TimeSeries represents samples and labels for a single time series.
message TimeSeries {
  // For a timeseries to be valid, and for the samples and exemplars
  // to be ingested by the remote system properly, the labels field is required.
  repeated Label labels   = 1;
  repeated Sample samples = 2;
  repeated Exemplar exemplars = 3;
}

It's really ugly and not good for performance. I think we should improve it in future.

killme2008 avatar Oct 21 '22 10:10 killme2008

@killme2008 this seems still relevant. What is our starting point? Read the PromQL remote read code path and investigate how to add the time series info and whether it helps?

tisonkun avatar Apr 01 '24 01:04 tisonkun

cc @waynexia

We have a plan to support it, and we already have tsid in the storage format, it can be used to improve the performance.

killme2008 avatar Apr 01 '24 01:04 killme2008

For prom remote read API, maybe we can also select the tsid column. However, we already have native PromQL support so the priority of this issue should be low.

it can be used to improve the performance.

We can use it to improve performance and that is what I'm going to do, but it is out of this issue's topic.

evenyag avatar Apr 01 '24 06:04 evenyag