MetricsQL: add function for merging time series values based on label value
Is your feature request related to a problem? Please describe
The problem of updates in TSDBs is still relevant and has no good solution yet. Sometimes users need to correct some of the previously written values. There is no good and simple way to do this so far. The only legitimate way is to delete the series and ingest it once again with the corrected values.
Describe the solution you'd like
One of the workarounds for updating/correcting previously written values is to use a revision label. The revision label can have a numeric value reflecting the iteration of updates. For example, foo{revision="1"} 1 is the original time series, and foo{revision="2"} 1.5 is the corrected one. The user can then select foo with the latest revision to display the most up-to-date data.
However, such an approach also assumes the whole time series has to be ingested again with the new revision label. What if MetricsQL supported merging time series values based on a specified numeric label?
For example, merge_latest(foo, "revision") would automatically select both time series foo{revision="1"} and foo{revision="2"} and aggregate them, leaving for each timestamp only the sample with the highest revision label value:
foo{"revision"="1"}
1 1 1 1 1 1 1 1
foo{"revision"="2"}
2 2
merge_latest(foo, "revision")
1 1 1 1 2 2 1 1
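To make the intended semantics concrete, here is a minimal Go sketch of how such a merge could behave. The Series type and mergeLatest function are hypothetical illustrations of the proposal, not part of VictoriaMetrics:

```go
package main

import (
	"fmt"
	"sort"
)

// Series is a hypothetical in-memory representation of one matched time
// series: its numeric revision plus samples keyed by timestamp.
type Series struct {
	Revision int
	Samples  map[int64]float64
}

// mergeLatest keeps, for every timestamp, the sample coming from the
// series with the highest revision that has a value at that timestamp.
// Timestamps present only in lower revisions fall back to those samples.
func mergeLatest(series []Series) map[int64]float64 {
	// Apply series in ascending revision order so that samples from
	// higher revisions overwrite samples from lower ones.
	sort.Slice(series, func(i, j int) bool {
		return series[i].Revision < series[j].Revision
	})
	merged := make(map[int64]float64)
	for _, s := range series {
		for ts, v := range s.Samples {
			merged[ts] = v
		}
	}
	return merged
}

func main() {
	rev1 := Series{Revision: 1, Samples: map[int64]float64{
		10: 1, 20: 1, 30: 1, 40: 1, 50: 1, 60: 1, 70: 1, 80: 1,
	}}
	// The correction only re-ingests the two fixed samples.
	rev2 := Series{Revision: 2, Samples: map[int64]float64{
		50: 2, 60: 2,
	}}
	merged := mergeLatest([]Series{rev2, rev1})

	// Print the merged samples in timestamp order: 1 1 1 1 2 2 1 1.
	ts := make([]int64, 0, len(merged))
	for t := range merged {
		ts = append(ts, t)
	}
	sort.Slice(ts, func(i, j int) bool { return ts[i] < ts[j] })
	for _, t := range ts {
		fmt.Println(t, merged[t])
	}
}
```

Running the sketch on the example data above prints the merged values 1 1 1 1 2 2 1 1 in timestamp order.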
Describe alternatives you've considered
No response
Additional information
No response
I'd like to take this issue; please assign it to me. @hagen1778
I believe this needs to be discussed with @valyala first. I'll try to bring some attention to it.
@Damon07 for what purpose do you need this FR?
We need to update data uploaded by sensors in our app, e.g. a weight value from a wrong measurement.
> We need to update data uploaded by sensors in our app, e.g. a weight value from a wrong measurement.
How frequently do you plan to update this data?
Actually, we can use this too. We have multiple recording rules that sometimes break because of missing data. The data is not actually missing; we deliver it later somehow, but with a delay. This case is more important for us in our SLI/SLO records, because we don't want them to be wrong and misleading. Whole time series deletion was not an option for us because we didn't have the raw data. We decided to add a "date" label to our recorded time series. This helped us control the downsides of the deletion/replay approach, at the cost of high churn rates.
Aha, I see. I'm asking because I'm not really sure the solution I proposed is a good one. I'm afraid it could cause more harm by making querying more complicated, instead of fixing the root cause.
> This case is more important for us in our SLI/SLO records, because we don't want them to be wrong and misleading.
And this is the root cause of your need to fix the data. I believe the following FR should reduce the probability of recording rule mistakes: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4043
@Haleygo do you want to take over the mentioned ticket? I believe it would be a good contribution to vmalert's reliability.
> Actually, we can use this too. We have multiple recording rules that sometimes break because of missing data. The data is not actually missing; we deliver it later somehow, but with a delay.
@Sin4wd I think rules backfilling could be a good option here if your data delay is big (let's say more than 30 minutes) and regular (e.g. the data must be delivered at the end of the day); you can set up a periodic job to do the backfilling. But there will be a duplication problem, since you already have some normal data, so deduplication is also needed.
> @Haleygo do you want to take over the mentioned ticket? I believe it would be a good contribution to vmalert's reliability.
@hagen1778 Sure, it's a good enhancement, but not for @Sin4wd's case here, I suppose. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4043 will only maintain a queue for a limited number of evaluations or for a limited period, since we don't want it to grow huge and then cause big pressure when vmselect comes back.
Notes for implementation:
- VictoriaMetrics stores index records and actual data points in different folders. Data points are linked to the index records via metricID.
- At first we have to search all existing index records for the foo metric name and lock the corresponding parts for the future deletion of the metrics.
- Results must be grouped by MetricName without the revision label (see the sketch after these notes). E.g. foo{revision="1",instance="1"} and foo{revision="2",instance="2"} are different metrics, but both match the query request.
- Probably, tombstones may help in this case: the matched metricID must be tombstoned and a new metricID record must be created. All merged data points must belong to the corresponding new metricID. This makes the "old" data points unsearchable. A good article: https://disc-projects.bu.edu/lethe/
- Next, the data-point parts with the given metricIDs must be selected in the data folder for the merge. It's not trivial to apply those changes in stream mode. Probably the best option is reading the data into memory per month, merging the changes and writing the results back to the storage with the new metricID.
- All those changes must be atomic. In case of a storage shutdown, the results must be dropped.
- And it seems that such a query must schedule a background process, since it may take a lot of time to complete.
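As a rough illustration of the grouping step mentioned above, here is a hedged Go sketch of grouping matched series by their label set with the revision label dropped, so that only series identical apart from the revision would be merged. The Labels type and groupKey helper are hypothetical and not tied to the actual storage code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Labels is a hypothetical flat label set of one matched series.
type Labels map[string]string

// groupKey builds a canonical key from all labels except the one used
// for revisions, so foo{revision="1",instance="1"} and
// foo{revision="2",instance="1"} land in the same group, while
// foo{revision="2",instance="2"} does not.
func groupKey(lbls Labels, revisionLabel string) string {
	pairs := make([]string, 0, len(lbls))
	for name, value := range lbls {
		if name == revisionLabel {
			continue
		}
		pairs = append(pairs, name+"="+value)
	}
	sort.Strings(pairs) // canonical order, independent of map iteration
	return strings.Join(pairs, ",")
}

func main() {
	matched := []Labels{
		{"__name__": "foo", "revision": "1", "instance": "1"},
		{"__name__": "foo", "revision": "2", "instance": "1"},
		{"__name__": "foo", "revision": "2", "instance": "2"},
	}
	groups := make(map[string][]Labels)
	for _, lbls := range matched {
		key := groupKey(lbls, "revision")
		groups[key] = append(groups[key], lbls)
	}
	for key, members := range groups {
		fmt.Printf("%s -> %d series\n", key, len(members))
	}
}
```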
@f41gh7 I was under the impression that it could work similarly to the sum function. But instead of summing values with matching labels, we'd replace values. And since all this happens only after the indexdb was involved, the metricID concept might not even be needed.
> @f41gh7 I was under the impression that it could work similarly to the sum function. But instead of summing values with matching labels, we'd replace values. And since all this happens only after the indexdb was involved, the metricID concept might not even be needed.
Ah, my bad. I remember we discussed an optimization for series updates.
In case it ends up being just a MetricsQL function, some logic could be inherited from https://github.com/VictoriaMetrics/VictoriaMetrics/pull/2885
> Aha, I see. I'm asking because I'm not really sure the solution I proposed is a good one. I'm afraid it could cause more harm by making querying more complicated, instead of fixing the root cause.
@hagen1778 Suppose we have two k8s clusters, each sending data to a central location. A simple query like count(up==0) will return wrong results in the case where one of the clusters fails to send data for some time, like 10 minutes.
> @Sin4wd I think rules backfilling could be a good option here if your data delay is big (let's say more than 30 minutes) and regular (e.g. the data must be delivered at the end of the day); you can set up a periodic job to do the backfilling. But there will be a duplication problem, since you already have some normal data, so deduplication is also needed.
I have a rule backfilling job that runs in case of data delivery latency. The problem is that I have to delete the wrong samples before backfilling. To my knowledge, the deduplication logic doesn't work deterministically in this case. It assumes the "last" sample in the dedup window is correct. What would be the timestamp of the rule backfilling samples after delivery of the new data? I'm not sure this works correctly, because there is no time adjustment on the recording rule samples, I suppose. Am I missing something? If we have a revision, I can have a more correct revision calculated with a 2h lag and merge it with the recent data. I am also following #3759, which might help in my case. However, I believe this issue is a better approach to solving my problem and does not have the downsides of deleting samples/time series.
> @hagen1778 Sure, it's a good enhancement, but not for @Sin4wd's case here, I suppose. https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4043 will only maintain a queue for a limited number of evaluations or for a limited period, since we don't want it to grow huge and then cause big pressure when vmselect comes back.
@Haleygo In this case, I have two vmalerts running on my data. One is 15m behind, using datasource.lookback=15m. This helps me backfill the gaps caused by vmalert<-->storage network issues (different from the partial data delivery latency case). It also covers 5m-10m storage downtimes. Of course this is not efficient, because it nearly doubles the search load, and I would be happy if we could have #4043.
> To my knowledge, the deduplication logic doesn't work deterministically in this case. It assumes the "last" sample in the dedup window is correct. What would be the timestamp of the rule backfilling samples after delivery of the new data? I'm not sure this works correctly, because there is no time adjustment on the recording rule samples, I suppose.
Yeah, sorry for the misleading suggestion.
I was assuming that there is no right or wrong between the data before backfilling (you said it "sometimes breaks because of missing data") and the data generated by backfilling; in that case, we just need to make sure the final data is deduplicated. But if you want to keep all the data consistent (since you have datasource.lookback as well), I think you are right to delete the data before backfilling.
> I have a rule backfilling job that runs in case of data delivery latency.
Just curious, how do you determine which rules need to be backfilled? For example, do you backfill only if a series is 20% missing, or do you just backfill them anyway?
Sorry if this comment is not specifically related to the current issue. I think this one has the best potential, and I want to help the issue by providing other use cases. Let me know if there is a discussion about these topics elsewhere.
Multiple issues, including this one and #3759, can help us overcome the lack of an update API, but I'm not sure if this is a good idea for my case. The main doubt is that VictoriaMetrics is designed around the immutable nature of collected data, and recording rules are not of that nature, I believe. Maybe we need to use ClickHouse itself in the case of recording rules? What do you think of having a way to insert some of the data as ephemeral, low-retention data that does not merge with other parts, yet appears in search results until a more rigid data point shows up? For example, the no-lag vmalert inserts as ephemeral, and the lagging vmalert inserts with a higher confidence level.
> Just curious, how do you determine which rules need to be backfilled? For example, do you backfill only if a series is 20% missing, or do you just backfill them anyway?
Currently we backfill them all, but that's a great question. This problem holds for any kind of aggregation that misses some underlying data (e.g. one shard of vmagent goes down).
In the case of delivery lag, I was thinking of a complementary recording rule that counts the contributing time series of the main record; later on, we can compare it with the live count and detect any problems. Maybe this could be the default behavior of vmalert: capturing the number of selected time series for each record. Moreover, if there was a real-time way to detect delivery lag, we could have used "rule" unless "lag_flag" to prevent wrong aggregations (1 - present_over_time(up), maybe?).
In case we have missing parts in the data (the vmagent shard problem), I have no good solution in mind for now. Yet the problem is there. We sometimes see drops in our total RPS that are not real and are caused by some random fault on our scrape side, or sometimes by dropSamplesOnOverload. Let me know if you have any ideas about this.
> Moreover, if there was a real-time way to detect delivery lag, we could have used "rule" unless "lag_flag" to prevent wrong aggregations (1 - present_over_time(up), maybe?).
Actually, you can have a recording rule for data freshness based on the lag function. For example:
```yaml
rules:
  - record: freshness:cluster
    expr: max(lag(up)) by (cluster)
```
The recording rule above will create a time series per cluster (or per vmagent instance, or whatever else you use as a metric source). It could then be used as an additional condition in alerting rules. Or it could trigger an alerting rule on its own and invalidate all alerting rules which match by the same label. In short, the idea is the following:
- Raise an alert if data freshness SLO is breached
- Suppress all alerts related to the dimension of breached SLO
> Multiple issues, including this one and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3759, can help us overcome the lack of an update API,
I'm not sure that is something we'd like to do. We have a rather negative experience with providing an update API within VictoriaMetrics. Adding workarounds like the one mentioned in the description could be a false path.
I think the problem should be attacked from a different angle. If a simple query like count(up==0) returns wrong results when one of the clusters fails to send data for some time, like 10 minutes, then this is how it is: one cluster indeed was unavailable/invisible to the system for 10 minutes. Even if the data was backfilled afterwards, you still have evidence of the unavailability at that time, which can be used for further investigation or for invalidation of incorrect responses.
The recording rule which was meant to record count(up) can be adjusted to record max_over_time(count(up)[30m:1m]), where 30m is the availability threshold.
After some discussion, we decided to try implementing this feature and @f41gh7 volunteered to contribute.
This feature would be incredibly useful in my workplace.
This, or switching the default behavior of dedup on the same timestamp to choose the most recently inserted value instead of the max value, would make this even easier.
Alternatively, a config parameter to change the behavior to use the most recently uploaded point instead of the max value would be acceptable.
And in the last case, using a merge based on revision would end up performing the same logic, as I would just use the timestamp as the revision. The only downside is: what value would the revision label of the merged result have?
I think having a change of dedup behavior, either by default or by config, would be an easier fix and less hassle for devs using VictoriaMetrics, like myself.
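For clarity, here is a small Go sketch contrasting the two deduplication policies discussed in this comment for samples sharing the same timestamp. It is a simplified model under the assumptions stated above (current behavior keeps the biggest value), not the actual VictoriaMetrics dedup code:

```go
package main

import "fmt"

// Sample is a simplified raw sample: value plus the order in which it
// was ingested (higher InsertOrder means written later).
type Sample struct {
	Timestamp   int64
	Value       float64
	InsertOrder int
}

// dedupMaxValue models the behavior described above for identical
// timestamps: the sample with the biggest value is kept.
func dedupMaxValue(samples []Sample) Sample {
	best := samples[0]
	for _, s := range samples[1:] {
		if s.Value > best.Value {
			best = s
		}
	}
	return best
}

// dedupLastWritten models the proposed alternative: the most recently
// inserted sample wins, regardless of its value.
func dedupLastWritten(samples []Sample) Sample {
	best := samples[0]
	for _, s := range samples[1:] {
		if s.InsertOrder > best.InsertOrder {
			best = s
		}
	}
	return best
}

func main() {
	// Two samples with the same timestamp: the original value 5 and a
	// later correction to 1.5.
	samples := []Sample{
		{Timestamp: 100, Value: 5, InsertOrder: 1},
		{Timestamp: 100, Value: 1.5, InsertOrder: 2},
	}
	fmt.Println("max value policy keeps:", dedupMaxValue(samples).Value)       // 5
	fmt.Println("last written policy keeps:", dedupLastWritten(samples).Value) // 1.5
}
```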
Bump