Find more documents ALERTS and ALERTS_FOR_STATE

Open LeonYanghaha opened this issue 2 years ago • 1 comments

Hello, regarding the time series ALERTS and ALERTS_FOR_STATE that vmalert writes to the address set by the command-line flag -remoteWrite.url: I have read these documents, but they still don't answer my questions, and I'm puzzled. Do you have more descriptive information about the structures of ALERTS and ALERTS_FOR_STATE?

Who can answer my questions or provide relevant documents? Thank you very much for your help. The questions I don't understand are:

  • What are the similarities and differences between ALERTS and ALERTS_FOR_STATE?
  • In ALERTS_FOR_STATE, the value of the attribute value is an array. What do the two elements represent respectively? Is this the start and end time?
  • Why are all group 1? [image]

LeonYanghaha avatar Sep 02 '22 08:09 LeonYanghaha

Hello! Both metrics are produced by vmalert to remain compliant with the Prometheus ecosystem. The docs assume that people already know Prometheus and how it handles alerting. This assumption is not correct, so we need to update the docs. So far, the answers to your questions are as follows:

What are the similarities and differences between ALERTS and ALERTS_FOR_STATE?

ALERTS is a time series that shows currently active alerts. An active alert is an alert in the PENDING or FIRING state. The state is reflected via the alertstate label. This time series is produced to give a retrospective overview of the alerts that were active recently, for example to check which alerts triggered or almost triggered over the weekend.

ALERTS_FOR_STATE is a service time series used by vmalert to restore the state of active alerts. This time series is produced only for alerts with for>0. When an alert becomes active, vmalert pushes the ALERTS_FOR_STATE time series to remoteWrite.url. The value of the time series is the moment the alert became active. When vmalert is restarted, it fetches all ALERTS_FOR_STATE series from remote storage in order to restore the in-memory state of the alerts. This is especially important for alerting rules where for=1h or more, so that a restart of the vmalert process does not reset the for counter.

In ALERTS_FOR_STATE, the value of the attribute value is an array. What do the two elements represent respectively? Is this the start and end time?

The first element is the timestamp when the sample was recorded; the second element is the sample value, which for ALERTS_FOR_STATE is the timestamp when the alert became active.
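As a minimal sketch of how that pair can be read: the Prometheus-compatible query API returns each instant sample as a two-element array, with the value encoded as a string. The numbers below are made up for illustration, and `parse_for_state_sample` is a hypothetical helper, not something vmalert provides.

```python
from datetime import datetime, timezone

def parse_for_state_sample(sample):
    """Split an ALERTS_FOR_STATE sample into (recorded_at, active_at) datetimes."""
    recorded_ts, raw_value = sample
    # The query API encodes sample values as strings; for ALERTS_FOR_STATE
    # the value is itself a Unix timestamp in seconds.
    active_ts = float(raw_value)
    recorded_at = datetime.fromtimestamp(float(recorded_ts), tz=timezone.utc)
    active_at = datetime.fromtimestamp(active_ts, tz=timezone.utc)
    return recorded_at, active_at

# Made-up sample: recorded at 1662105900, alert active since 1662105600.
sample = [1662105900, "1662105600"]
recorded_at, active_at = parse_for_state_sample(sample)
print(recorded_at - active_at)  # how long the alert had been active when recorded
```

So the two elements are not a start and an end time: the difference between them is how long the alert had already been active when the sample was written.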

Why are all group 1

group is not related to VM responses. Could you please show how exactly you got this response (the command and full response)?

hagen1778 avatar Sep 12 '22 19:09 hagen1778

@hagen1778 Thanks for the explanation :) There is one question for ALERTS_FOR_STATE.

ALERTS_FOR_STATE is a service time series and used by vmalert to restore the state of the active alerts. This time series is produced only for alerts with for>0.

Why does ALERTS_FOR_STATE only store the time series for alerts with for > 0? Can for = 0 be considered a special case? This behaves differently from Prometheus. Prometheus will store ALERTS_FOR_STATE for an alert, even if for = 0.

just1900 avatar May 15 '23 07:05 just1900

Prometheus will store ALERTS_FOR_STATE for an alert, even if for = 0.

Are you sure about that? ALERTS_FOR_STATE is literally a metric about the for state. If for == 0, there is no state. Even if Prometheus does so, what is the point of doing it? ALERTS_FOR_STATE is read on vmalert startup (or Prometheus startup) to restore the PENDING state of the alert. For example, say you had an alert with for: 1h. The alert became active and remained so for 40min. It needed only 20min more to become FIRING. At this moment, vmalert (or Prometheus) was restarted for some reason. So instead of starting the countdown for this alerting rule from 0, we fetch the ALERTS_FOR_STATE metric and restore its state back to 40min PENDING.

For alerting rules with for==0 we don't need to do this, because such a rule will FIRE immediately after the evaluation; it does not need to wait.
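The arithmetic behind that example can be sketched as follows. This is only an illustration of the idea, not vmalert's or Prometheus's actual restore code; `restore_pending` and its parameters are hypothetical names, and all times are plain seconds.

```python
def restore_pending(active_at, now, for_seconds):
    """Return (state, seconds_left) for an alert restored from ALERTS_FOR_STATE.

    active_at is the value read from ALERTS_FOR_STATE (when the alert became active).
    """
    if for_seconds <= 0:
        # No waiting period: there is nothing to restore.
        return "firing", 0.0
    elapsed = now - active_at
    remaining = max(0.0, for_seconds - elapsed)
    state = "firing" if remaining == 0 else "pending"
    return state, remaining

# for: 1h, and the alert had been active for 40 minutes at restart time.
state, remaining = restore_pending(active_at=0.0, now=40 * 60, for_seconds=3600)
print(state, remaining / 60)  # pending 20.0
```

Without the stored activation time, `elapsed` would start again from zero after the restart and the alert would need the full hour before firing.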

hagen1778 avatar May 15 '23 14:05 hagen1778

Are you sure about that?

Yep, I am switching from Prometheus to vmalert and noticed this difference.

what is the point of doing so?

I understand its original use case is for restoring. I am currently using the ALERTS_FOR_STATE timeseries to retrieve alert history. Its value provides me with an accurate alert start time.

just1900 avatar May 16 '23 01:05 just1900

@just1900 can you do it with ALERTS metric?

hagen1778 avatar May 17 '23 08:05 hagen1778

The ALERTS metric lacks information about the start time. That's also why I suggest treating for=0 as a special case for the ALERTS_FOR_STATE metric. Does this make sense to you?

just1900 avatar May 19 '23 02:05 just1900

Does this make sense to you?

For this reason alone, no. If there is something more behind what Prometheus is doing for alerts with for:0, then yes, it would improve compatibility.

Please note, this is my opinion. I'd like to hear thoughts from other community members though.

To get a timestamp, try using the timestamp family of functions.
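One way to read this suggestion, sketched client-side rather than as a query: if the raw ALERTS samples are available, the start of the latest activation can be approximated by the first sample timestamp after the last gap. This assumes ALERTS samples are written at roughly regular intervals while the alert is active; `first_active_timestamp` and `max_gap` are hypothetical names, and the sample timestamps are made up.

```python
def first_active_timestamp(timestamps, max_gap):
    """Return the start of the most recent unbroken run of sample timestamps."""
    if not timestamps:
        return None
    ordered = sorted(timestamps)
    start = ordered[0]
    for prev, cur in zip(ordered, ordered[1:]):
        # A gap larger than max_gap is taken to mean the alert resolved
        # and later became active again.
        if cur - prev > max_gap:
            start = cur
    return start

# Two activations: samples every 60s, with a long gap between 220 and 600.
samples = [100, 160, 220, 600, 660, 720]
print(first_active_timestamp(samples, max_gap=120))  # 600
```

The result is only as precise as the spacing between samples.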

hagen1778 avatar May 19 '23 07:05 hagen1778

Hello @just1900! Once https://github.com/VictoriaMetrics/VictoriaMetrics/pull/5680 is merged, vmalert will start exposing ALERTS_FOR_STATE time series for alerts with for: 0 param.

hagen1778 avatar Jan 24 '24 14:01 hagen1778