
Flatline with no matches

Open povils opened this issue 8 years ago • 13 comments

Hi, I'm clearly missing something simple but:

my_rule.yaml

type: flatline
index: index-*
threshold: 1
timeframe:
  hours: 24
use_count_query: true
doc_type: doc
filter:
- query:
    query_string:
      query: "application:\"nonsense\""

I would expect that elastalert-test-rule my_rule.yaml would say something like "An abnormally low number of events ..", because obviously there are no events with the value "nonsense" in the application field, and there never will be. However, if I change the timeframe to hours: 1, strangely it does hit the rule and says "An abnormally low ...". To make sure, I created the original rule and left it running for days, but still no alerts...

povils avatar Feb 12 '18 10:02 povils

Does it trigger an alert immediately if you add --start 2018-05-28 where that date is 24+ hours ago?

I don't think there is anything special about 24 hours exactly, but I guess it's possible. Are you sure you waited a full 24 hours and there were no matching documents?

Qmando avatar May 29 '18 18:05 Qmando
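[Editor's note] One plausible explanation for the 24-hour wait, sketched below as a guess at the general mechanism (this is not elastalert's actual source, and the function is hypothetical): a flatline rule can only assert "fewer than threshold events in timeframe" once it has queried a full timeframe worth of windows. With timeframe: hours: 24, the first possible match therefore comes roughly 24 hours after the query start, which is why backfilling with --start makes it fire immediately.

```python
# Hypothetical sketch of flatline evaluation; the function name and the
# exact window logic are assumptions, not elastalert's real code.
from datetime import datetime, timedelta

def flatline_matches(counts, threshold, timeframe):
    """counts: list of (window_start, hit_count), oldest first.
    Returns the window starts at which the rule would match."""
    matches = []
    first_seen = counts[0][0]
    for window_start, _ in counts:
        # Not enough queried history yet: the rule cannot know how many
        # events occurred before it started querying.
        if window_start - first_seen < timeframe:
            continue
        lo = window_start - timeframe
        total = sum(c for t, c in counts if lo < t <= window_start)
        if total < threshold:
            matches.append(window_start)
    return matches

start = datetime(2018, 5, 29)
hour = timedelta(hours=1)
# 30 hourly query windows with zero hits, like the "nonsense" filter
counts = [(start + i * hour, 0) for i in range(30)]
matches = flatline_matches(counts, threshold=1, timeframe=timedelta(hours=24))
print(matches[0])  # first match only after a full 24h of queried history
```

Under this assumption, a timeframe of hours: 1 matches within the first test run (only one hour of history is needed), while hours: 24 needs a day of accumulated query windows, or a --start at least 24 hours in the past.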

But I left that alert running for days and got nothing. Am I missing something?

povils avatar May 29 '18 21:05 povils

You are on version 0.1.31? I'll try to reproduce this and get back to yall.

Qmando avatar May 29 '18 23:05 Qmando

I can't reproduce this using your rule config:

$ python -m elastalert.elastalert --rule test.yaml --debug --start 2018-05-29
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
                To send them but remain verbose, use --verbose instead.
INFO:elastalert:Starting up
INFO:elastalert:Queried rule dffgdfgd from 2018-05-28 17:00 PDT to 2018-05-28 18:00 PDT: 0 hits
.....
INFO:elastalert:Queried rule dffgdfgd from 2018-05-30 11:00 PDT to 2018-05-30 11:08 PDT: 0 hits
INFO:elastalert:Skipping writing to ES: {'rule_name': u'dffgdfgd.all', '@timestamp': '2018-05-30T18:08:43.241762Z', 'exponent': 0, 'until': '2018-06-02T18:08:43.241754Z'}
INFO:elastalert:Alert for dffgdfgd at 2018-05-30T02:00:00Z:
INFO:elastalert:dffgdfgd

An abnormally low number of events occurred around 2018-05-29 19:00 PDT.
Between 2018-05-28 18:55 PDT and 2018-05-29 19:00 PDT, there were less than 1 events.

@timestamp: 2018-05-30T02:00:00Z
count: 0
key: all
num_hits: 0
num_matches: 36

INFO:elastalert:Ignoring match for silenced rule dffgdfgd.all
...

Same exact thing if I try 23 hours. Is this not what you are doing? Can you show logs from when your 23 hour timeframe works right away and a 24 hour one doesn't?

Qmando avatar May 30 '18 18:05 Qmando
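[Editor's note] The "Ignoring match for silenced rule" line and the writeback doc with 'exponent' and 'until' fields in the log above come from elastalert's realert silencing: once an alert fires, further matches for the same rule/key are suppressed until a silence period expires. The sketch below illustrates that mechanism; the class, field names, and reset behavior are illustrative assumptions, not elastalert's source.

```python
# Illustrative sketch of realert silencing with exponential backoff;
# names and exact reset semantics are assumptions, not elastalert code.
from datetime import datetime, timedelta

class SilenceCache:
    def __init__(self, realert, max_exponent=5):
        self.realert = realert          # base silence period
        self.max_exponent = max_exponent
        self.silenced = {}              # key -> (until, exponent)

    def should_alert(self, key, now):
        until, exponent = self.silenced.get(key, (None, -1))
        if until is not None and now < until:
            return False                # "Ignoring match for silenced rule"
        exponent = min(exponent + 1, self.max_exponent)
        # silence grows as realert * 2**exponent on consecutive alerts
        self.silenced[key] = (now + self.realert * 2 ** exponent, exponent)
        return True

cache = SilenceCache(realert=timedelta(minutes=10))
t0 = datetime(2018, 5, 30, 18, 0)
print(cache.should_alert("dffgdfgd.all", t0))                         # True
print(cache.should_alert("dffgdfgd.all", t0 + timedelta(minutes=5)))  # False
```

This matters when testing flatline rules: an apparently "missing" second alert may simply be silenced until the 'until' timestamp recorded in the writeback index.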

Ty for the info. I'll take a look at this again.

Qmando avatar Jun 01 '18 17:06 Qmando

Has anyone found a resolution to this? My rule is set up like so:

name: {{ region_env_name }} Number of calls (5 min)
type: flatline
index: callflows-*
threshold: 1
timeframe:
    minutes: 5
use_count_query: true
doc_type: doc

filter:
    - query:
        query_string:
            query: "_exists_:CVPAppName AND CVPAppName:APP AND (CallType:7526 OR 7525)"

alert:
    - "sns"
sns_topic_arn: "SNS_ARN"

From reading the documentation I would think flatline would send an alert, because there are no hits matching the query above; however, elastalert is not firing.

jberto78 avatar May 01 '19 17:05 jberto78

Can you post logs? Run elastalert with --verbose for at least 5 minutes.

Qmando avatar May 01 '19 18:05 Qmando

Hi Qmando, it actually ended up working, I just didn't give it enough time for elastalert to process the alert I guess. Thanks for following up.

jberto78 avatar May 01 '19 22:05 jberto78

Same issue here.

pip show elastalert
Name: elastalert
Version: 0.0.75

Upgrading to version 0.1.29 behaves the same.

scan_entire_timeframe: true
# --- Begin Type Specific Rule Configuration ---
type: flatline
timeframe:
  days: 7
run_every:
  minutes: 10
threshold: 1
use_count_query: true

No matches are found even though the threshold is not reached. The index pattern we are using is index: index-*. I have already checked a few things, like buffer_time.

Leaving the elastalert daemon configured with defaults doesn't alert either. Other flatline alerts are working fine.

mariobede avatar Jun 16 '19 08:06 mariobede

Update:

python -m elastalert.elastalert --rule test.yaml --debug --start 2019-06-08 works fine, but the rule doesn't fire when elastalert runs as a daemon.

How does --start affect the search? Is it the same as the timeframe set in the config file?

timeframe:
  days: 7

mariobede avatar Jun 16 '19 12:06 mariobede
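[Editor's note] On the --start question: as far as I understand it, --start only sets the timestamp of the first query window; it is independent of timeframe. Its practical effect on a flatline rule is that a backfilled run immediately has history to evaluate, whereas a freshly started daemon must first run for a full timeframe before the rule can possibly match. A trivial sketch of that arithmetic (an assumption about the behavior, not documented code):

```python
# Assumption: a flatline rule needs a full `timeframe` of queried
# history before it can assert the count fell below the threshold.
from datetime import datetime, timedelta

def first_possible_match(query_start, timeframe):
    return query_start + timeframe

timeframe = timedelta(days=7)
# Backfilled with --start 2019-06-08: a week of history exists by June 15
print(first_possible_match(datetime(2019, 6, 8), timeframe))
# Daemon started fresh on June 16 with no --start: must run ~7 days first
print(first_possible_match(datetime(2019, 6, 16), timeframe))
```

If that assumption holds, a daemon with a 7-day timeframe that is restarted (or loses its writeback metadata) more often than once a week would never accumulate enough history to fire, which would match the symptom described above.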

Wow, what a coincidence. We missed an outage this week because our dirt-simple flatline rule refused to fire the way the documentation says it should. Now the client is asking why we missed it.

While I appreciate the author's work and willingness to open source elastalert, this is very frustrating.

I have wasted hours re-testing this with elastalert-test-rule: it only fires if --start is set to more than 36 hours in the past, and it won't fire at all when running in production.

If my lookback is set to 24 hours, my threshold is 3, and nothing has been indexed for the prior 12 hours, then it should have fired, and it should fire again if I erase all the elastalert metadata and restart. But it doesn't. At least, that is how the documentation describes it. It should be that simple.

Maybe this rule should be marked as incomplete or broken so people don't trust it to alert them to a production system outage until it gets fixed. Because it is definitely broken.

Also, the results of running elastalert-test-rule and actually running in production should be exactly the same.

I know the documentation mentions that the results could be different, but if that is the case then elastalert-test-rule is worthless, because it is not a valid test unless it produces the exact same result, just like a unit test.

JungleGenius avatar Jun 16 '19 19:06 JungleGenius

name: "Testing 123 Potential System Outage"

http_post_static_payload:
  alert_name: "Testing 123 Potential System Outage"
  alert_form: "PotentialSystemOutageAlarm"

http_post_all_values: True

verify_certs: False

index: agent-publicipaddress-*

type: flatline

query_key: _index

doc_type: publicipaddress

threshold: 1

timeframe:
  hours: 12

alert:
  - "debug"
  - "post"
JungleGenius avatar Jun 16 '19 19:06 JungleGenius

I'm not the only one:

https://github.com/Yelp/elastalert/issues/2157
https://github.com/Yelp/elastalert/issues/2060
https://github.com/Yelp/elastalert/issues/2049

JungleGenius avatar Jun 16 '19 19:06 JungleGenius