Flatline with no matches
Hi, I'm clearly missing something simple but:
my_rule.yaml:

type: flatline
index: index-*
threshold: 1
timeframe:
  hours: 24
use_count_query: true
doc_type: doc
filter:
- query:
    query_string:
      query: "application:\"nonsense\""
I would expect that elastalert-test-rule my_rule.yaml would say something like "An abnormally low number of events ...", because obviously there are no events with that field and value ("nonsense"), and there never will be. However, if I change the timeframe to hours: 1, strangely it does hit the rule and says "An abnormally low ...". To make sure, I created the original rule and left it running for days, but still no alerts...
Does it trigger an alert immediately if you add --start 2018-05-28 where that date is 24+ hours ago?
I don't think there is anything special about 24 hours exactly, but I guess it's possible. Are you sure you waited a full 24 hours and there were no matching documents?
But I left that alert running for days and got nothing. Am I missing something?
You are on version 0.1.31? I'll try to reproduce this and get back to you all.
I can't reproduce this using your rule config:
$ python -m elastalert.elastalert --rule test.yaml --debug --start 2018-05-29
INFO:elastalert:Note: In debug mode, alerts will be logged to console but NOT actually sent.
To send them but remain verbose, use --verbose instead.
INFO:elastalert:Starting up
INFO:elastalert:Queried rule dffgdfgd from 2018-05-28 17:00 PDT to 2018-05-28 18:00 PDT: 0 hits
.....
INFO:elastalert:Queried rule dffgdfgd from 2018-05-30 11:00 PDT to 2018-05-30 11:08 PDT: 0 hits
INFO:elastalert:Skipping writing to ES: {'rule_name': u'dffgdfgd.all', '@timestamp': '2018-05-30T18:08:43.241762Z', 'exponent': 0, 'until': '2018-06-02T18:08:43.241754Z'}
INFO:elastalert:Alert for dffgdfgd at 2018-05-30T02:00:00Z:
INFO:elastalert:dffgdfgd
An abnormally low number of events occurred around 2018-05-29 19:00 PDT.
Between 2018-05-28 18:55 PDT and 2018-05-29 19:00 PDT, there were less than 1 events.
@timestamp: 2018-05-30T02:00:00Z
count: 0
key: all
num_hits: 0
num_matches: 36
INFO:elastalert:Ignoring match for silenced rule dffgdfgd.all
...
Exactly the same thing happens if I try 23 hours. Is this not what you are doing? Can you show logs from when your 23-hour timeframe works right away and 24 hours doesn't?
Thanks for the info. I'll take a look at this again.
Has anyone found a resolution to this? My rule is set up like so:
name: {{ region_env_name }} Number of calls (5 min)
type: flatline
index: callflows-*
threshold: 1
timeframe:
  minutes: 5
use_count_query: true
doc_type: doc
filter:
- query:
    query_string:
      query: "_exists_:CVPAppName AND CVPAppName:APP AND (CallType:7526 OR 7525)"
alert:
- "sns"
sns_topic_arn: "SNS_ARN"
From reading the documentation, I would expect the flatline rule to send an alert because there are no hits matching the query above; however, ElastAlert is not firing.
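One thing worth double-checking in the rule above (an observation about Lucene query_string syntax, not a confirmed cause of this issue): in `(CallType:7526 OR 7525)`, the bare `7525` is matched against the default field, not `CallType`. If both codes are meant to match `CallType`, the grouping would go on the value side:

```yaml
filter:
- query:
    query_string:
      query: "_exists_:CVPAppName AND CVPAppName:APP AND CallType:(7526 OR 7525)"
```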
Can you post logs? Run elastalert with --verbose for at least 5 minutes.
Hi Qmando, it actually ended up working; I just didn't give ElastAlert enough time to process the alert, I guess. Thanks for following up.
Same issue here.
$ pip show elastalert
Name: elastalert
Version: 0.0.75
Upgrading to version 0.1.29 behaves the same way.
scan_entire_timeframe: true

# --- Begin Type Specific Rule Configuration ---
type: flatline
timeframe:
  days: 7
run_every:
  minutes: 10
threshold: 1
use_count_query: true
No matches were found even though the threshold was never reached. The index pattern we are using is index: index-*. I have already checked a few things, such as buffer_time.
Running the ElastAlert daemon with its default configuration doesn't alert either. Other flatline alerts are working fine.
Update:
python -m elastalert.elastalert --rule test.yaml --debug --start 2019-06-08 works fine, but it doesn't alert when running as a daemon.
How does --start affect the search? Is it the same as the timeframe set in the config file?
timeframe:
  days: 7
Wow, what a coincidence. We missed an outage this week because our dirt-simple flatline rule refused to fire the way the documentation says it should. Now the client is asking why we missed it.
While I appreciate the author's work and willingness to open-source ElastAlert, this is very frustrating.
I have wasted hours re-testing this with elastalert-test-rule: it only fires if --start is set to more than 36 hours prior, and it won't fire at all when running in production.
If my lookback is set to 24 hours, my threshold is 3, and no documents have been indexed for the prior 12 hours, then it should have fired, and it should fire again if I erase all the ElastAlert metadata and restart, but it doesn't. At least, that is how the documentation describes it. It should be that simple.
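The expected behavior described above amounts to a trailing-window count check, which can be sketched as follows (a simplified model for illustration only, not ElastAlert's actual implementation; the function name is hypothetical):

```python
from datetime import datetime, timedelta

def flatline_should_fire(event_times, now, timeframe, threshold):
    """Fire when fewer than `threshold` events fall inside the trailing window."""
    window_start = now - timeframe
    count = sum(1 for t in event_times if window_start <= t <= now)
    return count < threshold

now = datetime(2019, 6, 14, 12, 0)
window = timedelta(hours=24)

# Nothing indexed inside the last 24 hours: 0 < 3, so the rule should fire.
quiet = [now - timedelta(hours=h) for h in (30, 36, 48)]
print(flatline_should_fire(quiet, now, window, threshold=3))  # True

# Healthy stream of events: 4 >= 3, so the rule should stay silent.
busy = [now - timedelta(hours=h) for h in (1, 3, 6, 12)]
print(flatline_should_fire(busy, now, window, threshold=3))  # False
```

By this model, erasing the saved metadata and re-querying the same quiet window should always produce the same firing decision.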
Maybe this rule type should be marked as incomplete or broken, so people don't trust it to alert them to a production system outage until it gets fixed. Because it is definitely broken.
Also, the results of running elastalert-test-rule and actually running in production should be exactly the same.
I know the documentation mentions that the results can differ, but if that's the case then running elastalert-test-rule is worthless: it is not a valid test unless it produces exactly the same result, just like a unit test.
name: "Testing 123 Potential System Outage"
http_post_static_payload:
  alert_name: "Testing 123 Potential System Outage"
  alert_form: "PotentialSystemOutageAlarm"
http_post_all_values: True
verify_certs: False
index: agent-publicipaddress-*
type: flatline
query_key: _index
doc_type: publicipaddress
threshold: 1
timeframe:
  hours: 12
alert:
- "debug"
- "post"
I'm not the only one:
https://github.com/Yelp/elastalert/issues/2157
https://github.com/Yelp/elastalert/issues/2060
https://github.com/Yelp/elastalert/issues/2049