Spikekill not working as expected

Open aryaaneesh5 opened this issue 2 years ago • 6 comments

Hi Team,

I have installed Cacti v1.2.18 in a multi-poller setup: 1 main poller and 3 remote pollers. We noticed issues with Spikekill. We have enabled around 20 graph templates in the Spikekill feature, but we still see spikes in the graphs that are not removed automatically as they should be. The removal schedule is set to 6 hours, which doesn't work either. The details are below. Could you please assist in resolving this issue?

Cacti 1.2.18, Spine 1.2.19, MariaDB 10.3, OS: Linux, Browser: Chrome

Spikekill settings:

Removal method: Standard deviation
Replacement method: Average
Number of standard deviations: 9 standard deviations
Variance percentage: 900%
Variance number of outliers: 8 high/low samples
Maximum kills per RRA: 3 spikes

Please let us know whether any change to the settings above would make the feature more effective at killing spikes automatically.

aryaaneesh5 avatar Mar 10 '23 05:03 aryaaneesh5

Hi Team,

We have around 400k graph templates enabled for 3600 devices.

Could you please let us know whether any changes should be made to the Spikekill settings that we already have?

aryaaneesh5 avatar Mar 16 '23 05:03 aryaaneesh5

Well, this is a complicated subject. The proactive spike killing does not scale. I've done some things personally to do "repairs", but they're very customized for the use case. For example, I recently had an issue where the poller output got corrupted, and I had to repair a time range with previous averages, running massively parallel to process 1M RRDfiles. But the tool is unforgiving and takes some training to use.

TheWitness avatar Mar 18 '23 15:03 TheWitness

Could you please share any workarounds for removing a single spike?

aryaaneesh5 avatar Mar 20 '23 03:03 aryaaneesh5

Dump the RRDfile to XML, hand edit out the spike, and re-import the resulting XML.
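For anyone following along, a rough sketch of that workaround: `rrdtool dump` writes the XML and `rrdtool restore` builds a new RRD file from it, and the hand-edit step in between can be scripted. The Python below is only an illustration. The function name `clip_spikes` and the `limit` threshold are made up here, and it assumes each stored sample appears as a `<v>...</v>` element in the dump. It replaces oversized values with NaN (unknown) rather than an average.

```python
import re
from typing import Tuple

# Matches one stored sample in the XML dump, e.g. <v>1.2345e+03</v> or <v>NaN</v>.
V_TAG = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def clip_spikes(xml_text: str, limit: float) -> Tuple[str, int]:
    """Replace every <v> value whose magnitude exceeds `limit` with NaN.

    Returns the edited XML text and the number of values replaced.
    `limit` is a hypothetical per-graph threshold chosen by the operator.
    """
    replaced = 0

    def _fix(match: "re.Match[str]") -> str:
        nonlocal replaced
        try:
            value = float(match.group(1))
        except ValueError:
            return match.group(0)  # leave anything unparseable untouched
        # Comparisons with NaN are always False, so existing NaN rows are kept as-is.
        if abs(value) > limit:
            replaced += 1
            return "<v>NaN</v>"
        return match.group(0)

    return V_TAG.sub(_fix, xml_text), replaced
```

Typical use would be `rrdtool dump traffic.rrd > traffic.xml`, run the function over the file's text, write the result to a new file, then `rrdtool restore fixed.xml fixed.rrd`. The file names are placeholders, and it is safer to work on a copy of the RRD file.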

TheWitness avatar Mar 26 '23 15:03 TheWitness

Hi Team,

As per my understanding, the spikes are removed/killed inside the RRD files. Is there any specific reason why they are not removed in the visualization?

aryaaneesh5 avatar Apr 03 '23 01:04 aryaaneesh5

Long-standing question I stumbled on while checking the issues for anything related to Spikekill...

I noticed as well that in certain situations a spike might still be in the visualization after you have run the Spikekill feature.

First of all, you need to be careful with the Spikekill tool, because in the version that we currently use (1.2.28) there seems to be another bug that makes Spikekill sometimes kill the whole RRD content, leaving you with no data across long ranges. It seems to work currently only on the 1st data row in an RRD file, or if the file contains only one row. Other data might get set to "NaN".

Regarding the effectiveness of the spike removal, I noticed that the statistical approaches (i.e. Standard Deviation and Variance) were not working for me at all. The only options that I found doing anything to the data were "Gap Fill" and "Float". My approach was usually to zoom into the spike, so that its edges were placed just a few pixels away from the left and right diagram borders. Then I hit the "Gap Fill" or "Float" option and the spike was flattened out. Usually the "Float" approach did the trick better, while "Gap Fill" could result in all values being set to zero in the RRD file.
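One possible reason a standard deviation threshold misses an obvious spike is plain arithmetic: a single huge value inflates the standard deviation it is measured against. The sketch below is a generic illustration, not Cacti's Spikekill code. The function name `flag_spikes` is hypothetical, and whether Spikekill computes its threshold exactly this way is an assumption. With one outlier among n samples, the outlier can be at most sqrt(n - 1) population standard deviations above the mean, so a setting like the 9 standard deviations quoted earlier in this thread can never trigger on a window of 82 samples or fewer.

```python
import statistics
from typing import List


def flag_spikes(samples: List[float], n_stddev: float) -> List[float]:
    """Return the samples lying more than n_stddev standard deviations above the mean."""
    mean = statistics.fmean(samples)
    stddev = statistics.pstdev(samples)  # population standard deviation
    threshold = mean + n_stddev * stddev
    return [value for value in samples if value > threshold]


# 50 normal samples plus one enormous spike.
window = [10.0] * 50 + [1_000_000.0]

strict = flag_spikes(window, 9)  # the spike sits only about 7.07 deviations out
loose = flag_spikes(window, 3)   # a lower setting does catch it
```

Here `strict` comes back empty while `loose` contains the spike, even though the spike is 100,000 times larger than its neighbours.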

But there, too, I observed that in certain situations the spike reappears depending on the zoom level. Both options worked only up to 1.2.27, it seems; with 1.2.28 I was just losing data, so I am keeping my hands off it for now until that part is reviewed and fixed.

Regarding the re-appearance of spikes in longer term diagrams:

This can happen and is apparently part of the nature of the RRD archives. The data rows hold the data at different densities: older data is condensed more and more into sections with larger time spans per step, becoming less detailed. You can actually see this if you zoom into an older part of the diagram, where the data becomes coarse, while on the very right end you can zoom in much deeper without too much distortion. This is the nature of the RRD tool that is acting in the background. If you now zoom in to exactly enclose a spike in the diagram span and then float it, it seems that you capture only the time span that is shown in the graph, and not the coarser parts where the same data is also reflected in condensed form. So the spike is removed from the fine data you just processed, but there seems to be a leftover copy in the parts with the next levels of resolution.
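A toy model of that effect, as I understand it: each archive (RRA) in an RRD file stores its own consolidated rows, so fixing the fine rows does not touch a coarser archive that was already built from them. The Python below is a simplified sketch and not rrdtool code. The helper names `consolidate` and `demo` are made up, and it uses a MAX consolidation over 6 steps purely as an example.

```python
from typing import Callable, List, Sequence, Tuple


def consolidate(values: Sequence[float], steps_per_row: int,
                cf: Callable[[Sequence[float]], float] = max) -> List[float]:
    """Condense fine-grained values into coarser rows, one row per `steps_per_row` values."""
    return [cf(values[i:i + steps_per_row])
            for i in range(0, len(values), steps_per_row)]


def demo() -> Tuple[List[float], List[float]]:
    fine = [10.0] * 12
    fine[7] = 5000.0                # a spike arrives in the high-resolution data
    coarse = consolidate(fine, 6)   # the coarse archive is filled at that time

    fine[7] = 10.0                  # later, the spike is removed from the fine rows only
    return fine, coarse


fine_rows, coarse_rows = demo()
```

After the repair, `fine_rows` is flat, but `coarse_rows` still holds 5000.0 in its second row. A graph that spans enough time to be drawn from the coarse archive would show the spike again.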

There's a bit of interesting material about spike removal on the page of Tobias Oetiker (the maintainer of rrdtool), though it is also quite old. See: https://oss.oetiker.ch/rrdtool/pub/contrib/

@TheWitness - was the Cacti spike-killer originally based on Tobi's code, or was it developed completely independently from the "spikekill-1.1.1" tool from 2009 that I found on Tobi's page?

bernisys avatar Feb 17 '25 12:02 bernisys