Sudden enormous pellet consumption (restart can cause the feeder_time COUNTER to wrap)

Open motoz opened this issue 10 years ago • 4 comments

The correct fix is to use the rrd type DERIVE with a minimum value of 0 for feeder_time. Papering over the problem by inserting an 'undefined' value whenever the counter starts over is apparently tricky to do reliably.
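To illustrate why the data source type matters, here is a rough sketch in plain shell arithmetic. It assumes COUNTER treats any decrease as a 32-bit wrap and that DERIVE with a minimum of 0 turns a negative rate into unknown; the function names and sample readings are made up for illustration, and this is not rrdtool's actual code.

```shell
#!/bin/sh
# Sketch only: approximate per-second rate as COUNTER and as DERIVE (min 0)
# would see it after a restart. Assumes a 32-bit wrap; values are illustrative.

counter_rate() {  # usage: counter_rate prev cur step_seconds
    delta=$(( $2 - $1 ))
    # COUNTER assumes a decrease means the counter wrapped around
    if [ "$delta" -lt 0 ]; then
        delta=$(( delta + 4294967296 ))
    fi
    echo $(( delta / $3 ))
}

derive_rate() {  # usage: derive_rate prev cur step_seconds
    delta=$(( $2 - $1 ))
    # DERIVE keeps the negative rate; a minimum of 0 makes it unknown ("U")
    if [ "$delta" -lt 0 ]; then
        echo "U"
    else
        echo $(( delta / $3 ))
    fi
}

# Counter restarts: previous reading 1978080, new reading 120, 60 s apart
counter_rate 1978080 120 60   # huge bogus rate
derive_rate 1978080 120 60    # unknown instead of a spike
```

With COUNTER the restart shows up as tens of millions of feeder seconds per second, which is the giant consumption bar; with DERIVE and a minimum of 0 the same sample is simply discarded.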

One way to handle the transition is to change the default to DERIVE and leave old databases as they are. For databases that still use counters, pellmonsrv could use an RPN script instead of TOTAL to avoid the problem, possibly with some performance impact.

Existing databases could also be changed with rrdtune, but that would not help if there already is a false counter wrap in there.
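As a sketch of what that rrdtune change might look like, the helper below only prints the command so it can be reviewed first. The function name is hypothetical, and the database path is an assumption based on a default install; `--data-source-type` and `--minimum` are standard rrdtool tune options.

```shell
#!/bin/sh
# Sketch only: print an rrdtool tune command that switches a data source
# to DERIVE with a minimum of 0. Path and ds name defaults are assumptions.

tune_to_derive() {  # usage: tune_to_derive [db] [ds_name]
    db=${1:-/usr/local/var/lib/pellmon/rrd.db}
    ds=${2:-feeder_time}
    echo "sudo rrdtool tune $db --data-source-type ${ds}:DERIVE --minimum ${ds}:0"
}

tune_to_derive
```

This only changes how future updates are interpreted; a false wrap that is already stored stays in the database.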

motoz avatar Feb 18 '15 13:02 motoz

Changed the default for feeder_time to DERIVE with a minimum value of 0 in 33b26be616746ef4c9f51e9406ae82037e5e02c3, which should eliminate the problem for new installations:

d09 = DS:%s:DERIVE:%u:0:U

motoz avatar Feb 26 '15 16:02 motoz

When you are hit by this bug you will see a sudden enormous pellet consumption, causing a giant bar in the consumption bar charts, and the silo level will instantly drop to a large negative value. This can be fixed in two steps. First use the command rrdtool tune to set a maximum value on the feeder_time counter. Then use the rrdtool dump command to write the database out to an xml file, and reimport it with range checking enabled to erase the giant counter value.

First make a backup copy of the database just in case it's needed:

cp /usr/local/var/lib/pellmon/rrd.db backup-rrd.db

Stop pellmon:

sudo service pellmonsrv stop

Change the maximum allowed value of the counter 'feeder_time' to 100:

sudo rrdtool tune /usr/local/var/lib/pellmon/rrd.db -a feeder_time:100

Dump the database to an xml file:

rrdtool dump /usr/local/var/lib/pellmon/rrd.db rrd.xml

Then restore the database from the xml file with range checking:

sudo rrdtool restore -r -f rrd.xml /usr/local/var/lib/pellmon/rrd.db

Start pellmon:

sudo service pellmonsrv start
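The steps above can be collected into one small helper. This is only a sketch: the function name `repair_plan` is made up here, and it prints the commands instead of running them so they can be checked first. The data source name is a parameter in case it differs in your database.

```shell
#!/bin/sh
# Sketch only: print the repair steps for a given database and data source.
# Defaults are taken from the commands above; adjust them for your install.

repair_plan() {  # usage: repair_plan [db] [ds_name] [max]
    db=${1:-/usr/local/var/lib/pellmon/rrd.db}
    ds=${2:-feeder_time}
    max=${3:-100}
    echo "cp $db backup-rrd.db"
    echo "sudo service pellmonsrv stop"
    echo "sudo rrdtool tune $db -a ${ds}:${max}"
    echo "rrdtool dump $db rrd.xml"
    echo "sudo rrdtool restore -r -f rrd.xml $db"
    echo "sudo service pellmonsrv start"
}

repair_plan
```

Once the printed commands look right they can be run one at a time, or piped to `sh` at your own risk.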

motoz avatar Nov 23 '15 06:11 motoz

I just encountered this issue on version 0.7.0. I followed your instructions and it solved the error.

I did get an error when trying to correct the feeder_time with the command. A quick check through a .xml dump showed the name "feeder_time" has changed to "feedertime" on more recent versions.

This is the output of the .xml dump before editing the max value. The min value is defined, but the bug still showed up :-)

	<name> feedertime </name>
	<type> DERIVE </type>
	<minimal_heartbeat>120</minimal_heartbeat>
	<min>0.000000000e+00</min>
	<max>NaN</max>
	<!-- PDP Status -->
	<last_ds>1978080</last_ds>
	<value>2.166666667e+00</value>
	<unknown_sec> 0 </unknown_sec>
</ds>
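
For anyone else unsure which data source name their database uses, a small filter like the one below can pull the names out of a dump. This is a sketch: the function name `ds_names` is made up, and it assumes the `<name>` elements in the dump look like the one above.

```shell
#!/bin/sh
# Sketch only: list data source names from rrdtool dump output on stdin.
# Assumes each name appears on one line as <name> ... </name>.

ds_names() {
    sed -n 's|.*<name>[[:space:]]*\([^[:space:]<]*\)[[:space:]]*</name>.*|\1|p'
}

# Demo on a sample line; for a real database use something like:
#   rrdtool dump /usr/local/var/lib/pellmon/rrd.db | ds_names
printf '\t<name> feedertime </name>\n' | ds_names
```

The name it prints is the one to use with rrdtool tune.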

lassepc avatar Jan 19 '18 15:01 lassepc

Unfortunately I can't really understand how that can happen, but it's nice to know that the workaround still works. According to the 'DERIVE' chapter at https://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html it shouldn't be possible for that to happen anymore.

Yes, I changed the default ds_names at some point, but this workaround wasn't supposed to be needed when the database was created with the new pellmon version. Maybe it would be a good idea to also set a maximum value for feedertime by default, which should make the impossible really really impossible... :-)

motoz avatar Jan 19 '18 16:01 motoz