PellMon
Sudden enormous pellet consumption (restart can cause the feeder_time COUNTER to wrap)
The correct fix is to use the rrd type DERIVE with a min value of 0 for feeder_time. Papering over the problem by inserting an 'undefined' whenever the counter starts over is apparently tricky to do reliably.
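To make the difference concrete, here is a rough sketch of the arithmetic as described in the rrdcreate documentation, not taken from PellMon's code. The helper names counter_rate and derive_rate and the sample numbers are made up for illustration. A COUNTER treats a decrease as a 32-bit wrap, while a DERIVE with min 0 turns the resulting negative rate into an unknown.

```shell
#!/bin/sh
# Sketch of how a restarted counter is interpreted by the two rrd types.
# counter_rate and derive_rate are hypothetical helpers, not rrdtool commands.

# COUNTER: a decrease is assumed to be a wrap, so 2^32 is added to the delta.
# (rrdtool also checks for a 64-bit wrap; that case is omitted here.)
counter_rate() {  # args: previous current seconds
    delta=$(( $2 - $1 ))
    if [ "$delta" -lt 0 ]; then
        delta=$(( delta + 4294967296 ))
    fi
    echo $(( delta / $3 ))
}

# DERIVE with min 0: a negative rate is out of range and stored as unknown (U).
derive_rate() {  # args: previous current seconds
    delta=$(( $2 - $1 ))
    if [ "$delta" -lt 0 ]; then
        echo U
    else
        echo $(( delta / $3 ))
    fi
}

# Example: feeder_time read 1978080 before a restart and 130 one minute later.
counter_rate 1978080 130 60   # huge bogus rate
derive_rate 1978080 130 60    # unknown instead of a spike
```

The bogus COUNTER rate is what shows up as the giant consumption bar; with DERIVE and min 0 the same sample is simply dropped.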
One way to handle the transition is to change the default to DERIVE, and leave old databases as is. Pellmonsrv could use an rpn script instead of TOTAL to get away from the problem when counters are used, possibly with some performance impact.
Existing databases could also be changed with rrdtune, but that would not help if there already is a false counter wrap in there.
Changed the default for feeder_time to DERIVE with a minimum value of 0 in 33b26be616746ef4c9f51e9406ae82037e5e02c3, which should eliminate the problem for new installations:
d09 = DS:%s:DERIVE:%u:0:U
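For readers unfamiliar with the DS syntax, the fields follow rrdcreate's DS:ds-name:DST:heartbeat:min:max layout. As a small illustration, assuming the two placeholders are filled with the data source name and the heartbeat (the values feeder_time and 120 below are only examples), the template would expand like this:

```shell
# Expand the DS template with example values. 'feeder_time' and 120 are
# illustrative assumptions, not values read from PellMon's configuration.
ds_template='DS:%s:DERIVE:%u:0:U'
ds_line=$(printf "$ds_template" feeder_time 120)
echo "$ds_line"
# Fields: name, type DERIVE, heartbeat in seconds, min 0, max U (no upper limit)
```

The trailing U is why a database created this way has no maximum until one is set with rrdtool tune.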
When you are hit by this bug you will see a sudden enormous pellet consumption, causing a giant bar in the consumption bar charts, and the silo level will instantly drop to a big negative value. This can be fixed by using the command rrdtool tune to set a maximum value on the feeder_time counter, then using the rrdtool dump command to write the database out to an xml file, and finally reimporting it with range checking enabled to erase the giant counter value.
First make a backup copy of the database just in case it's needed:
cp /usr/local/var/lib/pellmon/rrd.db backup-rrd.db
Stop pellmon:
sudo service pellmonsrv stop
Change the maximum allowed value of the counter 'feeder_time' to 100:
sudo rrdtool tune /usr/local/var/lib/pellmon/rrd.db -a feeder_time:100
Dump the database to an xml file:
rrdtool dump /usr/local/var/lib/pellmon/rrd.db rrd.xml
Then restore the database from the xml file with range checking:
sudo rrdtool restore -r -f rrd.xml /usr/local/var/lib/pellmon/rrd.db
Start pellmon:
sudo service pellmonsrv start
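The steps above could be bundled into one script. This is only a sketch: the variable names DB, DS, MAX and DRY_RUN and the helper functions are chosen here, and by default the script just prints the commands so they can be reviewed before anything touches the database.

```shell
#!/bin/sh
# Sketch bundling the manual steps. Prints the commands by default;
# set DRY_RUN=0 to actually execute them.
DB=${DB:-/usr/local/var/lib/pellmon/rrd.db}
DS=${DS:-feeder_time}   # use the data source name shown in your own dump
MAX=${MAX:-100}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise run it and stop on failure.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@" || exit 1
    fi
}

fix_spike() {
    run cp "$DB" backup-rrd.db                     # backup first
    run sudo service pellmonsrv stop
    run sudo rrdtool tune "$DB" -a "$DS:$MAX"      # set the maximum
    run rrdtool dump "$DB" rrd.xml
    run sudo rrdtool restore -r -f rrd.xml "$DB"   # reimport with range check
    run sudo service pellmonsrv start
}

fix_spike
```

If a step fails with DRY_RUN=0 the script exits with pellmonsrv still stopped, so the backup copy can be restored by hand before starting the service again.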
I just encountered this issue on version 0.7.0. I followed your instructions and it solved the error.
I did get an error when trying to set the maximum for feeder_time with the tune command. A quick check through a .xml dump showed that the name "feeder_time" has changed to "feedertime" in more recent versions.
This is the output of the .xml dump before editing the max value. The min value is defined, but the bug still showed up :-)
<ds>
<name> feedertime </name>
<type> DERIVE </type>
<minimal_heartbeat>120</minimal_heartbeat>
<min>0.000000000e+00</min>
<max>NaN</max>
<!-- PDP Status -->
<last_ds>1978080</last_ds>
<value>2.166666667e+00</value>
<unknown_sec> 0 </unknown_sec>
</ds>
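Since the data source name differs between versions, it may help to list the names in the dump before running rrdtool tune. A minimal sketch, assuming the dump was written to rrd.xml as in the instructions above; ds_names is a hypothetical helper, not part of rrdtool:

```shell
# Print every data source name found in an rrdtool XML dump read from stdin.
ds_names() {
    sed -n 's/.*<name>[[:space:]]*\([^[:space:]<]*\)[[:space:]]*<\/name>.*/\1/p'
}

# Only read the dump if it exists, so the snippet is harmless elsewhere.
if [ -f rrd.xml ]; then
    ds_names < rrd.xml
fi
```

Whichever name is printed for the feeder counter is the one to pass to the -a option.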
Unfortunately I can't really understand how that can happen, but it's nice to know that the workaround still works. According to the 'DERIVE' chapter here https://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html it shouldn't be possible for that to happen anymore.
Yes, I changed the default ds_names at some point, but this workaround wasn't supposed to be needed when the database was created with the new pellmon version. Maybe it would be a good idea to also set a maximum value for feedertime by default, which should make the impossible really really impossible... :-)