
Viewing graphs can break when boost is running in some rare cases

Open: bmfmancini opened this issue 3 years ago (23 comments)

Hey All,

I have found that on 1.2.22, in between boost runs, some devices show gaps in their graphs. I traced the data from spine all the way to poller_output_boost.

The data makes its way through correctly until the RRA write; it seems that for some data sources the data is never written. The data is removed from poller_output_boost after each run, so it is not stuck in there.

There are no relevant errors in the log. For the affected devices I have rebuilt the poller cache, with no difference.

bmfmancini, Oct 03 '22 18:10

Found this when running boost manually:

php poller_boost.php --verbose --debug --force >> /tmp/boost.txt

2022/10/03 14:33:40 - CMDPHP SQL Backtrace: (/poller_boost.php[222]:boost_output_rrd_data(), /poller_boost.php[588]:boost_process_local_data_ids(), /poller_boost.php[688]:db_fetch_assoc(), /lib/database.php[593]:db_fetch_assoc_prepared(), /lib/database.php[613]:db_execute_prepared())
2022/10/03 14:33:40 - CMDPHP ERROR: A DB Row Failed!, Error: Table 'cacti.poller_output_boost_arch_1664822004' doesn't exist
2022/10/03 14:33:40 - BOOST CHILD DEBUG: Processing 128 of 130 for Boost Process 1
2022/10/03 14:33:40 - CMDPHP SQL Backtrace: (/poller_boost.php[222]:boost_output_rrd_data(), /poller_boost.php[588]:boost_process_local_data_ids(), /poller_boost.php[688]:db_fetch_assoc(), /lib/database.php[593]:db_fetch_assoc_prepared(), /lib/database.php[613]:db_execute_prepared())
2022/10/03 14:33:40 - CMDPHP ERROR: A DB Row Failed!, Error: Table 'cacti.poller_output_boost_arch_1664822004' doesn't exist
2022/10/03 14:33:40 - BOOST CHILD DEBUG: Processing 127 of 130 for Boost Process 1
2022/10/03 14:33:40 - CMDPHP SQL Backtrace: (/poller_boost.php[222]:boost_output_rrd_data(), /poller_boost.php[588]:boost_process_local_data_ids(), /poller_boost.php[688]:db_fetch_assoc(), /lib/database.php[593]:db_fetch_assoc_prepared(), /lib/database.php[613]:db_execute_prepared())
2022/10/03 14:33:40 - CMDPHP ERROR: A DB Row Failed!, Error: Table 'cacti.poller_output_boost_arch_1664822004' doesn't exist
2022/10/03 14:33:40 - BOOST CHILD DEBUG: Processing 126 of 130 for Boost Process 1
2022/10/03 14:33:40 - CMDPHP SQL Backtrace: (/poller_boost.php[222]:boost_output_rrd_data(), /poller_boost.php[588]:boost_process_local_data_ids(), /poller_boost.php[688]:db_fetch_assoc(), /lib/database.php[593]:db_fetch_assoc_prepared(), /lib/database.php[613]:db_execute_prepared())
2022/10/03 14:33:40 - CMDPHP ERROR: A DB Row Failed!, Error: Table 'cacti.poller_output_boost_arch_1664822004' doesn't exist
2022/10/03 14:33:40 - BOOST CHILD DEBUG: Processing 125 of 130 for Boost Process 1
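The archive table named in these errors ends in a number. Assuming that suffix is a Unix epoch (an assumption, though it lines up with the timestamps in the log), a quick decode shows when the table was created. The UTC-4 offset is also an assumption, inferred from the Last Runtime line in the later debug output (1664820507 shown as 14:08:27); archive_table_time is a hypothetical helper, not part of Cacti:

```python
from datetime import datetime, timedelta, timezone

def archive_table_time(table: str) -> datetime:
    """Decode the trailing number of a boost archive table name as a Unix epoch (assumed)."""
    suffix = table.rsplit("_", 1)[1]
    return datetime.fromtimestamp(int(suffix), tz=timezone.utc)

created = archive_table_time("poller_output_boost_arch_1664822004")
print(created.isoformat())  # 2022-10-03T18:33:24+00:00

# Assumed server offset of UTC-4, inferred from the debug output.
local = created.astimezone(timezone(timedelta(hours=-4)))
print(local.strftime("%H:%M:%S"))  # 14:33:24
```

Under those assumptions the table was created about 16 seconds before these errors at 14:33:40, which fits the suspicion raised later in the thread that the manual run collided with a scheduled one.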

bmfmancini, Oct 03 '22 18:10

Output from the debug file:

DEBUG: Checking if Boost is ready to run.
DEBUG: Last Runtime was 2022-10-03 14:08:27 (1664820507).
DEBUG: Next Runtime is 2022-10-03 15:08:27 (1664824107).
DEBUG: Records Found:6717232, Max Threshold:7000000.
DEBUG: Time to Run Boost, Force Run is true!
DEBUG: Parallel Process Setup Begins.
DEBUG: Data Sources:89253, Concurrent Processes:1
DEBUG: Parallel Process Setup Complete.  Ready to spawn children.
DEBUG: About to launch 1 processes.
DEBUG: Launching Boost Process Number 1
Total[1.4670] DEBUG: About to Spawn a Remote Process [CMD: /bin/php, ARGS: /var/www/html/cacti/poller_boost.php --child=1 --debug]
DEBUG: 1 Processes Running, Sleeping for 2 seconds.
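The numbers in this output can be checked by hand: Next Runtime minus Last Runtime is 3600 s (a 60-minute boost interval), and 6717232 records is below the 7000000 threshold. The sketch below uses a simplified, hypothetical trigger rule (should_run is not Cacti's actual code) to show that, at the time of the manual run, only the force flag could have started boost. Taking the archive table's epoch suffix as the moment of the run is also an assumption:

```python
# Values copied from the DEBUG output above.
LAST_RUNTIME = 1664820507
NEXT_RUNTIME = 1664824107
RECORDS_FOUND = 6717232
MAX_THRESHOLD = 7000000

def should_run(now: int, records: int, force: bool) -> bool:
    """Hypothetical simplification of the boost trigger: force, schedule, or backlog size."""
    return force or now >= NEXT_RUNTIME or records >= MAX_THRESHOLD

# Assumption: the manual run happened at the archive table's epoch suffix.
NOW = 1664822004

interval_min = (NEXT_RUNTIME - LAST_RUNTIME) // 60
fill_pct = 100 * RECORDS_FOUND / MAX_THRESHOLD
print(interval_min)                                 # 60
print(f"{fill_pct:.2f}%")                           # 95.96%
print(should_run(NOW, RECORDS_FOUND, force=False))  # False
print(should_run(NOW, RECORDS_FOUND, force=True))   # True
```

In other words, the backlog was close to the threshold but had not crossed it, and the scheduled time was still about 35 minutes away.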

bmfmancini, Oct 03 '22 18:10

The boost tables are clean according to audit_database:

bash-4.2$ php audit_database.php --report | grep boost
Checking Table: 'poller_output_boost' - Clean
Checking Table: 'poller_output_boost_local_data_ids' - Clean
Checking Table: 'poller_output_boost_processes' - Clean
bash-4.2$

bmfmancini, Oct 03 '22 18:10

OK, so when I ran boost manually the first time, I think I collided with Cacti's own scheduled run. Rerunning it manually seems fine, but the strange thing is that the first time I ran it, the boost table went empty.

bmfmancini, Oct 03 '22 20:10

Test now @bmfmancini

TheWitness, Oct 05 '22 22:10

So far so good, @TheWitness. I will let it soak for a bit and let you know.

bmfmancini, Oct 06 '22 23:10

@TheWitness Unfortunately, I am still seeing gaps in the plots.

bmfmancini, Oct 07 '22 13:10

Confirmed: it only happens after a graph has been viewed and boost runs afterwards.

bmfmancini, Oct 07 '22 13:10

Any errors in the log?

TheWitness, Oct 07 '22 21:10

No errors at all.

On Fri., Oct. 7, 2022, 5:09 p.m. TheWitness, @.***> wrote:

Any errors in the log?

— Reply to this email directly, view it on GitHub https://github.com/Cacti/cacti/issues/4941#issuecomment-1272088421, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADGEXTGBFSJ5UXRHI25BUHLWCCGPNANCNFSM6AAAAAAQ32N42A . You are receiving this because you were mentioned.Message ID: @.***>

bmfmancini, Oct 07 '22 21:10

Well, that's good. Now you have to find the real reason. How many poller items does the device in question have?

TheWitness, Oct 08 '22 15:10

Varying numbers of poller items and different templates.

A lot of them have nothing in common device-type-wise, and it happens at random times, but it always seems to be associated with a boost run.

bmfmancini, Oct 08 '22 16:10

You need to be very specific. If there is more than one device, give me a count for each.

TheWitness, Oct 08 '22 16:10

OK, will do.

bmfmancini, Oct 08 '22 16:10

What RRDtool version?

TheWitness, Oct 10 '22 09:10

I have another theory...

TheWitness, Oct 10 '22 09:10

However, you need to answer the poller items question for a few of the cases.

TheWitness, Oct 10 '22 09:10

The RRDtool version is 1.4.

bmfmancini, Oct 10 '22 11:10

Upgrade to 1.8

TheWitness, Oct 10 '22 11:10

OK, updated to RRDtool 1.8:

RRDtool 1.8.0  Copyright by Tobias Oetiker
               Compiled Oct 11 2022 11:19:31

Gaps are still being seen. After a graph is viewed, the data for that time period is removed from the poller_output_boost table. Sometimes the RRA is updated without a problem, but other times the graph shows a large gap.

While checking for data, the poller_output_boost table will have entries for that data source, and they then disappear from the table while the graph still shows a gap. On the next boost run the graph starts to plot again, but only with the data that populated the table after the graph was viewed.

Here are my steps:

1.) View the poller_output_boost table:

MariaDB [cacti]> select * from poller_output_boost where local_data_id = '67278' \G
*************************** 1. row ***************************
local_data_id: 67278
     rrd_name: discards_in
         time: 2022-10-11 13:18:02
       output: 0
*************************** 2. row ***************************
local_data_id: 67278
     rrd_name: discards_out
         time: 2022-10-11 13:18:02
       output: 0
*************************** 3. row ***************************
local_data_id: 67278
     rrd_name: errors_in
         time: 2022-10-11 13:18:02
       output: 0
*************************** 4. row ***************************
local_data_id: 67278
     rrd_name: errors_out
         time: 2022-10-11 13:18:02
       output: 0
4 rows in set (0.001 sec)

2.) View the graph

3.) Check the poller_output_boost table: the entries for the timespan being viewed have been removed, except for newly polled data.

MariaDB [cacti]> select * from poller_output where local_data_id = '67278';
Empty set (0.000 sec)

MariaDB [cacti]> select * from poller_output where local_data_id = '67278';
Empty set (0.000 sec)

MariaDB [cacti]> select * from poller_output_boost  where local_data_id = '67278';
Empty set (0.000 sec)

MariaDB [cacti]> select * from poller_output_boost  where local_data_id = '67278';
+---------------+--------------+---------------------+--------+
| local_data_id | rrd_name     | time                | output |
+---------------+--------------+---------------------+--------+
|         67278 | discards_in  | 2022-10-11 13:23:02 | 0      |
|         67278 | discards_out | 2022-10-11 13:23:02 | 0      |
|         67278 | errors_in    | 2022-10-11 13:23:02 | 0      |
|         67278 | errors_out   | 2022-10-11 13:23:02 | 0      |
+---------------+--------------+---------------------+--------+
4 rows in set (0.001 sec)

The graph will still show the gap until the next boost run. At that point only the newly polled data makes it into the RRA; the other data is lost.
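The two snapshots of poller_output_boost above are five minutes apart (13:18:02 and 13:23:02), which suggests a 300-second polling interval for this data source. Assuming that step, a small helper can list the sample times missing between consecutive rows, for example when comparing what was queued in poller_output_boost with what reached the RRA. missing_slots, at, and the 30-second jitter tolerance are illustrative choices, not part of Cacti:

```python
from datetime import datetime, timedelta

STEP = timedelta(seconds=300)      # assumed 5-minute polling interval
TOLERANCE = timedelta(seconds=30)  # allowance for poller jitter (illustrative)

def missing_slots(times, step=STEP, tolerance=TOLERANCE):
    """Return the expected sample times that are absent between consecutive samples."""
    ordered = sorted(times)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        expected = prev + step
        # Every expected slot that falls clearly before the next real sample is a gap.
        while cur - expected > tolerance:
            gaps.append(expected)
            expected += step
    return gaps

def at(hms: str) -> datetime:
    """Build a timestamp on 2022-10-11, the day of the test above."""
    return datetime.strptime(f"2022-10-11 {hms}", "%Y-%m-%d %H:%M:%S")

# The two rows observed for local_data_id 67278: no gap between them.
print(missing_slots([at("13:18:02"), at("13:23:02")]))  # []

# Hypothetical input in which the 13:18:02 sample is missing.
print([t.strftime("%H:%M:%S") for t in missing_slots([at("13:13:02"), at("13:23:02")])])  # ['13:18:02']
```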

bmfmancini, Oct 11 '22 17:10

Oops, forgot to add: the poller item count for this example device is 12.

bmfmancini, Oct 11 '22 18:10

@TheWitness Is the above info everything you were looking for?

bmfmancini, Oct 13 '22 18:10

Okay, the bug was confirmed and this is resolved now.

TheWitness, Oct 16 '22 17:10

This is still broken when boost is running. Looking to get a fix together.

TheWitness, Oct 17 '22 20:10

@bmfmancini I'm going to mark this resolved. If after updating tomorrow to the latest in test, you find issues, we can re-open.

TheWitness, Oct 17 '22 22:10