[1.2.22] - When remote poller is in offline mode GUI inaccesible and poller times out
We have been performing testing on 1.2.22
During failure testing we found that if you fail the master poller the remote poller GUI is unusable and spine times out
spine spends a lot of time spawning script server processes this behaviour is only seen in offline mode
2022-08-18 14:23:37 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[1] PHP Script Server Child FORK Success
2022-08-18 14:23:55 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[1] Confirmed PHP Script Server running using readfd[17], writefd[16]
2022-08-18 14:23:55 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[2] PHP Script Server Routine Starting
2022-08-18 14:23:55 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[2] PHP Script Server About to FORK Child Process
2022-08-18 14:23:55 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[2] PHP Script Server Child FORK Success
2022-08-18 14:24:13 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[2] Confirmed PHP Script Server running using readfd[19], writefd[18]
2022-08-18 14:24:13 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[3] PHP Script Server Routine Starting
2022-08-18 14:24:13 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[3] PHP Script Server About to FORK Child Process
2022-08-18 14:24:13 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[3] PHP Script Server Child FORK Success
2022-08-18 14:24:31 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[3] Confirmed PHP Script Server running using readfd[21], writefd[20]
2022-08-18 14:24:31 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[4] PHP Script Server Routine Starting
2022-08-18 14:24:31 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[4] PHP Script Server About to FORK Child Process
2022-08-18 14:24:31 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[4] PHP Script Server Child FORK Success
2022-08-18 14:24:49 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[4] Confirmed PHP Script Server running using readfd[23], writefd[22]
2022-08-18 14:24:49 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[5] PHP Script Server Routine Starting
2022-08-18 14:24:49 - SPINE: Poller[1] PID[23532] PT[139764272642176] DEBUG: SS[5] PHP Script Server About to FORK Child Process
I checked via packet capture for excessive retries to the primary poller but did not find any
In the logs I am seeing the remote trying to push data to main constantly
2022-08-18 14:37:36 - CMDPHP SQL Backtrace: (/poller.php[759]:poller_push_data_to_main(), /lib/poller.php[1957]:poller_push_table(), /lib/poller.php[1995]:db_execute(), /lib/database.php[272]:db_execute_prepared())
2022-08-18 14:37:36 - CMDPHP ERROR: A DB Exec Failed!, Error: MySQL server has gone away
2022-08-18 14:37:36 - CMDPHP SQL Backtrace: (/poller.php[759]:poller_push_data_to_main(), /lib/poller.php[1957]:poller_push_table(), /lib/poller.php[1995]:db_execute(), /lib/database.php[272]:db_execute_prepared())
2022-08-18 14:37:36 - CMDPHP ERROR: A DB Exec Failed!, Error: MySQL server has gone away
2022-08-18 14:37:36 - CMDPHP SQL Backtrace: (/poller.php[759]:poller_push_data_to_main(), /lib/poller.php[1957]:poller_push_table(), /lib/poller.php[1995]:db_execute(), /lib/database.php[272]:db_execute_prepared())
2022-08-18 14:37:36 - CMDPHP ERROR: A DB Exec Failed!, Error: MySQL server has gone away
2022-08-18 14:37:36 - CMDPHP SQL Backtrace: (/poller.php[759]:poller_push_data_to_main(), /lib/poller.php[1946]:poller_push_table(), /lib/poller.php[1995]:db_execute(), /lib/database.php[272]:db_execute_prepared())
2022-08-18 14:37:36 - CMDPHP ERROR: A DB Exec Failed!, Error: MySQL server has gone away
2022-08-18 14:37:36 - CMDPHP SQL Backtrace: (/poller.php[729]:db_fetch_cell_prepared(), /lib/database.php[484]:db_execute_prepared())
2022-08-18 14:37:36 - CMDPHP ERROR: A DB Cell Failed!, Error: MySQL server has gone away
You are running out of connections.
Let me check I didnt see any too many connections logs
connections are fine
MariaDB [cacti]> show status like '%max_used%';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| Max_used_connections | 242 |
+----------------------+-------+
1 row in set (0.00 sec)
MariaDB [cacti]> show variables where variable_name = 'max_connections';
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 5000 |
+-----------------+-------+
1 row in set (0.00 sec)
Ok interesting find if you fail the MariaDB process on the main poller this issue is not seen the remote poller is accessible and polling completes without issues
when the primary server is not reachable when the network connection is down that's when the remote starts acting up the local GUI is not accessible and the logs show polling timing out but the boost table has valid rows
when the network is restored to the primary recovery does kick in and valid records are passed to the main's boost table
You should reduce your timeout and retry number. The default's are too high.
Yea I thought the same
I'll try that on Monday
Tested the script server timeout dropped it from the default 40 seconds to 20 no difference
Brought down retry for spine from 5 to 3 same result
I used the same process as you with 1 remote and can not repeat your problem. This might be a plugin issue. What plugins are you running?
I think it's only thold and syslog I'll check in the morning
On Tue., Aug. 23, 2022, 9:03 p.m. TheWitness, @.***> wrote:
I used the same process as you with 1 remote and can not repeat your problem. This might be a plugin issue. What plugins are you running?
— Reply to this email directly, view it on GitHub https://github.com/Cacti/cacti/issues/4896#issuecomment-1225045215, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADGEXTB27CP2VBGQMCZKSSTV2VYHZANCNFSM566DOYMA . You are receiving this because you authored the thread.Message ID: @.***>
@TheWitness and I spoke about this to create a setting for main server timeout on the remote pollers using a PDO timeout
example
$DBH = new PDO(
"mysql:host=$host;dbname=$dbname",
$username,
$password,
array(
PDO::ATTR_TIMEOUT => 5, // in seconds
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION
)
);
The reason for this is the MySQL/MariaDB client take longer to timeout when the server does not respond if the primary DB is simply down but the server is up the response is connection refused which is immediate while the timeout will wait longer and attempt to retry
Also found the following
when the primary server is offline you will see this in the error_log of httpd
[Wed Aug 24 10:48:10.762636 2022] [php7:error] [pid 18633] [client :34869] PHP Fatal error: Allowed memory size of 576716800 bytes exhausted (tried to allocate 262144 bytes) in /var/www/html/cacti/lib/database.php on line 394
[Wed Aug 24 12:23:30.729189 2022] [php7:error] [pid 7519] [client :38877] PHP Fatal error: Allowed memory size of 576716800 bytes exhausted (tried to allocate 262144 bytes) in /var/www/html/cacti/lib/database.php on line 548
[Wed Aug 24 12:23:30.738659 2022] [php7:error] [pid 4446] [client :38763] PHP Fatal error: Allowed memory size of 576716800 bytes exhausted (tried to allocate 262144 bytes) in /var/www/html/cacti/lib/database.php on line 548
[Wed Aug 24 12:23:30.748958 2022] [php7:error] [pid 6597] [client :38782] PHP Fatal error: Allowed memory size of 576716800 bytes exhausted (tried to allocate 262144 bytes) in /var/www/html/cacti/lib/database.php on line 548
[Wed Aug 24 12:23:30.756671 2022] [php7:error] [pid 7965] [client :38736] PHP Fatal error: Allowed memory size of 576716800 bytes exhausted (tried to allocate 262144 bytes) in /var/www/html/cacti/lib/database.php on line 548
[Wed Aug 24 12:23:30.769023 2022] [php7:error] [pid 10217] [client :38753] PHP Fatal error: Allowed memory size of 576716800 bytes exhausted (tried to allocate 262144 bytes) in /var/www/html/cacti/lib/database.php on line 548
[Wed Aug 24 12:23:30.878296 2022] [php7:error] [pid 1652] [client :38732] PHP Fatal error: Allowed memory size of 576716800 bytes exhausted (tried to allocate 262144 bytes) in /var/www/html/cacti/lib/database.php on line 548
[Wed Aug 24 12:23:30.990902 2022] [php7:error] [pid 4440] [client :38948] PHP Fatal error: Allowed memory size of 576716800 bytes exhausted (tried to allocate 262144 bytes) in /var/www/html/cacti/lib/database.php on line 548
[Wed Aug 24 12:23:31.041406 2022] [php7:error] [pid 1425] [client :38745] PHP Fatal error: Allowed memory size of 576716800 bytes exhausted (tried to allocate 262144 bytes) in /var/www/html/cacti/lib/database.php on line 548
Function at line 548
function db_fetch_row_prepared($sql, $params = array(), $log = true, $db_conn = false) {
global $config;
if (!empty($config['DEBUG_SQL_FLOW'])) {
db_echo_sql('db_fetch_row_prepared(\'' . clean_up_lines($sql) . '\', $params = (\'' . implode('\', \'', $params) . '\'), $log = ' . $log . ', $db_conn = ' . ($db_conn ? 'true' : 'false') .')' . "\n");
}
return db_execute_prepared($sql, $params, $log, $db_conn, 'Row', false, 'db_fetch_row_return');
}
I think we need to have a setting in the GUI to keep the poller offline from the GUI perspective until it comes back up. Something that we need to save in the session. I think this is the only good way to solve this issue.
It's likely I'll be offline for a few days. We'll see.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Bump to keep the ticket open
On Mon., Oct. 24, 2022, 8:05 p.m. github-actions[bot], < @.***> wrote:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
— Reply to this email directly, view it on GitHub https://github.com/Cacti/cacti/issues/4896#issuecomment-1289798591, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADGEXTEEL6WUXNIAYFP45WLWE4P3JANCNFSM566DOYMA . You are receiving this because you authored the thread.Message ID: @.***>