percona-monitoring-plugins

pmp-check-mongo.py: Could not connect or exec 'isMaster' command: 'No servers found yet'

Open otterblitzar opened this issue 4 years ago • 0 comments

I'm running a cronjob every 15 minutes that executes the pmp-check-mongo.py Nagios plugin to check a dozen MongoDB servers. I'm seeing intermittent errors like the following:

CRITICAL - Could not connect or exec 'isMaster' command: 'No servers found yet'

Here's a sample of errors over several days, including timestamps:

---- MONDAY ----

19:30
Check if the shards are balanced...
  dcamongodb2 failed | msg: non-zero return code | stdout: CRITICAL - Could not connect or exec 'isMaster' command: 'No servers found yet'

23:45
Check if there was a recent election...
  dcbmongodb2 failed | msg: non-zero return code | stdout: CRITICAL - Could not connect or exec 'isMaster' command: 'No servers found yet'

---- TUESDAY ----

03:45
Check connection...
  dcbmongodb2 failed | msg: non-zero return code | stdout: CRITICAL - Could not connect or exec 'isMaster' command: 'No servers found yet'

05:15
Check that the cluster has a primary server...
  dcbmongodb2 failed | msg: non-zero return code | stdout: CRITICAL - Could not connect or exec 'isMaster' command: 'No servers found yet'

07:00
Check that the cluster has a primary server...
  dcbmongodb2 failed | msg: non-zero return code | stdout: CRITICAL - Could not connect or exec 'isMaster' command: 'No servers found yet'

09:00
Check if the shards are balanced...
  dcbmongodb2 failed | msg: non-zero return code | stdout: CRITICAL - Could not connect or exec 'isMaster' command: 'No servers found yet'

09:30
Check if there was a recent election...
  dcbmongodb2 failed | msg: non-zero return code | stdout: CRITICAL - Could not connect or exec 'isMaster' command: 'No servers found yet'

I have no reason to suspect there is anything wrong with the MongoDB servers themselves. The errors go away if I add some exception handling and retry logic; see #112 for the code that fixes the problem.
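
For readers who don't want to open #112, the general shape of that kind of workaround can be sketched as below. This is not the code from #112, only a minimal illustration of retrying a connection check. The names `retry` and `is_master_with_retry`, the attempt count, the delay, and the 5-second server selection timeout are all assumptions made for this example. It relies on pymongo raising `ServerSelectionTimeoutError` (a subclass of `ConnectionFailure`) when it cannot find a server in time.

```python
import time


def retry(func, exceptions=(Exception,), attempts=3, delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries have failed.

    Hypothetical helper for illustration; not part of pmp-check-mongo.py.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as exc:
            last_error = exc
            # Only pause if another attempt will follow.
            if attempt < attempts:
                sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error


def is_master_with_retry(host, port=27017, attempts=3, delay=1.0):
    """Run 'isMaster' against one server, retrying on connection failures.

    Assumed usage sketch; pymongo is imported lazily so that the retry
    helper above can be used and tested without it.
    """
    import pymongo
    from pymongo.errors import ConnectionFailure

    def probe():
        # The timeout value here is an arbitrary choice for the example.
        client = pymongo.MongoClient(host, port, serverSelectionTimeoutMS=5000)
        try:
            return client.admin.command("isMaster")
        finally:
            client.close()

    return retry(probe, exceptions=(ConnectionFailure,),
                 attempts=attempts, delay=delay)
```

A Nagios check would presumably report CRITICAL only once `is_master_with_retry` has raised, that is, after all attempts have failed, rather than on the first failed connection.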

The revision of pmp-check-mongo.py I was testing with is ca16cdcee59e3494f4a5c3031c50c34e5f78f766.

The server running the cronjob is CentOS 7.7 using the system /usr/bin/python. The pymongo library is version 3.7.2, and was provided by the CentOS repository:

$ rpm -qi python2-pymongo
Name        : python2-pymongo
Version     : 3.7.2
Release     : 1.el7
Architecture: x86_64
Install Date: Fri 17 Jul 2020 12:22:26 PM EDT
Group       : Unspecified
Size        : 1852358
License     : ASL 2.0 and MIT
Signature   : (none)
Source RPM  : python-pymongo-3.7.2-1.el7.src.rpm
Build Date  : Wed 06 Mar 2019 12:23:09 AM EST
Build Host  : c1bk.rdu2.centos.org
Relocations : (not relocatable)
Packager    : CBS <[email protected]>
Vendor      : CentOS

otterblitzar · Oct 13 '20 18:10