snmpcollector icon indicating copy to clipboard operation
snmpcollector copied to clipboard

Cisco NCS Polling

Open pmawsonau opened this issue 3 years ago • 4 comments

We are running snmpcollector 0.8.0 (yes we need to upgrade) to poll a number of devices. Our network is made up of Juniper and Cisco.

We are having issues in particular with Cisco NCS. Every 6 hours (which lines up with the full walk) it fails to poll for up to 20minutes. The logs show this

time="2021-05-04 01:25:00" level=info msg="SNMPDEVICE [SNIP] Init gather cycle mode Concurrent [ true ]" time="2021-05-04 01:25:00" level=info msg="MEASUREMENT [interface_counters] Not Oid CONDITIONEVAL metrics exist on this measurement" time="2021-05-04 01:25:00" level=info msg="MEASUREMENT [interface_counters] Not EVAL metrics exist on this measurement" time="2021-05-04 01:25:00" level=info msg="STATS SNMP GET: snmp polling took [0.911181 seconds] SNMP: Gets [650] , Processed [340], Errors [0]" time="2021-05-04 01:25:00" level=info msg="STATS SNMP FILTER: filter polling took [0.000000 seconds] " time="2021-05-04 01:25:00" level=info msg="STATS INFLUX: influx send took [0.000008 seconds]" time="2021-05-04 01:30:00" level=info msg="SNMPDEVICE [SNIP] Init gather cycle mode Concurrent [ true ]" time="2021-05-04 01:30:00" level=info msg="MEASUREMENT [interface_counters] Not Oid CONDITIONEVAL metrics exist on this measurement" time="2021-05-04 01:30:00" level=info msg="MEASUREMENT [interface_counters] Not EVAL metrics exist on this measurement" time="2021-05-04 01:30:00" level=info msg="STATS SNMP GET: snmp polling took [0.914380 seconds] SNMP: Gets [650] , Processed [340], Errors [0]" time="2021-05-04 01:30:00" level=info msg="STATS SNMP FILTER: filter polling took [0.000000 seconds] " time="2021-05-04 01:30:00" level=info msg="STATS INFLUX: influx send took [0.000003 seconds]" time="2021-05-04 01:35:00" level=info msg="SNMPDEVICE [SNIP] Init gather cycle mode Concurrent [ true ]" time="2021-05-04 01:35:01" level=info msg="MEASUREMENT [interface_counters] Not Oid CONDITIONEVAL metrics exist on this measurement" time="2021-05-04 01:35:01" level=info msg="MEASUREMENT [interface_counters] Not EVAL metrics exist on this measurement" time="2021-05-04 01:35:01" level=info msg="STATS SNMP GET: snmp polling took [1.010677 seconds] SNMP: Gets [650] , Processed [340], Errors [0]" time="2021-05-04 01:35:01" level=info msg="STATS SNMP FILTER: filter polling took [0.000000 seconds] " time="2021-05-04 01:35:01" level=info msg="STATS INFLUX: influx send took [0.000002 seconds]" time="2021-05-04 01:40:00" level=info msg="SNMPDEVICE [SNIP] Init gather cycle mode Concurrent [ true ]" time="2021-05-04 01:40:30" level=error msg="MEASUREMENT [interface_counters] SNMP WALK (103.1.X.X) for OID (.1.3.6.1.2.1.31.1.1.1.6) get error: Request timeout (after 1 retries) time="2021-05-04 01:41:00" level=error msg="MEASUREMENT [interface_counters] SNMP WALK (103.1.X.X) for OID (.1.3.6.1.2.1.31.1.1.1.10) get error: Request timeout (after 1 retries) time="2021-05-04 01:41:30" level=error msg="MEASUREMENT [interface_counters] SNMP WALK (103.1.X.X) for OID (.1.3.6.1.2.1.31.1.1.1.18) get error: Request timeout (after 1 retries) time="2021-05-04 01:42:00" level=error msg="MEASUREMENT [interface_counters] SNMP WALK (103.1.X.X) for OID (.1.3.6.1.2.1.31.1.1.1.15) get error: Request timeout (after 1 retries) time="2021-05-04 01:42:30" level=error msg="MEASUREMENT [interface_counters] SNMP WALK (103.1.X.X) for OID (.1.3.6.1.2.1.2.2.1.8) get error: Request timeout (after 1 retries) time="2021-05-04 01:42:30" level=info msg="MEASUREMENT [interface_counters] Not Oid CONDITIONEVAL metrics exist on this measurement" time="2021-05-04 01:42:30" level=info msg="MEASUREMENT [interface_counters] Not EVAL metrics exist on this measurement" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifHCOutOctets] from MEASUREMENT[ interface_counters ] with TAGS [map[hostname:snip ifName:HundredGigE0/0/0/19]] has obsolete data => See Metric Runtime [ &{cfg:0xc000252090 Valid:false CookedValue:238289135 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.4169110 17 LastTime:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.31.1.1.1.10.82 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifAlias] from MEASUREMENT[ interface_counters ] with TAGS [map[ifName:HundredGigE0/0/0/19 hostname:snip]] has obsolete data => See Metric Runtime [ &{cfg:0xc0002523f0 Valid:false CookedValue:<NOMON> LINK | SNIP | et-0/0/0 | 0 CurValue: LastValue: CurTime:2021-05- 04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTime:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x9477f0 RealOID:.1.3.6.1.2.1.31.1.1.1.18.82 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifHighSpeed] from MEASUREMENT[ interface_counters ] with TAGS [map[DC:SNIP STATE:NT ifName:HundredGigE0/0/0/19 snip AS:9669]] has obsolete data => See Metric Runtime [ &{cfg:0xc000252240 Valid:false CookedValue:100000 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 La stTime:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.31.1.1.1.15.82 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifOperStatus] from MEASUREMENT[ interface_counters ] with TAGS [map[STATE:NT ifName:HundredGigE0/0/0/19 snip ]] has obsolete data => See Metric Runtime [ &{cfg:0xc0002522d0 Valid:false CookedValue:1 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTi me:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.2.2.1.8.82 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifHCInOctets] from MEASUREMENT[ interface_counters ] with TAGS [map[STATE:NT ifName:HundredGigE0/0/0/19 snip ]] has obsolete data => See Metric Runtime [ &{cfg:0xc000252000 Valid:false CookedValue:54151875 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTime:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.31.1.1.1.6.82 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="MEASUREMENT [interface_counters] error in influx point creation :point without fields is unsupported" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifHCInOctets] from MEASUREMENT[ interface_counters ] with TAGS [map[hostname:snip ifName:TenGigE0/0/0/41/3]] has obsolete data => See Metric Runtime [ &{cfg:0xc000252000 Valid:false CookedValue:0 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTime :0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.31.1.1.1.6.128 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifHCOutOctets] from MEASUREMENT[ interface_counters ] with TAGS [map[hostname:snip ifName:TenGigE0/0/0/41/3]] has obsolete data => See Metric Runtime [ &{cfg:0xc000252090 Valid:false CookedValue:0 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTim e:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.31.1.1.1.10.128 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifAlias] from MEASUREMENT[ interface_counters ] with TAGS [map[hostname:snip ifName:TenGigE0/0/0/41/3]] has obsolete data => See Metric Runtime [ &{cfg:0xc0002523f0 Valid:false CookedValue: CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTime:0001- 01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x9477f0 RealOID:.1.3.6.1.2.1.31.1.1.1.18.128 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifHighSpeed] from MEASUREMENT[ interface_counters ] with TAGS [map[ STATE:NT ifName:TenGigE0/0/0/41/3 snip]] has obsolete data => See Metric Runtime [ &{cfg:0xc000252240 Valid:false CookedValue:10000 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastT ime:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.31.1.1.1.15.128 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifOperStatus] from MEASUREMENT[ interface_counters ] with TAGS [map[hostname:snip ifName:TenGigE0/0/0/41/3]] has obsolete data => See Metric Runtime [ &{cfg:0xc0002522d0 Valid:false CookedValue:2 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTime :0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.2.2.1.8.128 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="MEASUREMENT [interface_counters] error in influx point creation :point without fields is unsupported" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifHCInOctets] from MEASUREMENT[ interface_counters ] with TAGS [map[DC:SNIP ifName:HundredGigE0/0/0/40 STATE:NT snip AS:9669]] has obsolete data => See Metric Runtime [ &{cfg:0xc000252000 Valid:false CookedValue:0 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTi me:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.31.1.1.1.6.61 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifHCOutOctets] from MEASUREMENT[ interface_counters ] with TAGS [map[STATE:NT snip ifName:HundredGigE0/0/0/40]] has obsolete data => See Metric Runtime [ &{cfg:0xc000252090 Valid:false CookedValue:0 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastT ime:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.31.1.1.1.10.61 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifAlias] from MEASUREMENT[ interface_counters ] with TAGS [map[ ifName:HundredGigE0/0/0/40 STATE:NT snip]] has obsolete data => See Metric Runtime [ &{cfg:0xc0002523f0 Valid:false CookedValue: CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTime:000 1-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x9477f0 RealOID:.1.3.6.1.2.1.31.1.1.1.18.61 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifHighSpeed] from MEASUREMENT[ interface_counters ] with TAGS [map[ ifName:HundredGigE0/0/0/40 STATE:NT snip]] has obsolete data => See Metric Runtime [ &{cfg:0xc000252240 Valid:false CookedValue:100000 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 La stTime:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.31.1.1.1.15.61 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifOperStatus] from MEASUREMENT[ interface_counters ] with TAGS [map[snip ifName:HundredGigE0/0/0/40 STATE:NT]] has obsolete data => See Metric Runtime [ &{cfg:0xc0002522d0 Valid:false CookedValue:2 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTi me:0001-01-01 00:00:00 +0000 UTC ElapsedTime:0 Compute: Scale:0x9452f0 Convert:0x948ab0 SetRawData:0x945570 RealOID:.1.3.6.1.2.1.2.2.1.8.61 Report:1 re: mm:[] expr: condflt: log:0xc0003a43c0} ]" time="2021-05-04 01:42:30" level=warning msg="MEASUREMENT [interface_counters] error in influx point creation :point without fields is unsupported" time="2021-05-04 01:42:30" level=warning msg="Warning METRIC ID [ifHCInOctets] from MEASUREMENT[ interface_counters ] with TAGS [map[STATE:NT snip ifName:HundredGigE0/0/0/45]] has obsolete data => See Metric Runtime [ &{cfg:0xc000252000 Valid:false CookedValue:0 CurValue: LastValue: CurTime:2021-05-04 01:35:00.001019441 +0000 UTC m=+167157.416911017 LastTi <SNIP>

We have tweaked get bulk and fiddled with Max repetitions between 30 and 100, and even disabled it with no changes.

Is this a poller or network device issue

Thanks

pmawsonau avatar May 04 '21 06:05 pmawsonau

Hi @pmawsonau , apologies for the late answer

As a first sight, it doesn't seem to be related with the SNMPCollector as itself, but we need the following information:

  • Total number of devices vs Cisco NCS devices
  • Gather frequency and Filter Frequency cycles (SNMP Device configuration)
  • Timeout/Retries configured on Cisco NCS devices
  • Number of measurements (indexed ones) and filters attached to the Cisco NCS devices
  • Avg. Gather time when the polls are succeded

sbengo avatar May 07 '21 06:05 sbengo

Hello @pmawsonau

If there are not any update in next hours we will close this issue. Thank you.

toni-moreno avatar Oct 13 '21 20:10 toni-moreno

Hi, Apologies for delay. This issue is across all our collectors (we have a number), but for this example I will look at syd01 only Total Devices: 137 NCSs included in total: 15 Device Setting Timeout: 60 Device Setting Retries: 2 Polling Settings Freq: 300 Polling Settings UpdateFitFreq: 60

For one NCS that has the issues, logfile

level=info msg="STATS SNMP GET: snmp polling took [0.351648 seconds] SNMP: Gets [662] , Processed [357], Errors [0]" level=info msg="STATS SNMP FILTER: filter polling took [0.000000 seconds] " level=info msg="STATS INFLUX: influx send took [0.000004 seconds]"

pmawsonau avatar Oct 13 '21 21:10 pmawsonau

Hi @pmawsonau

According to your configuration, the measurement index (mesurement walks + filters) will be each 5 hours, not 6

Anyway, we do not discard any SNMPCollector issue, so, please:

  • Update to 0.11 version (we have updated the core SNMP lib and other improvements)
  • Try to create a new instance with only one Cisco NCS and set the UpdateFiltFreq to lower value (i.e -6) and check if the timeout period fits into the "full walk" (i.e - every 30 minutes)

Thanks, Regards

sbengo avatar Oct 22 '21 04:10 sbengo