grafana-dashboards

Interface Traffic and Interface Traffic Distribution not populating

Open sjabby opened this issue 7 years ago • 16 comments

I've followed all of the instructions and the dashboard is almost 100% working, but I can't get the graphs for Interface Traffic and Interface Traffic Distribution to populate.

Any input? @WaterByWind

sjabby avatar Jun 04 '17 18:06 sjabby

Which dashboard are you using? (UAP, EdgeRouter, ??)

If you are using the EdgeRouter dashboard, make sure you are running Telegraf 1.3.0 or later, as earlier versions lack the support needed to collect some of the data. (This does not apply to the UAP dashboard.)

WaterByWind avatar Jun 07 '17 15:06 WaterByWind

I have the same problem with the EdgeRouter dashboard. I am running Telegraf version 1.3.1. None of the Interface or IP charts are populated. My UAP dashboard works fine.

The required data seems to have been captured because a query like "select ifName,agent_host,ifHCInOctets, ifHCOutOctets from "snmp.EdgeOS" where time > now() - 1d and ifName = 'eth0'" does return a nice time series of Octet counts.

I can provide a backup of my db if that would help.

uhede avatar Jun 08 '17 19:06 uhede

Found my problem. I had a name_override setting in my telegraf.conf file that placed all observations in the same measurement. Removing that setting made the dashboard work fine.
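For anyone checking their own config, this is a minimal sketch of what such a setting might look like. The agent address is a placeholder and the override value is hypothetical, since the actual telegraf.conf was not posted; name_override itself is a standard Telegraf option available on any input plugin.

```toml
[[inputs.snmp]]
  agents = ["<your_edgerouter>:161"]   # placeholder address

  # name_override renames every metric emitted by this input, so all fields
  # and tables end up in a single measurement instead of separate ones.
  # The value below is hypothetical; remove or comment out the line if the
  # dashboard panels stay empty.
  # name_override = "edgerouter"
```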

uhede avatar Jun 11 '17 15:06 uhede

@uhede - Glad you found your issue and thanks for noting it here. @sjabby - Are you still having a problem? I'll keep this open just in case.

WaterByWind avatar Jun 21 '17 19:06 WaterByWind

@WaterByWind , sorry for the delay.

I'm using the EdgeRouter ERPOE-5 (FW 1.9.0) and running Grafana, Telegraf, etc. on an RPi3.

Telegraf version is 1.3.1

sjabby avatar Jun 24 '17 19:06 sjabby

Tried running telegraf and capturing more log data:

   2017-08-03T20:25:24Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: performing get on field ssCpuRawNice: Request timeout (after 3 retries)
   2017-08-03T20:25:50Z E! Error in plugin [inputs.snmp]: took longer to collect than collection interval (1m0s)
   2017-08-03T20:25:53Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: gathering table ifXTable: performing bulk walk for field ifHighSpeed: Request timeout (after 3 retries)
   2017-08-03T20:26:03Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: gathering table ipSystemStatsTable: performing bulk walk for field ipSystemStatsInReceives: Request timeout (after 3 retries)
   2017-08-03T20:26:36Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: gathering table ifTable: performing bulk walk for field ifInUcastPkts: Request timeout (after 3 retries)
   2017-08-03T20:27:01Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: gathering table ifXTable: performing bulk walk for field ifLinkUpDownTrapEnable: Request timeout (after 3 retries)
   2017-08-03T20:27:07Z E! Error in plugin [inputs.snmp]: took longer to collect than collection interval (1m0s)
   2017-08-03T20:27:35Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: gathering table ifTable: performing bulk walk for field ifInOctets: Request timeout (after 3 retries)
   2017-08-03T20:27:59Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: gathering table ifXTable: performing bulk walk for field ifLinkUpDownTrapEnable: Request timeout (after 3 retries)
   2017-08-03T20:28:33Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: gathering table ifXTable: performing bulk walk for field ifHCOutUcastPkts: Request timeout (after 3 retries)
   2017-08-03T20:29:11Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: performing get on field sysUpTime: Request timeout (after 3 retries)
   2017-08-03T20:29:30Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: gathering table ifXTable: performing bulk walk for field ifLinkUpDownTrapEnable: Request timeout (after 3 retries)
   2017-08-03T20:29:40Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: gathering table ipSystemStatsTable: performing bulk walk for field ipSystemStatsInReceives: Request timeout (after 3 retries)
   2017-08-03T20:30:10Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: performing get on field sysUpTime: Request timeout (after 3 retries)
   2017-08-03T20:30:29Z E! Error in plugin [inputs.snmp]: agent 192.168.2.1: gathering table ifXTable: performing bulk walk for field ifLinkUpDownTrapEnable: Request timeout (after 3 retries)

Now running ERPOE-5 (FW 1.9.7) and Telegraf 1.3.5

sjabby avatar Aug 03 '17 20:08 sjabby

@sjabby are you able to do an snmpwalk by hand to get any data? It looks like the ER may not be responding to SNMP requests at all.

For instance, does this work: snmpwalk -v 2c -c <your_community_string> <your_edgerouter> sysORID

WaterByWind avatar Aug 09 '17 23:08 WaterByWind

IF-MIB::ifXTable is the only SNMP request that regularly times out for me. I had to increase the timeout in the example config from 5s to 10s.
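A sketch of that change in telegraf.conf, assuming an SNMP v2c setup like the one in the snmpwalk command above. The agent address and community string are placeholders; timeout and retries are standard options of the inputs.snmp plugin.

```toml
[[inputs.snmp]]
  agents = ["<your_edgerouter>:161"]      # placeholder address
  version = 2
  community = "<your_community_string>"   # placeholder
  # The example config uses 5s; a slow router may need longer per request.
  timeout = "10s"
  retries = 3
```

Keep in mind that a long timeout multiplied by the retry count can exceed the collection interval, which produces the "took longer to collect than collection interval" error seen in the logs above.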

jjlawren avatar Aug 14 '17 19:08 jjlawren

I have an EdgeRouter Lite 3 running EdgeOS v1.9.7. I had this same problem and fixed it by commenting out the high-capacity (HC) counters section:

   ##
   ## Interface metrics
   ##
   #  Per-interface traffic, errors, drops
   [[inputs.snmp.table]]
     oid = "IF-MIB::ifTable"
     [[inputs.snmp.table.field]]
       oid = "IF-MIB::ifName"
       is_tag = true
   #  Per-interface high-capacity (HC) counters
   #[[inputs.snmp.table]]
   #  oid = "IF-MIB::ifXTable"
   #  [[inputs.snmp.table.field]]
   #    oid = "IF-MIB::ifName"
   #    is_tag = true

IF-MIB::ifXTable doesn't seem to be a valid OID for my setup, which is why that failed.

Awesome work though, thanks!

woody3000 avatar Sep 04 '17 18:09 woody3000

I solved my problem by setting the timeout to 10s as suggested by @jjlawren

sjabby avatar Sep 05 '17 22:09 sjabby

@woody3000 - What behavior did you see to make you think IF-MIB::ifXTable is not valid? What errors did you see?

This is a standard object that has been populated by EdgeOS for a long time. This OID is still valid even with the latest hot fix releases and betas so it would be odd that your instance would not have this.

If you did see a message such as 'no such object' or 'no such instance' then a little more investigation may be helpful.

It is possible that the ER is just taking too long to respond and increasing the timeout as suggested earlier may address that.

If you remove the ifXTable from collection then you'd also need to update the graphs to use the standard counters instead of the HC counters.

WaterByWind avatar Sep 06 '17 02:09 WaterByWind

From my Grafana host I tried running:

   snmpwalk -v 2c -c EDGEOS <ip> "IF-MIB::ifXName"

and I get:

   IF-MIB::ifXName: Unknown Object Identifier

It's entirely possible I'm missing a MIB or something. The host runs Ubuntu 16.04, and to get the rest of the MIBs working I only ran:

   sudo apt-get install snmp-mibs-downloader
   sudo download-mibs

woody3000 avatar Sep 06 '17 02:09 woody3000

No, there is indeed no IF-MIB::ifXName, but that is not used above.

The table is IF-MIB::ifXTable, and a tag is added (IF-MIB::ifName, with no 'X') to provide a direct correlation with IF-MIB::ifTable.
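For clarity, this is roughly what the stanza looks like when it is not commented out. It is the same fragment quoted earlier in the thread, with the agent settings omitted:

```toml
[[inputs.snmp]]
  # ... agent settings omitted ...

  # Per-interface high-capacity (HC) counters
  [[inputs.snmp.table]]
    oid = "IF-MIB::ifXTable"        # the table to walk
    [[inputs.snmp.table.field]]
      oid = "IF-MIB::ifName"        # note: ifName, not ifXName
      is_tag = true                 # stored as a tag so rows line up with ifTable
```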

WaterByWind avatar Sep 06 '17 04:09 WaterByWind

I just dropped by to add a 'me too' for the timeout setting. When I added the configs to my Telegraf, I got timeout errors in my logs all the time for IF-MIB::ifXTable, and once I bumped the timeout from 5s to 15s things started working.

EdgeRouter Lite with FW 1.9, Telegraf 1.4.4 running in a FreeBSD 11.1-RELEASE jail.

Error I got in the logs:

   2018-02-07T13:27:36Z E! Error in plugin [inputs.snmp]: agent 192.168.10.250: gathering table ifXTable: performing bulk walk for field ifAlias: Request timeout (after 3 retries)

mvanbaak avatar Feb 07 '18 13:02 mvanbaak

The ifXTable is also not populated for me with an EdgeRouter X on software version 2.0.9-hotfix.

For Interface Traffic I do get data after changing ifXTable to ifTable, but Interface Traffic Distribution still shows nothing. Raising the SNMP timeout from 5s to 15s did nothing for me.

Too bad it does not work for all panels, otherwise a perfect dashboard!

pixelmagic66 avatar Oct 01 '22 18:10 pixelmagic66

ifTable has only 32-bit counters, which can roll over fairly quickly on a busy interface. ifXTable uses 64-bit counters instead.
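As a rough worked example, assume an interface sustained at a full 1 Gbit/s (an illustrative assumption, not a measurement from this thread). A 32-bit octet counter then wraps in about half a minute, which is shorter than the 1m collection interval seen in the logs above, while a 64-bit counter would take thousands of years:

$$
t_{32} = \frac{2^{32}\ \text{bytes} \times 8\ \text{bits/byte}}{10^{9}\ \text{bits/s}} \approx 34.4\ \text{s},
\qquad
t_{64} = \frac{2^{64}\ \text{bytes} \times 8\ \text{bits/byte}}{10^{9}\ \text{bits/s}} \approx 1.48 \times 10^{11}\ \text{s} \approx 4700\ \text{years}
$$

At lower sustained rates the 32-bit wrap time scales up proportionally (about 5.7 minutes at 100 Mbit/s).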

Which 2.0.9-hotfix version specifically (there were multiple updates)? For Cavium-based ERs there is no issue with tables with 2.0.9-hotfix.4. I don't have an ER-X available at the moment to test with, but this would be an issue with the SNMP on the ER if this does not work. It is not possible to collect and display data that is not provided by the ER-X itself, but if the tables (which are standard) are not available then something would appear broken.

The timeout setting can be removed with more recent versions of telegraf as the defaults work well, unlike earlier versions.

WaterByWind avatar Nov 11 '22 04:11 WaterByWind