snmp_exporter Add cache client to improve SNMP scraping

This adds the ability to answer duplicate requests from a cache. It does not work well if multiple modules are specified and they define duplicate metrics. It may also not be cached if more than one Worker is running.

generator.yml

modules:
  if_mib:
    walk: [ifOperStatus]
    lookups:
      - source_indexes: [ifIndex]
        lookup: ifAlias
      - source_indexes: [ifIndex]
        lookup: 1.3.6.1.2.1.2.2.1.2 # ifDescr
      - source_indexes: [ifIndex]
        lookup: 1.3.6.1.2.1.31.1.1.1.1 # ifName
  if_mib2:
    walk: [ifAdminStatus]
    lookups:
      - source_indexes: [ifIndex]
        lookup: ifAlias
      - source_indexes: [ifIndex]
        lookup: 1.3.6.1.2.1.2.2.1.2 # ifDescr
      - source_indexes: [ifIndex]
        lookup: 1.3.6.1.2.1.31.1.1.1.1 # ifName

testing

The output below shows that snmp_scrape_packets_sent drops: if_mib sends 4 packets, while if_mib2, whose lookups duplicate those of if_mib, sends only 1.

> curl 'localhost:9116/snmp?target=192.168.64.3&module=if_mib,if_mib2'
# HELP ifAdminStatus The desired state of the interface - 1.3.6.1.2.1.2.2.1.7
# TYPE ifAdminStatus gauge
ifAdminStatus{ifAlias="eth0",ifDescr="Intel Corporation 82540EM Gigabit Ethernet Controller",ifIndex="2",ifName="eth0"} 1
ifAdminStatus{ifAlias="lo",ifDescr="lo",ifIndex="1",ifName="lo"} 1
ifAdminStatus{ifAlias="swp1",ifDescr="Intel Corporation 82540EM Gigabit Ethernet Controller",ifIndex="3",ifName="swp1"} 2
# HELP ifOperStatus The current operational state of the interface - 1.3.6.1.2.1.2.2.1.8
# TYPE ifOperStatus gauge
ifOperStatus{ifAlias="eth0",ifDescr="Intel Corporation 82540EM Gigabit Ethernet Controller",ifIndex="2",ifName="eth0"} 1
ifOperStatus{ifAlias="lo",ifDescr="lo",ifIndex="1",ifName="lo"} 1
ifOperStatus{ifAlias="swp1",ifDescr="Intel Corporation 82540EM Gigabit Ethernet Controller",ifIndex="3",ifName="swp1"} 2
# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds{module="if_mib"} 0.030786458
snmp_scrape_duration_seconds{module="if_mib2"} 0.002798083
# HELP snmp_scrape_packets_retried Packets retried for get, bulkget, and walk.
# TYPE snmp_scrape_packets_retried gauge
snmp_scrape_packets_retried{module="if_mib"} 0
snmp_scrape_packets_retried{module="if_mib2"} 0
# HELP snmp_scrape_packets_sent Packets sent for get, bulkget, and walk; including retries.
# TYPE snmp_scrape_packets_sent gauge
snmp_scrape_packets_sent{module="if_mib"} 4
snmp_scrape_packets_sent{module="if_mib2"} 1
# HELP snmp_scrape_pdus_returned PDUs returned from get, bulkget, and walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned{module="if_mib"} 12
snmp_scrape_pdus_returned{module="if_mib2"} 12
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds{module="if_mib"} 0.030733375
snmp_scrape_walk_duration_seconds{module="if_mib2"} 0.002757625

Feb 21 '24 13:02 servak

While I like the idea of simplicity, I think we need to have a more robust cache design before we try and implement this.

Feb 21 '24 14:02 SuperQ

OK. Certainly, the following problem could be solved by providing some kind of cache service:

It may also not be cached if more than one Worker is running.

What about the following Cache components?

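// CacheService stores and returns walk results by string key.
type CacheService interface {
	Get(string) []gosnmp.SnmpPDU
	Set(string, []gosnmp.SnmpPDU)
}

// cacheClient wraps a scraper and consults the cache first.
type cacheClient struct {
	scraper SNMPScraper
	cache   CacheService
}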

An in-memory implementation is a given, but if the assumption is that the cache will cover a wider scope, then memcached or Redis might be a good idea.
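To make the in-memory option concrete, here is a minimal sketch of what it could look like. It is not the code from this PR: `PDU` is a stand-in for `gosnmp.SnmpPDU` so the example compiles without external dependencies, and `walker`, `memoryCache`, and `countingWalker` are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// PDU is a stand-in for gosnmp.SnmpPDU (assumption, keeps the sketch stdlib-only).
type PDU struct {
	Name  string
	Value interface{}
}

// CacheService mirrors the interface proposed above, with PDU swapped in.
type CacheService interface {
	Get(string) []PDU
	Set(string, []PDU)
}

// memoryCache is a hypothetical in-memory CacheService. The mutex lets
// goroutines inside one process share it; it is not shared across processes.
type memoryCache struct {
	mu   sync.RWMutex
	data map[string][]PDU
}

func newMemoryCache() *memoryCache {
	return &memoryCache{data: make(map[string][]PDU)}
}

// Get returns the cached PDUs for key, or nil on a miss.
func (c *memoryCache) Get(key string) []PDU {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.data[key]
}

// Set stores the PDUs under key, replacing any previous entry.
func (c *memoryCache) Set(key string, pdus []PDU) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = pdus
}

// walker is a hypothetical, reduced stand-in for the scraper interface.
type walker interface {
	WalkAll(oid string) ([]PDU, error)
}

// cacheClient serves repeated walks of the same OID from the cache.
type cacheClient struct {
	scraper walker
	cache   CacheService
}

func (c *cacheClient) WalkAll(oid string) ([]PDU, error) {
	if pdus := c.cache.Get(oid); pdus != nil {
		return pdus, nil // hit: no packet is sent
	}
	pdus, err := c.scraper.WalkAll(oid)
	if err != nil {
		return nil, err // errors are not cached
	}
	c.cache.Set(oid, pdus)
	return pdus, nil
}

// countingWalker is a fake device that counts how often it is walked.
type countingWalker struct{ calls int }

func (w *countingWalker) WalkAll(oid string) ([]PDU, error) {
	w.calls++
	return []PDU{{Name: oid + ".1", Value: "lo"}}, nil
}

func main() {
	dev := &countingWalker{}
	client := &cacheClient{scraper: dev, cache: newMemoryCache()}
	client.WalkAll("1.3.6.1.2.1.31.1.1.1.1") // miss: reaches the device
	client.WalkAll("1.3.6.1.2.1.31.1.1.1.1") // hit: served from memory
	fmt.Println("device walks:", dev.calls) // prints "device walks: 1"
}
```

Because a nil result is treated as a miss, a walk that returns nothing would be repeated every time in this sketch; a real design would probably want to distinguish "empty" from "absent".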

Feb 22 '24 05:02 servak

Caching should also have a TTL specification, which would be specified per module and probably per walk/get within each module.
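As a rough illustration of the TTL idea, the sketch below attaches an expiry to each cache entry, so the caller could pass a different TTL per module or per walk. Everything here is an assumption for discussion rather than an agreed design: `ttlCache`, `fakeClock`, `cacheKey`, and `walkTTL` are hypothetical names, `PDU` stands in for `gosnmp.SnmpPDU`, and the `Get`/`Set` signatures deliberately differ from the `CacheService` interface proposed earlier.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// PDU is a stand-in for gosnmp.SnmpPDU (assumption, keeps the sketch stdlib-only).
type PDU struct {
	Name  string
	Value interface{}
}

// walkTTL is an example value; a real design would read it from config.
const walkTTL = 30 * time.Second

// entry pairs cached PDUs with the time at which they stop being valid.
type entry struct {
	pdus    []PDU
	expires time.Time
}

// ttlCache is a hypothetical cache where every Set carries its own TTL.
type ttlCache struct {
	mu   sync.Mutex
	now  func() time.Time // injectable clock, so expiry can be tested
	data map[string]entry
}

func newTTLCache(now func() time.Time) *ttlCache {
	return &ttlCache{now: now, data: make(map[string]entry)}
}

// Get reports a hit only while the entry has not yet expired.
func (c *ttlCache) Get(key string) ([]PDU, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.data[key]
	if !ok {
		return nil, false
	}
	if !c.now().Before(e.expires) {
		delete(c.data, key) // expired: drop it and report a miss
		return nil, false
	}
	return e.pdus, true
}

// Set stores pdus under key for the given TTL.
func (c *ttlCache) Set(key string, pdus []PDU, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = entry{pdus: pdus, expires: c.now().Add(ttl)}
}

// cacheKey scopes entries by target and OID so different devices never collide.
func cacheKey(target, oid string) string {
	return target + "|" + oid
}

// fakeClock is a manually advanced clock used in place of time.Now.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time          { return f.t }
func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

func main() {
	clock := &fakeClock{}
	c := newTTLCache(clock.Now)
	key := cacheKey("192.168.64.3", "1.3.6.1.2.1.31.1.1.1.1")

	c.Set(key, []PDU{{Name: "ifName.1", Value: "lo"}}, walkTTL)
	_, hit := c.Get(key)
	fmt.Println("fresh:", hit) // prints "fresh: true"

	clock.Advance(walkTTL)
	_, hit = c.Get(key)
	fmt.Println("after TTL:", hit) // prints "after TTL: false"
}
```

In production the clock would simply be `time.Now`; injecting it here only keeps the expiry behavior deterministic.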

I think we need to expand on this and write a bit of a design doc for what we want from cache behaviors.

Feb 22 '24 08:02 SuperQ

I decided to implement this feature because I thought it would be good to avoid wasted requests within a single scrape, but if you want greater functionality from a cache service, we should certainly design it properly, so I'm going to close this PR.

Feb 23 '24 06:02 servak