easysnmp icon indicating copy to clipboard operation
easysnmp copied to clipboard

EasySNMPTimeoutError if not first session

Open Guens opened this issue 2 years ago • 15 comments

I have a very strange behavior (bug ?). If I run this simple script

import easysnmp

snmp_opts = {
    "version": 3,
    "security_level": "auth_with_privacy", 
    "auth_protocol": "MD5",  
    "privacy_protocol": "DES",

    "auth_password": "xxx",  
    "privacy_password": "xxx",  
    "security_username": "xxx", 
}

oid = [".1.3.6.1.2.1.1.3.0", ]


s2 = easysnmp.Session(hostname="10.224.113.113", **snmp_opts)
print(s2.get(oid))

s1 = easysnmp.Session(hostname="10.224.115.121", **snmp_opts)
print(s1.get(oid))

It works

[<SNMPVariable value='25134890' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
[<SNMPVariable value='24988592' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]

But If I run s1 before s2

...
s1 = easysnmp.Session(hostname="10.224.115.121", **snmp_opts)
print(s1.get(oid))

s2 = easysnmp.Session(hostname="10.224.113.113", **snmp_opts)
print(s2.get(oid))

the second session fails :

[<SNMPVariable value='25003713' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
Traceback (most recent call last):
  File "/home/pg218277/.config/JetBrains/PyCharmCE2021.3/scratches/bug_easysnmp.py", line 27, in <module>
    print(s2.get(oid))
  File "/home/pg218277/git/equtils/venv/lib64/python3.6/site-packages/easysnmp/session.py", line 315, in get
    interface.get(self, varlist)
easysnmp.exceptions.EasySNMPTimeoutError: timed out while connecting to remote host

It happens with

  • net-snmp 5.81 and easysnmp 0.2.5
  • net-snmp 5.91 and easysnmp latest master

I'm very annoyed because I didn't find any workaround. Can somebody help me ?

Guens avatar Apr 15 '22 08:04 Guens

Interesting issue. What happens if you create the sessions and then loop ten times? Do they all fail?

For example:


s1 = easysnmp.Session(hostname="10.224.115.121", **snmp_opts)
s2 = easysnmp.Session(hostname="10.224.113.113", **snmp_opts)

for i in range(10):
    try:
        print(s1.get(oid))
    except Exception as e:
        print('S1 error on iteration {}: {}'.format(i, e))
    try:
        print(s2.get(oid))
    except Exception as e:
        print('S2 error on iteration {}: {}'.format(i, e))

kamakazikamikaze avatar Apr 15 '22 18:04 kamakazikamikaze

Thanks Kent for your help.

I tried your script and the second session always fails too :

[<SNMPVariable value='35292381' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 0: timed out while connecting to remote host
[<SNMPVariable value='35292582' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 1: timed out while connecting to remote host
[<SNMPVariable value='35292782' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 2: timed out while connecting to remote host
[<SNMPVariable value='35292983' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 3: timed out while connecting to remote host
[<SNMPVariable value='35293183' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 4: timed out while connecting to remote host
[<SNMPVariable value='35293384' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 5: timed out while connecting to remote host
[<SNMPVariable value='35293584' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 6: timed out while connecting to remote host
[<SNMPVariable value='35293784' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 7: timed out while connecting to remote host
[<SNMPVariable value='35293985' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 8: timed out while connecting to remote host
[<SNMPVariable value='35294185' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 9: timed out while connecting to remote host

If I try to invert S2 an S1 :

s1 = easysnmp.Session(hostname="10.224.115.121", **snmp_opts)
s2 = easysnmp.Session(hostname="10.224.113.113", **snmp_opts)

for i in range(10):
    try:
        print(s2.get(oid))
    except Exception as e:
        print('S2 error on iteration {}: {}'.format(i, e))
    try:
        print(s1.get(oid))
    except Exception as e:
        print('S1 error on iteration {}: {}'.format(i, e))

the first S2 succeeds, but all other S2 sessions fail after the first S1 :

[<SNMPVariable value='35464538' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
[<SNMPVariable value='35318239' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 1: timed out while connecting to remote host
[<SNMPVariable value='35318439' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 2: timed out while connecting to remote host
[<SNMPVariable value='35318640' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 3: timed out while connecting to remote host
[<SNMPVariable value='35318840' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 4: timed out while connecting to remote host
[<SNMPVariable value='35319041' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 5: timed out while connecting to remote host
[<SNMPVariable value='35319241' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 6: timed out while connecting to remote host
[<SNMPVariable value='35319442' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 7: timed out while connecting to remote host
[<SNMPVariable value='35319642' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 8: timed out while connecting to remote host
[<SNMPVariable value='35319842' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]
S2 error on iteration 9: timed out while connecting to remote host
[<SNMPVariable value='35320043' (oid='sysUpTimeInstance', oid_index='', snmp_type='TICKS')>]

Il also have the same result if sessions are created inside the loop :

for i in range(10):
    try:
        s2 = easysnmp.Session(hostname="10.224.113.113", **snmp_opts)
        print(s2.get(oid))
    except Exception as e:
        print('S2 error on iteration {}: {}'.format(i, e))
    try:
        s1 = easysnmp.Session(hostname="10.224.115.121", **snmp_opts)
        print(s1.get(oid))
    except Exception as e:
        print('S1 error on iteration {}: {}'.format(i, e))

What's might be happening ?

Guens avatar Apr 16 '22 13:04 Guens

I'm honestly not completely certain, but my best guess is that something internal may not be parsing the data properly and causes an (inaccurate) exception to be made.

Can you run Wireshark and see if the packets leave your station and a response comes back? If there is, then we'll need to get down into the C code to figure out where it's erroring out.

kamakazikamikaze avatar Apr 16 '22 14:04 kamakazikamikaze

I follow your advice with wireshark.

When it's OK with

s2 = easysnmp.Session(hostname="10.224.113.113", **snmp_opts)
print(s2.get(oid))

s1 = easysnmp.Session(hostname="10.224.115.121", **snmp_opts)
print(s1.get(oid))

Hers is the wireshark capture

$ tshark -r ok.pcapng
    1 0.000000000  188.3.3.204 → 10.224.113.113 SNMP 106 get-request
    2 0.001631285 10.224.113.113 → 188.3.3.204  SNMP 162 report 1.3.6.1.6.3.15.1.1.4.0
    3 0.001699119  188.3.3.204 → 10.224.113.113 SNMP 186 encryptedPDU: privKey Unknown
    4 0.003590253 10.224.113.113 → 188.3.3.204  SNMP 194 encryptedPDU: privKey Unknown
    5 0.022075249  188.3.3.204 → 10.224.115.121 SNMP 106 get-request
    6 0.023651344 10.224.115.121 → 188.3.3.204  SNMP 162 report 1.3.6.1.6.3.15.1.1.4.0
    7 0.023723488  188.3.3.204 → 10.224.115.121 SNMP 186 encryptedPDU: privKey Unknown
    8 0.025355497 10.224.115.121 → 188.3.3.204  SNMP 174 report 1.3.6.1.6.3.15.1.1.2.0
    9 0.025402634  188.3.3.204 → 10.224.115.121 SNMP 186 encryptedPDU: privKey Unknown
   10 0.027161327 10.224.115.121 → 188.3.3.204  SNMP 194 encryptedPDU: privKey Unknown

And when it fails with

s1 = easysnmp.Session(hostname="10.224.115.121", **snmp_opts)
print(s1.get(oid))

s2 = easysnmp.Session(hostname="10.224.113.113", **snmp_opts)
print(s2.get(oid))

Here is the capture

$ tshark -r ko.pcapng
    1 0.000000000  188.3.3.204 → 10.224.115.121 SNMP 106 get-request
    2 0.001586401 10.224.115.121 → 188.3.3.204  SNMP 162 report 1.3.6.1.6.3.15.1.1.4.0
    3 0.001660955  188.3.3.204 → 10.224.115.121 SNMP 186 encryptedPDU: privKey Unknown
    4 0.003383640 10.224.115.121 → 188.3.3.204  SNMP 194 encryptedPDU: privKey Unknown
    5 0.021496701  188.3.3.204 → 10.224.113.113 SNMP 106 get-request
    6 0.023240485 10.224.113.113 → 188.3.3.204  SNMP 162 report 1.3.6.1.6.3.15.1.1.4.0
    7 0.023307782  188.3.3.204 → 10.224.113.113 SNMP 186 encryptedPDU: privKey Unknown
    8 0.025056734 10.224.113.113 → 188.3.3.204  SNMP 174 report 1.3.6.1.6.3.15.1.1.2.0

It seems something is missing, isn't it ?

Guens avatar Apr 16 '22 17:04 Guens

Single sessions s1 or s2 contains only the following packets :

$ tshark -r s1.pcapng
    1 0.000000000  188.3.3.204 → 10.224.115.121 SNMP 106 get-request
    2 0.001558097 10.224.115.121 → 188.3.3.204  SNMP 162 report 1.3.6.1.6.3.15.1.1.4.0
    3 0.002624055  188.3.3.204 → 10.224.115.121 SNMP 186 encryptedPDU: privKey Unknown
    4 0.004359447 10.224.115.121 → 188.3.3.204  SNMP 194 encryptedPDU: privKey Unknown
$ tshark -r s2.pcapng
    1 0.000000000  188.3.3.204 → 10.224.113.113 SNMP 106 get-request
    2 0.001603048 10.224.113.113 → 188.3.3.204  SNMP 162 report 1.3.6.1.6.3.15.1.1.4.0
    3 0.007255996  188.3.3.204 → 10.224.113.113 SNMP 186 encryptedPDU: privKey Unknown
    4 0.009015309 10.224.113.113 → 188.3.3.204  SNMP 194 encryptedPDU: privKey Unknown

I don't understand why chaining s1 and s2, or s2 and s1, generates such packet 8 (and may be 7 ?) :

    7 0.023723488  188.3.3.204 → 10.224.115.121 SNMP 186 encryptedPDU: privKey Unknown
    8 0.025355497 10.224.115.121 → 188.3.3.204  SNMP 174 report 1.3.6.1.6.3.15.1.1.2.0

or

    7 0.023307782  188.3.3.204 → 10.224.113.113 SNMP 186 encryptedPDU: privKey Unknown
    8 0.025056734 10.224.113.113 → 188.3.3.204  SNMP 174 report 1.3.6.1.6.3.15.1.1.2.0

FYI same result with the new 0.2.6a1.

Guens avatar Apr 18 '22 08:04 Guens

I will try to replicate your scenario this week by querying two different hosts with SNMPv3. If that doesn't work, I'll see about creating a Docker image with debug symbols that will run scripts and output the full traceback of lines that get executed.

kamakazikamikaze avatar Apr 18 '22 12:04 kamakazikamikaze

I' m seeing the same issue on our system. Is there any way to properly close the "old" session before creating a new one ? If I understood it right, easysnmp uses the interface of netsnmp, so I wonder if there anything left what may cause it. I tried to solve the issue by selecting different local_ports for the sessions, but no luck.

amamitzsch avatar May 05 '22 15:05 amamitzsch

Looking into #156, it could be that there's something being cached by the library that gets improperly used between the devices. By chance, are the EngineID values returned by each of the devices the same?

kamakazikamikaze avatar Jul 15 '22 19:07 kamakazikamikaze

Good shot Ken !

Problem occurs when two devices have the same EngineID.

Let's look at this example :

oid = [".1.3.6.1.6.3.10.2.1.1.0", ]  # SNMP-FRAMEWORK-MIB::snmpEngineID.0

s1 = easysnmp.Session(hostname="10.224.115.121", **snmp_opts)
s2 = easysnmp.Session(hostname="10.224.113.113", **snmp_opts)
s3 = easysnmp.Session(hostname="10.224.117.121", **snmp_opts)

print(f"s1 = {s1.get(oid)}")
print(f"s2 = {s2.get(oid)}")
print(f"s3 = {s3.get(oid)}")

We can see s1 and s3 have the same EngineId, but not s2 :

s1 = [<SNMPVariable value='~DQ (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]
s2 = [<SNMPVariable value='Ybb (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]
s3 = [<SNMPVariable value='~DQ (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]

But if s3 and s1 are swaped

print(f"s3 = {s3.get(oid)}")
print(f"s2 = {s2.get(oid)}")
print(f"s1 = {s1.get(oid)}")

s1 query fails :

s3 = [<SNMPVariable value='~DQ (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]
s2 = [<SNMPVariable value='Ybb (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]
Traceback (most recent call last):
  File "/home/pg218277/.config/JetBrains/PyCharmCE2022.1/scratches/bug_easysnmp.py", line 32, in <module>
    print(f"s1 = {s1.get(oid)}")
  File "/home/pg218277/git/equtils/venv/lib64/python3.6/site-packages/easysnmp/session.py", line 298, in get
    interface.get(self, varlist)
easysnmp.exceptions.EasySNMPTimeoutError: timed out while connecting to remote host

Guens avatar Jul 17 '22 10:07 Guens

Problem occurs when two devices have the same EngineID.

Let's look at this example :

oid = [".1.3.6.1.6.3.10.2.1.1.0", ]  # SNMP-FRAMEWORK-MIB::snmpEngineID.0

s1 = easysnmp.Session(hostname="10.224.115.121", **snmp_opts)
s2 = easysnmp.Session(hostname="10.224.113.113", **snmp_opts)
s3 = easysnmp.Session(hostname="10.224.117.121", **snmp_opts)

print(f"s1 = {s1.get(oid)}")
print(f"s2 = {s2.get(oid)}")
print(f"s3 = {s3.get(oid)}")

We can see s1 and s3 have the same EngineId, but not s2 :

s1 = [<SNMPVariable value='~DQ (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]
s2 = [<SNMPVariable value='Ybb (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]
s3 = [<SNMPVariable value='~DQ (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]

But if s3 and s1 are swaped

print(f"s3 = {s3.get(oid)}")
print(f"s2 = {s2.get(oid)}")
print(f"s1 = {s1.get(oid)}")

s1 query fails :

s3 = [<SNMPVariable value='~DQ (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]
s2 = [<SNMPVariable value='Ybb (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]
Traceback (most recent call last):
  File "/home/pg218277/.config/JetBrains/PyCharmCE2022.1/scratches/bug_easysnmp.py", line 32, in <module>
    print(f"s1 = {s1.get(oid)}")
  File "/home/pg218277/git/equtils/venv/lib64/python3.6/site-packages/easysnmp/session.py", line 298, in get
    interface.get(self, varlist)
easysnmp.exceptions.EasySNMPTimeoutError: timed out while connecting to remote host

Ok, I've not been able to troubleshoot this yet, but two things come to mind:

  1. Every SNMP agent must have a unique Engine ID. It is unclear if this infers globally unique or per-device unique.
  2. Your captures from .113 and .121 report an OID I've not come across yet. 10.224.115.121 → 188.3.3.204 SNMP 174 report 1.3.6.1.6.3.15.1.1.2.0. OID reference. This leads me to believe that either Net-SNMP is using something cached because the Engine IDs match, or our instructions when we pass off to Net-SNMP are causing it to use something that is cached.

Are you able to change the Engine ID of the two devices so that they're unique from each other? If you can and it clears everything up, then my next steps will include adding testcases for when two engines run with the same ID and then diving back into our calls to Net-SNMP.

kamakazikamikaze avatar Jul 23 '22 19:07 kamakazikamikaze

RFC3411 section 3.1.1.1 states that the EngineID is unique across entities within an administrative domain. Your devices may be violating this if they're on the same network. It's odd that they both have the same ID, especially if each device uses something internal to auto-generated an ID. Are these the same host?

kamakazikamikaze avatar Jul 23 '22 20:07 kamakazikamikaze

These 2 devices are 2 different hosts. They both have exactly the same software and configuration. They only differ with their network addresses, of course.

I tried to restart one of them, and it got a different Engine ID :

oid = [".1.3.6.1.6.3.10.2.1.1.0", ]  # SNMP-FRAMEWORK-MIB::snmpEngineID.0

s1 = easysnmp.Session(hostname="10.224.115.121", **snmp_opts)
s3 = easysnmp.Session(hostname="10.224.117.121", **snmp_opts)

print(f"s1 = {s1.get(oid)}")
print(f"s3 = {s3.get(oid)}")

s1 = [<SNMPVariable value='a{b (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]
s3 = [<SNMPVariable value='~DQ (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]

And now if I swap queries :

print(f"s3 = {s3.get(oid)}")
print(f"s1 = {s1.get(oid)}")

Now it works :

s3 = [<SNMPVariable value='~DQ (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]
s1 = [<SNMPVariable value='a{b (contains binary)' (oid='snmpEngineID', oid_index='0', snmp_type='OCTETSTR')>]

Before that, this test failed.

It seems the problem is really due to the same Engine Id situation.

FYI these 2 hosts run the same net-snmp 7.5 agent, without specify an Engine Id configuration (default conf I suppose). I don't know how 2 hosts may have same Engine Id. But it happens...

Guens avatar Jul 24 '22 08:07 Guens

Well I'm glad to see the mystery cleared up. However, I'm of the opinion that we should look into determining how and why EasySNMP/Net-SNMP behaves this way for this specific scenario and fix it within the library. It may take some time but I do believe that EasySNMP shouldn't care if two different devices have the same Engine ID. So for the time being I'm going to leave this issue open.

kamakazikamikaze avatar Jul 24 '22 12:07 kamakazikamikaze

I agree with you : EasySNMP shouldn't care if two different devices have the same Engine ID.

IMHO 2 "entities", with 2 different IP addresses, should be seen as 2 SNMP different devices, regardless of EngineId. But I don't know if it's RFC compliant.

Guens avatar Jul 25 '22 08:07 Guens

I agree with you : EasySNMP shouldn't care if two different devices have the same Engine ID.

IMHO 2 "entities", with 2 different IP addresses, should be seen as 2 SNMP different devices, regardless of EngineId. But I don't know if it's RFC compliant.

I agree. It would make sense to make the ID a pair of the target IP and Engine ID. However, given that some devices can have multiple IPs (routers and layer 3 switches for example), the Engine ID ensures that a device is not counted twice because it acts as the domain-specific unique identifier.

We may be at the mercy of the implementation of Net-SNMP since it handles the packet send/receive and caches the v3 information. Discovering the v3 user information being cached was a fluke since there was just one relevant result from Google that pointed me to StackOverflow. I would never have found this otherwise because the high-level Net-SNMP documentation didn't cover it. If this behavior is due to something deep within Net-SNMP and considered "the right way of handling it" by their team, you may be forced to ensure each device has a unique Engine ID to avoid the problem from recurring.

I'll try and take some time to investigate it while I also work with @mmattb to resolve long-standing issues with the codebase.

kamakazikamikaze avatar Jul 25 '22 13:07 kamakazikamikaze