Fact gathering returns incorrect information on a dual RE host when RE0 is in Present state

Open stacywsmith opened this issue 7 years ago • 2 comments

On a device with dual Routing Engines, one of the Routing Engines can be in the Present state. This state indicates that the Routing Engine is not reachable from the Master Routing Engine.

user@r0> show chassis routing-engine 
Routing Engine status:
  Slot 0:
    Current state                  Present
  Slot 1:
    Current state                  Master
    Election priority              Backup (default)
    DRAM                      3072 MB (4096 MB installed)
    Memory utilization          31 percent
    5 sec CPU utilization:
      User                       1 percent
      Background                 0 percent
      Kernel                     0 percent
      Interrupt                  1 percent
      Idle                      99 percent
    1 min CPU utilization:
      User                       0 percent
      Background                 0 percent
      Kernel                     0 percent
      Interrupt                  2 percent
      Idle                      97 percent
    5 min CPU utilization:
      User                       1 percent
      Background                 0 percent
      Kernel                     0 percent
      Interrupt                  2 percent
      Idle                      97 percent
    15 min CPU utilization:
      User                       1 percent
      Background                 0 percent
      Kernel                     1 percent
      Interrupt                  2 percent
      Idle                      97 percent
    Model                          RE-VMX
    Start time                     2017-05-09 11:46:04 IST
    Uptime                         21 days, 12 hours, 14 minutes, 24 seconds
    Last reboot reason             0x200:normal shutdown 
    Load averages:                 1 minute   5 minute  15 minute
                                       0.02       0.03       0.00

This situation causes the show version invoke-on all-routing-engines | display xml command to produce an error:

user@r0> show version invoke-on all-routing-engines | display xml    
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/16.2D0/junos">
    <xnm:error xmlns="http://xml.juniper.net/xnm/1.1/xnm" xmlns:xnm="http://xml.juniper.net/xnm/1.1/xnm">
        <message>
            Could not connect to re0 : No route to host
        </message>
    </xnm:error>
    <multi-routing-engine-results>
        
        <multi-routing-engine-item>
            
            <re-name>re1</re-name>
            
            <software-information>
                <host-name>re0</host-name>
                <product-model>mx960</product-model>
                <product-name>mx960</product-name>
                <junos-version>16.2-20161025_dev_common.0</junos-version>
                <junos-edition>limited</junos-edition>
...

This error causes the fact-gathering code to incorrectly conclude that this is a single-RE device.
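To illustrate the reply shape described above, here is a minimal sketch that parses a trimmed copy of that output with the Python standard library. It is not PyEZ's actual fact-gathering code: the helper names `summarize_res` and `is_dual_re` are hypothetical, and matching on the "Could not connect to reN" message text is an assumption based only on the output shown in this issue.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the <xnm:error> element, copied from the output above.
XNM_NS = "http://xml.juniper.net/xnm/1.1/xnm"

# Trimmed copy of the reply shown in this issue (closing tags added).
SAMPLE_REPLY = """\
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/16.2D0/junos">
    <xnm:error xmlns="http://xml.juniper.net/xnm/1.1/xnm" xmlns:xnm="http://xml.juniper.net/xnm/1.1/xnm">
        <message>
            Could not connect to re0 : No route to host
        </message>
    </xnm:error>
    <multi-routing-engine-results>
        <multi-routing-engine-item>
            <re-name>re1</re-name>
            <software-information>
                <host-name>re0</host-name>
                <junos-version>16.2-20161025_dev_common.0</junos-version>
            </software-information>
        </multi-routing-engine-item>
    </multi-routing-engine-results>
</rpc-reply>
"""


def summarize_res(reply_xml):
    """Return (reachable, unreachable) for a multi-RE 'show version' reply.

    reachable maps each RE name to its junos-version (or None if absent);
    unreachable lists RE names taken from 'Could not connect' errors.
    """
    root = ET.fromstring(reply_xml)

    reachable = {}
    for item in root.iter("multi-routing-engine-item"):
        name = item.findtext("re-name", default="").strip()
        version = item.findtext("software-information/junos-version")
        reachable[name] = version.strip() if version else None

    unreachable = []
    # <message> inherits the default xnm namespace from <xnm:error>.
    for msg in root.iter("{%s}message" % XNM_NS):
        # Assumption: the error text has the form seen in this issue.
        match = re.search(r"Could not connect to (re\d+)", msg.text or "")
        if match:
            unreachable.append(match.group(1))

    return reachable, unreachable


def is_dual_re(reply_xml):
    """Count unreachable REs too, so an error does not imply a single RE."""
    reachable, unreachable = summarize_res(reply_xml)
    return len(set(reachable) | set(unreachable)) > 1


if __name__ == "__main__":
    print(summarize_res(SAMPLE_REPLY))
    print(is_dual_re(SAMPLE_REPLY))
```

The point of the sketch is only that the error element and the per-RE results arrive in the same reply, so both can be inspected before deciding how many Routing Engines the device has.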

stacywsmith avatar May 30 '17 18:05 stacywsmith

Hi,

I have several MX480 routers with a similar setup (the second RE, RE1, is in the Present state) on which PyEZ-based code (version 2.2.0) completely misbehaves. This affects both PyEZ scripts and Juniper's Ansible modules.

Minimal PyEZ code:

from jnpr.junos import Device

with Device(host='10.10.10.10', port=22, user=..., password=..., timeout=20) as dev:
    print(dev.facts['2RE'])
    print(dev.facts['RE0'])
    print(dev.facts['RE1'])
    # Uncomment one or both of the print lines below to get an exception:
    # "ncclient.operations.errors.TimeoutExpiredError: ncclient timed out while waiting for an rpc reply"
    # print(dev.facts['version_RE0'])
    # print(dev.facts['version_RE1'])

Notes:

  • The code above runs fine as shown. If you uncomment one or both of the version_RE[01] print statements, the script misbehaves: it takes more than two and a half minutes to print those facts, then dies with an ncclient.operations.errors.TimeoutExpiredError exception.
  • Partial output of "show version invoke-on all-routing-engines | display xml" is also attached below.
  • The core Ansible module (junos_facts) has no problem gathering facts from a dual-RE router in this state.

Script output:

$ date; blue-python3 pyez_facts_dual_re_02.py ; date
Mon Nov 12 15:57:51 UTC 2018
True
{'mastership_state': 'master', 'status': 'OK', 'model': 'RE-S-1800x4', ...}
{'mastership_state': 'Present', 'status': None, 'model': None, ...}
16.1R5.7
None
Traceback (most recent call last):
  File "pyez_facts_dual_re_02.py", line 26, in <module>
    main()
  File "pyez_facts_dual_re_02.py", line 22, in main
    print(dev.facts['version_RE1'])
  File "/opt/blue-python/3.6/lib/python3.6/site-packages/jnpr/junos/device.py", line 1349, in __exit__
    self.close()
  File "/opt/blue-python/3.6/lib/python3.6/site-packages/jnpr/junos/device.py", line 1328, in close
    self._conn.close_session()
  File "/opt/blue-python/3.6/lib/python3.6/site-packages/ncclient/manager.py", line 170, in wrapper
    return self.execute(op_cls, *args, **kwds)
  File "/opt/blue-python/3.6/lib/python3.6/site-packages/ncclient/manager.py", line 240, in execute
    raise_mode=self._raise_mode).request(*args, **kwds)
  File "/opt/blue-python/3.6/lib/python3.6/site-packages/ncclient/operations/session.py", line 28, in request
    return self._request(new_ele("close-session"))
  File "/opt/blue-python/3.6/lib/python3.6/site-packages/ncclient/operations/rpc.py", line 342, in _request
    raise TimeoutExpiredError('ncclient timed out while waiting for an rpc reply.')
ncclient.operations.errors.TimeoutExpiredError: ncclient timed out while waiting for an rpc reply.
Mon Nov 12 16:00:24 UTC 2018

Output of "show version invoke-on all-routing-engines | display xml"

    <multi-routing-engine-item>

        <re-name>re0</re-name>

        <software-information>
            <host-name>ROUTER_NAME</host-name>
            <product-model>mx480</product-model>
            <product-name>mx480</product-name>
            <junos-version>16.1R5.7</junos-version>
            ...
        </software-information>
    </multi-routing-engine-item>

    <xnm:error xmlns="http://xml.juniper.net/xnm/1.1/xnm" xmlns:xnm="http://xml.juniper.net/xnm/1.1/xnm">
        <message>
            Could not connect to re1 : No route to host
        </message>
    </xnm:error>
</multi-routing-engine-results>
<cli>
    <banner></banner>
</cli>
----------

Any chance of having this fixed?

jpoliv avatar Nov 12 '18 16:11 jpoliv

Hi,

With the latest PyEZ release (junos-eznc 2.6.5), the script appears to work fine. Could you please check and update?

from jnpr.junos import Device
from pprint import pprint

dev = Device(host='xx.xx.xx.xx', user='xyz', password='xyz')
dev.open()
print(dev.facts['2RE'])
print(dev.facts['RE0'])
print(dev.facts['RE1'])
print(dev.facts['virtual'])

python test.py
True
{'mastership_state': 'master', 'status': 'OK', 'model': 'RE-VMX', 'last_reboot_reason': 'Router rebooted after a normal shutdown.', 'up_time': '1 hour, 42 minutes, 52 seconds'}
{'mastership_state': 'Present', 'status': None, 'model': None, 'last_reboot_reason': None, 'up_time': None}
True

Thanks,
Chidanand

chidanandpujar avatar Jul 29 '22 12:07 chidanandpujar