ansible-junos-stdlib

juniper_junos_software: Unable to Validate that NSSU Works

Open lucasalvatore opened this issue 5 years ago • 2 comments

  • Bug Report
  • Feature Idea

Module Name

juniper_junos_software

Juniper.Junos role and Python libraries version

$ ansible --version
ansible 2.7.10
  config file = /opt/ansible/ansible.cfg
  configured module search path = [u'/home/luca/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /opt/ansible/ansible-venv/local/lib/python2.7/site-packages/ansible
  executable location = /opt/ansible/ansible-venv/bin/ansible
  python version = 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609]

ansible==2.7.10
ansible-netbox-inventory==1.0.9
asn1crypto==0.24.0
bcrypt==3.1.6
certifi==2019.3.9
cffi==1.12.2
chardet==3.0.4
cryptography==2.6.1
enum34==1.1.6
idna==2.8
ipaddress==1.0.22
Jinja2==2.10.1
junos-eznc==2.2.0
jxmlease==1.0.1
lxml==4.3.3
MarkupSafe==1.1.1
ncclient==0.6.4
netaddr==0.7.19
paramiko==2.4.2
pkg-resources==0.0.0
pyasn1==0.4.5
pycparser==2.19
PyNaCl==1.3.0
pyserial==3.4
PyYAML==5.1
requests==2.21.0
scp==0.13.2
selectors2==2.0.1
six==1.12.0
urllib3==1.24.1

OS / Environment

QFX5100-48S Virtual Chassis 17.4R1 (2 members)

Summary

When running NSSU, the backup RE is upgraded first, then mastership is flipped to the newly upgraded member so the other member can be upgraded. When the RE flip happens, the NETCONF session is broken (the same happens to an SSH session if you happen to be on the command line).

This means Ansible never gets the message that a reboot has been initiated, such as: "Package /opt/ansible/software/jinstall-host-qfx-5-17.4R1.16-signed.tgz successfully installed. Reboot successfully initiated."

Therefore Ansible will eventually error out when the RPC timer expires.

Steps to reproduce

Run any upgrade with nssu enabled using juniper_junos_software:

- name: Install Junos OS package QFX5K
  juniper_junos_software:
    #version: "17.4R2-S2.3"
    cleanfs: no
    local_package: "/opt/ansible/software/jinstall-host-qfx-5-17.4R1.16-signed.tgz"
    remote_package: "/var/tmp/jinstall-host-qfx-5-17.4R1.16-signed.tgz"
    nssu: yes
    checksum:
    reboot: true
    validate: false
    force_host: yes
    logfile: /opt/ansible/logs/{{ inventory_hostname }}-logs.log
    user: "{{ username }}"
    passwd: "{{ password }}"
  register: sw

- name: Check Status
  debug:
    var: sw

Expected results

Not really sure if there is a workaround here. It would be great if the module could reconnect after the RE flip to confirm that a reboot has been initiated.
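To make the "reconnect after the RE flip" idea concrete, here is a minimal sketch of a polling loop that keeps retrying a fresh session until the device reports the target version or a deadline passes. It is not how juniper_junos_software works today. `wait_for_version` and the `get_version` callable are hypothetical names invented for illustration, and the default timeout and interval are arbitrary assumptions.

```python
import time


def wait_for_version(get_version, target, timeout=1800.0, interval=30.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_version() until it returns target or timeout elapses.

    get_version is a hypothetical callable (an assumption, not part of any
    library) that opens a fresh session to the device and returns the
    running Junos version string. It is expected to raise an exception
    while the device is unreachable.

    Returns True if target was seen, False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        try:
            if get_version() == target:
                return True
        except Exception:
            # Session refused or dropped: the member is probably still
            # switching mastership or rebooting, so keep polling.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

With PyEZ, `get_version` could plausibly be a small function that opens a new `jnpr.junos.Device`, reads `facts["version"]`, and closes the session again. That wiring is an assumption and is left out here. Seeing the new version on the master also does not by itself prove that every Virtual Chassis member finished upgrading, so a per-member check would still be needed.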

Actual results

The error is: TimeoutExpiredError('ncclient timed out while waiting for an rpc reply.')\nncclient.operations.errors.TimeoutExpiredError: ncclient timed out while waiting for an rpc reply.

Once the NETCONF session dies, the module never sees the last node reboot, so it never moves on.


lucasalvatore, May 14 '19 01:05

@lucasalvatore1 I have not worked on NSSU. Let me take a look at how NSSU is handled on the device and using PyEZ, then we can discuss further what can be added/modified to make the module better.

rsmekala, May 15 '19 04:05

thank you very much @rsmekala

lucasalvatore, May 15 '19 15:05