cisco.nxos
cisco.nxos.nxos_install_os fails with issu parameter set to 'yes'
SUMMARY
Module cisco.nxos.nxos_install_os fails with the issu parameter set to 'yes' (with check_mode set to 'no') and returns the following messages: "raw_data": [ "timeout value 600 seconds reached while trying to send command: b'install all nxos nxos.9.3.9.bin non-disruptive'"], "msg": "Failed to upgrade device using command: ['terminal dont-ask', 'install all nxos nxos.9.3.9.bin non-disruptive']".
The module works correctly with issu set to 'yes' in check mode (check_mode set to 'yes'). It also works correctly with issu set to 'no' (check_mode set to 'no'). With issu set to 'yes' and check_mode set to 'no', however, the task fails even though the command 'install all nxos nxos.9.3.9.bin non-disruptive' is in fact sent successfully to the Nexus switch: the task fails, but the switch starts upgrading via the non-disruptive method.
ISSUE TYPE
- Bug Report
COMPONENT NAME
module: nxos_install_os
ANSIBLE VERSION
ansible [core 2.13.1]
config file = /etc/ansible/ansible.cfg
python version = 3.8.13 (default, Apr 5 2022, 17:15:15) [GCC 9.1.1 20190605 (Red Hat 9.1.1-2)]
jinja version = 3.1.2
libyaml = True
COLLECTION VERSION
Collection Version
---------- -------
cisco.nxos 3.1.0
CONFIGURATION
OS / ENVIRONMENT
Nexus switch model: N9K-C93180YC-FX
OS version: 9.3(7)
Target OS version: 9.3(9)
STEPS TO REPRODUCE
- name: ISSU non-disruptive OS upgrade on N9k
  check_mode: no
  cisco.nxos.nxos_install_os:
    system_image_file: nxos.9.3.9.bin
    issu: yes
  register: show_install_output

- name: Print show install output
  debug:
    var: show_install_output
EXPECTED RESULTS
The task should not fail; it should succeed and register the following return value (for key 'install_state'):
"show_install_output.install_state": [
    "Compatibility check is done:",
    "Module bootable Impact Install-type Reason",
    "------ -------- -------------- ------------ ------",
    " 1 yes non-disruptive reset ",
    "Images will be upgraded according to following table:",
    "Module Image Running-Version(pri:alt) New-Version Upg-Required",
    "------ ---------- ---------------------------------------- -------------------- ------------",
    " 1 nxos 9.3(7) 9.3(9) yes",
    " 1 bios v05.45(07/05/2021):v05.28(01/18/2018) v05.45(07/05/2021) no",
    "--------------------------------------"
]
ACTUAL RESULTS
Task fails: fatal: [N9K-C93180YC-FX]: FAILED! =>
{
    "raw_data": [
        "timeout value 600 seconds reached while trying to send command: b'install all nxos nxos.9.3.9.bin non-disruptive'"
    ],
    "msg": "Failed to upgrade device using command: ['terminal dont-ask', 'install all nxos nxos.9.3.9.bin non-disruptive']",
    "invocation": {
        "module_args": {
            "system_image_file": "nxos.9.3.9.bin",
            "issu": "yes",
            "kickstart_image_file": null,
            "provider": null
        }
    },
    "_ansible_no_log": false,
    "changed": false
}
Hello there,
I think I am facing the same behaviour on both N9K-C93180YC-EX and FX. I have tried the following upgrades so far (on both EX and FX):
- from 7.0(3)I7(8) to 9.3(7),
- from 7.0(3)I7(8) to 9.3(9),
- from 9.3(7) to 9.3(9).
The upgrade itself is performed on the switch, but for some reason the Ansible task times out, so the rest of the playbook does not run. Through a console connection I can see the upgrade happening and the switch reloading and becoming ready again in less than 600 seconds, but the Ansible playbook does not notice this and simply waits for the 600-second timer to expire before failing.
What bothers me the most is that everything works fine on N3K-C3048TP-1GE and N3K-C3548P-10G. There, I can see the upgrade happening from the console, and the Ansible playbook resumes right after the switch is reachable over SSH again.
I can provide logs and try fixes if needed. I also have access to a lab with other N9K and N3K models if that helps troubleshoot this.
Here is a very minimal environment where the issue occurs. Note: the successful output of the same playbook running against an N3K-C3048TP-1GE is at the very end of this post.
Ansible collection list (truncated):
# /home/ansible/.ansible/collections/ansible_collections
Collection Version
----------------- -------
ansible.netcommon 3.1.0
ansible.utils 2.6.1
cisco.nxos 3.1.0
Inventory file (filename: hosts):
---
nxos:
  vars:
    ansible_connection: ansible.netcommon.network_cli
    ansible_network_os: cisco.nxos.nxos
  hosts:
    N93180EX_1:
      ansible_host: 10.1.1.1
...
Playbook (filename: upgrade.yaml):
- name: Simple upgrade
  gather_facts: false
  hosts: nxos
  tasks:
    - name: Upgrade NXOS
      cisco.nxos.nxos_install_os:
        system_image_file: nxos.9.3.9.bin
        issu: desired
      register: debugvar

    - name: Debug upgrade
      debug:
        var: debugvar
Command:
ansible-playbook upgrade.yaml -i hosts -u admin -k -vvvvv
ansible.cfg:
[persistent_connection]
connect_timeout = 600
command_timeout = 600
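The two [persistent_connection] settings above apply to every task in the run. As a sketch, the command timeout can instead be scoped to the install task alone through the `ansible_command_timeout` connection variable of `ansible.netcommon.network_cli`. This assumes the playbook above; whether the per-task value is honoured by the `cisco.nxos.nxos` action plugin in this exact setup has not been verified here.

```yaml
- name: Upgrade NXOS
  cisco.nxos.nxos_install_os:
    system_image_file: nxos.9.3.9.bin
    issu: desired
  vars:
    # Task-scoped equivalent of command_timeout under [persistent_connection];
    # 600 simply mirrors the ansible.cfg value shown above.
    ansible_command_timeout: 600
  register: debugvar
```

This only changes how long the task waits for a prompt; judging by the logs in this thread, it would not by itself prevent the timeout on the N9K.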
Output:
ansible-playbook [core 2.13.4]
config file = /home/ansible/testDir/ansible.cfg
configured module search path = ['/home/ansible/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/ansible/.local/lib/python3.9/site-packages/ansible
ansible collection location = /home/ansible/.ansible/collections:/usr/share/ansible/collections
executable location = /home/ansible/.local/bin/ansible-playbook
python version = 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110]
jinja version = 3.1.2
libyaml = True
Using /home/ansible/testDir/ansible.cfg as config file
SSH password:
setting up inventory plugins
host_list declined parsing /home/ansible/testDir/hosts as it did not pass its verify_file() method
script declined parsing /home/ansible/testDir/hosts as it did not pass its verify_file() method
auto declined parsing /home/ansible/testDir/hosts as it did not pass its verify_file() method
Parsed /home/ansible/testDir/hosts inventory source with yaml plugin
Loading collection cisco.nxos from /home/ansible/.ansible/collections/ansible_collections/cisco/nxos
redirecting (type: action) cisco.nxos.nxos_install_os to cisco.nxos.nxos
Loading callback plugin default of type stdout, v2.0 from /home/ansible/.local/lib/python3.9/site-packages/ansible/plugins/callback/default.py
Attempting to use 'default' callback.
Skipping callback 'default', as we already have a stdout callback.
Attempting to use 'junit' callback.
Attempting to use 'minimal' callback.
Skipping callback 'minimal', as we already have a stdout callback.
Attempting to use 'oneline' callback.
Skipping callback 'oneline', as we already have a stdout callback.
Attempting to use 'tree' callback.
PLAYBOOK: upgrade.yaml *********************************************************************************
Positional arguments: upgrade.yaml
verbosity: 5
remote_user: admin
connection: smart
timeout: 10
ask_pass: True
become_method: sudo
tags: ('all',)
inventory: ('/home/ansible/testDir/hosts',)
forks: 5
1 plays in upgrade.yaml
PLAY [Simple upgrade] **********************************************************************************
META: ran handlers
redirecting (type: action) cisco.nxos.nxos_install_os to cisco.nxos.nxos
Loading collection ansible.netcommon from /home/ansible/.ansible/collections/ansible_collections/ansible/netcommon
TASK [Upgrade NXOS] ************************************************************************************
task path: /home/ansible/testDir/upgrade.yaml:5
<10.1.1.1> attempting to start connection
<10.1.1.1> using connection plugin ansible.netcommon.network_cli
Found ansible-connection at path /home/ansible/.local/bin/ansible-connection
<10.1.1.1> local domain socket does not exist, starting it
<10.1.1.1> control socket path is /home/ansible/.ansible/pc/dd6bec9c00
<10.1.1.1> Loading collection ansible.netcommon from /home/ansible/.ansible/collections/ansible_collections/ansible/netcommon
<10.1.1.1> Loading collection cisco.nxos from /home/ansible/.ansible/collections/ansible_collections/cisco/nxos
<10.1.1.1> local domain socket listeners started successfully
<10.1.1.1> loaded cliconf plugin ansible_collections.cisco.nxos.plugins.cliconf.nxos from path /home/ansible/.ansible/collections/ansible_collections/cisco/nxos/plugins/cliconf/nxos.py for network_os cisco.nxos.nxos
<10.1.1.1> ssh type is set to auto
<10.1.1.1> autodetecting ssh_type
[WARNING]: ansible-pylibssh not installed, falling back to paramiko
<10.1.1.1> ssh type is now set to paramiko
<10.1.1.1>
<10.1.1.1> local domain socket path is /home/ansible/.ansible/pc/dd6bec9c00
redirecting (type: action) cisco.nxos.nxos_install_os to cisco.nxos.nxos
redirecting (type: action) cisco.nxos.nxos_install_os to cisco.nxos.nxos
<10.1.1.1> PERSISTENT_COMMAND_TIMEOUT is 600
<10.1.1.1> PERSISTENT_CONNECT_TIMEOUT is 600
<10.1.1.1> ANSIBLE_NETWORK_IMPORT_MODULES: enabled
<10.1.1.1> ANSIBLE_NETWORK_IMPORT_MODULES: found cisco.nxos.nxos_install_os at /home/ansible/.ansible/collections/ansible_collections/cisco/nxos/plugins/modules/nxos_install_os.py
<10.1.1.1> ANSIBLE_NETWORK_IMPORT_MODULES: running cisco.nxos.nxos_install_os
<10.1.1.1> ANSIBLE_NETWORK_IMPORT_MODULES: complete
<10.1.1.1> ANSIBLE_NETWORK_IMPORT_MODULES: Result: {'raw_data': ["timeout value 600 seconds reached while trying to send command: b'install all nxos nxos.9.3.9.bin non-disruptive'"], 'failed': True, 'msg': "Failed to upgrade device using command: ['terminal dont-ask', 'install all nxos nxos.9.3.9.bin non-disruptive']", 'invocation': {'module_args': {'system_image_file': 'nxos.9.3.9.bin', 'issu': 'desired', 'kickstart_image_file': None, 'provider': None}}, '_ansible_parsed': True}
fatal: [N93180EX_1]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "issu": "desired",
            "kickstart_image_file": null,
            "provider": null,
            "system_image_file": "nxos.9.3.9.bin"
        }
    },
    "msg": "Failed to upgrade device using command: ['terminal dont-ask', 'install all nxos nxos.9.3.9.bin non-disruptive']",
    "raw_data": [
        "timeout value 600 seconds reached while trying to send command: b'install all nxos nxos.9.3.9.bin non-disruptive'"
    ]
}
PLAY RECAP *********************************************************************************************
N93180EX_1 : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
Successful output of the same playbook when upgrading an N3K-C3048TP-1GE from 7.0(3)I7(8) to 9.3(9) (using compact images, in case that matters):
ansible-playbook [core 2.13.4]
config file = /home/ansible/testDir/ansible.cfg
configured module search path = ['/home/ansible/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/ansible/.local/lib/python3.9/site-packages/ansible
ansible collection location = /home/ansible/.ansible/collections:/usr/share/ansible/collections
executable location = /home/ansible/.local/bin/ansible-playbook
python version = 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110]
jinja version = 3.1.2
libyaml = True
Using /home/ansible/testDir/ansible.cfg as config file
SSH password:
setting up inventory plugins
host_list declined parsing /home/ansible/testDir/hosts as it did not pass its verify_file() method
script declined parsing /home/ansible/testDir/hosts as it did not pass its verify_file() method
auto declined parsing /home/ansible/testDir/hosts as it did not pass its verify_file() method
Parsed /home/ansible/testDir/hosts inventory source with yaml plugin
Loading collection cisco.nxos from /home/ansible/.ansible/collections/ansible_collections/cisco/nxos
redirecting (type: action) cisco.nxos.nxos_install_os to cisco.nxos.nxos
Loading callback plugin default of type stdout, v2.0 from /home/ansible/.local/lib/python3.9/site-packages/ansible/plugins/callback/default.py
Attempting to use 'default' callback.
Skipping callback 'default', as we already have a stdout callback.
Attempting to use 'junit' callback.
Attempting to use 'minimal' callback.
Skipping callback 'minimal', as we already have a stdout callback.
Attempting to use 'oneline' callback.
Skipping callback 'oneline', as we already have a stdout callback.
Attempting to use 'tree' callback.
PLAYBOOK: upgrade.yaml **********************************************************************************************************************************************************
Positional arguments: upgrade.yaml
verbosity: 5
remote_user: admin
connection: smart
timeout: 10
ask_pass: True
become_method: sudo
tags: ('all',)
inventory: ('/home/ansible/testDir/hosts',)
forks: 5
1 plays in upgrade.yaml
PLAY [Simple upgrade] ***********************************************************************************************************************************************************
META: ran handlers
redirecting (type: action) cisco.nxos.nxos_install_os to cisco.nxos.nxos
Loading collection ansible.netcommon from /home/ansible/.ansible/collections/ansible_collections/ansible/netcommon
TASK [Upgrade NXOS] *************************************************************************************************************************************************************
task path: /home/ansible/testDir/upgrade.yaml:5
<10.1.1.2> attempting to start connection
<10.1.1.2> using connection plugin ansible.netcommon.network_cli
Found ansible-connection at path /home/ansible/.local/bin/ansible-connection
<10.1.1.2> local domain socket does not exist, starting it
<10.1.1.2> control socket path is /home/ansible/.ansible/pc/c3161504c3
<10.1.1.2> Loading collection ansible.netcommon from /home/ansible/.ansible/collections/ansible_collections/ansible/netcommon
<10.1.1.2> Loading collection cisco.nxos from /home/ansible/.ansible/collections/ansible_collections/cisco/nxos
<10.1.1.2> local domain socket listeners started successfully
<10.1.1.2> loaded cliconf plugin ansible_collections.cisco.nxos.plugins.cliconf.nxos from path /home/ansible/.ansible/collections/ansible_collections/cisco/nxos/plugins/cliconf/nxos.py for network_os cisco.nxos.nxos
<10.1.1.2> ssh type is set to auto
<10.1.1.2> autodetecting ssh_type
[WARNING]: ansible-pylibssh not installed, falling back to paramiko
<10.1.1.2> ssh type is now set to paramiko
<10.1.1.2>
<10.1.1.2> local domain socket path is /home/ansible/.ansible/pc/c3161504c3
redirecting (type: action) cisco.nxos.nxos_install_os to cisco.nxos.nxos
redirecting (type: action) cisco.nxos.nxos_install_os to cisco.nxos.nxos
<10.1.1.2> PERSISTENT_COMMAND_TIMEOUT is 600
<10.1.1.2> PERSISTENT_CONNECT_TIMEOUT is 600
<10.1.1.2> ANSIBLE_NETWORK_IMPORT_MODULES: enabled
<10.1.1.2> ANSIBLE_NETWORK_IMPORT_MODULES: found cisco.nxos.nxos_install_os at /home/ansible/.ansible/collections/ansible_collections/cisco/nxos/plugins/modules/nxos_install_os.py
<10.1.1.2> ANSIBLE_NETWORK_IMPORT_MODULES: running cisco.nxos.nxos_install_os
<10.1.1.2> ANSIBLE_NETWORK_IMPORT_MODULES: complete
<10.1.1.2> ANSIBLE_NETWORK_IMPORT_MODULES: Result: {'changed': True, 'install_state': ['Compatibility check is done:', 'Module bootable Impact Install-type Reason', '------ -------- -------------- ------------ ------', ' 1 yes disruptive reset default upgrade is not hitless', 'Images will be upgraded according to following table:', 'Module Image Running-Version(pri:alt) New-Version Upg-Required', '------ ---------- ---------------------------------------- -------------------- ------------', ' 1 nxos 7.0(3)I7(8) 9.3(9) yes', ' 1 bios v5.0.0(06/06/2018) v5.0.0(06/06/2018) no', ' 1 power-seq 5.5 5.5 no', 'Module 1: Refreshing compact flash and upgrading bios/loader/bootrom.'], 'invocation': {'module_args': {'system_image_file': 'n3000-compact.9.3.9.bin', 'issu': 'desired', 'kickstart_image_file': None, 'provider': None}}, '_ansible_parsed': True}
changed: [N3048_1] => {
    "changed": true,
    "install_state": [
        "Compatibility check is done:",
        "Module bootable Impact Install-type Reason",
        "------ -------- -------------- ------------ ------",
        " 1 yes disruptive reset default upgrade is not hitless",
        "Images will be upgraded according to following table:",
        "Module Image Running-Version(pri:alt) New-Version Upg-Required",
        "------ ---------- ---------------------------------------- -------------------- ------------",
        " 1 nxos 7.0(3)I7(8) 9.3(9) yes",
        " 1 bios v5.0.0(06/06/2018) v5.0.0(06/06/2018) no",
        " 1 power-seq 5.5 5.5 no",
        "Module 1: Refreshing compact flash and upgrading bios/loader/bootrom."
    ],
    "invocation": {
        "module_args": {
            "issu": "desired",
            "kickstart_image_file": null,
            "provider": null,
            "system_image_file": "n3000-compact.9.3.9.bin"
        }
    }
}
TASK [Debug upgrade] ************************************************************************************************************************************************************
task path: /home/ansible/testDir/upgrade.yaml:11
<10.1.1.2> attempting to start connection
<10.1.1.2> using connection plugin ansible.netcommon.network_cli
Found ansible-connection at path /home/ansible/.local/bin/ansible-connection
<10.1.1.2> found existing local domain socket, using it!
<10.1.1.2> invoked shell using ssh_type: paramiko
<10.1.1.2> ssh connection done, setting terminal
<10.1.1.2> loaded terminal plugin for network_os cisco.nxos.nxos
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> firing event: on_open_shell()
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> ssh connection has completed successfully
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> Response received, triggered 'persistent_buffer_read_timeout' timer of 0.1 seconds
<10.1.1.2> updating play_context for connection
<10.1.1.2>
<10.1.1.2> local domain socket path is /home/ansible/.ansible/pc/c3161504c3
ok: [N3048_1] => {
    "debugvar": {
        "changed": true,
        "failed": false,
        "install_state": [
            "Compatibility check is done:",
            "Module bootable Impact Install-type Reason",
            "------ -------- -------------- ------------ ------",
            " 1 yes disruptive reset default upgrade is not hitless",
            "Images will be upgraded according to following table:",
            "Module Image Running-Version(pri:alt) New-Version Upg-Required",
            "------ ---------- ---------------------------------------- -------------------- ------------",
            " 1 nxos 7.0(3)I7(8) 9.3(9) yes",
            " 1 bios v5.0.0(06/06/2018) v5.0.0(06/06/2018) no",
            " 1 power-seq 5.5 5.5 no",
            "Module 1: Refreshing compact flash and upgrading bios/loader/bootrom."
        ]
    }
}
META: ran handlers
META: ran handlers
PLAY RECAP **********************************************************************************************************************************************************************
N3048_1 : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Edit: this also works fine on an N3K-C3548P-10G.
cc @mikewiebe @praveenramoorthy - Could you please help us out with this? Thank you.
@boleslawlucjanek @YooBZH - I was able to repro this issue on N9K-C93180YC-EX. Checking on this. Thanks.
We are continuing to debug this. So far, we have found that whenever the switchover happens as part of the upgrade, connectivity is momentarily lost, and that stalls the Ansible playbook run.
Any news on this one? I managed to work around this issue with a lot of "block & rescue", timers and checks, but at the cost of playbooks that can now take much longer to run when the issue is hit.
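The block/rescue workaround mentioned above can be sketched roughly as follows. This is a minimal, untested sketch that assumes the inventory and image name from the earlier reproduction; the `delay` and `timeout` values and the `'9.3(9)'` string check are illustrative assumptions, not values confirmed in this thread. The rescue path still sits through the 600-second command timeout first, so it keeps the play going rather than making it faster.

```yaml
- name: ISSU upgrade with recovery from the reported timeout (sketch)
  hosts: nxos
  gather_facts: false
  tasks:
    - name: Upgrade, tolerating the reported timeout
      block:
        - name: Upgrade NXOS
          cisco.nxos.nxos_install_os:
            system_image_file: nxos.9.3.9.bin
            issu: desired
          register: debugvar
      rescue:
        # The install command may already be running on the switch even though
        # the task timed out, so wait for SSH instead of failing the play here.
        - name: Wait until SSH answers again
          ansible.builtin.wait_for:
            host: "{{ ansible_host }}"
            port: 22
            delay: 60        # assumed grace period, tune per platform
            timeout: 1800    # assumed upper bound for the whole upgrade
          delegate_to: localhost

        # Discard the stale persistent network_cli socket so the next task reconnects.
        - name: Reset the persistent connection
          ansible.builtin.meta: reset_connection

    # Runs whether or not the rescue path was taken: check what is actually running.
    - name: Read the running version
      cisco.nxos.nxos_command:
        commands:
          - show version
      register: version_output

    - name: Fail unless the target version is running
      ansible.builtin.assert:
        that:
          - "'9.3(9)' in version_output.stdout[0]"
```

Because the rescue section clears the failure, the final assert is what still catches a genuinely failed upgrade.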