ansible.netcommon
connect and command timeout ignored
SUMMARY
Timeouts defined as vars directly under tasks are ignored. All timeouts must be defined in ansible.cfg.
ISSUE TYPE
- Bug Report
COMPONENT NAME
Tested with the latest cisco.nxos collection, using nxos_command and nxos_install_os. It might affect all collections and all modules that use netcommon.
ANSIBLE VERSION
ansible 2.10.8
config file = /Users/mrainer/Documents/dev/ansible/networking/roles/network-update/ansible.cfg
configured module search path = ['/Users/mrainer/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /Users/mrainer/venv/network/lib/python3.8/site-packages/ansible
executable location = /Users/mrainer/venv/network/bin/ansible
python version = 3.8.5 (default, Jul 21 2020, 10:48:26) [Clang 11.0.3 (clang-1103.0.32.62)]
CONFIGURATION
HOST_KEY_CHECKING(/Users/mrainer/Documents/dev/ansible/networking/roles/network-update/ansible.cfg) = False
PERSISTENT_COMMAND_TIMEOUT(/Users/mrainer/Documents/dev/ansible/networking/roles/network-update/ansible.cfg) = 600
PERSISTENT_CONNECT_TIMEOUT(/Users/mrainer/Documents/dev/ansible/networking/roles/network-update/ansible.cfg) = 600
SHOW_CUSTOM_STATS(/Users/mrainer/Documents/dev/ansible/networking/roles/network-update/ansible.cfg) = True
OS / ENVIRONMENT
Tested on Tower on RHEL 8.3 and on macOS. The problem is the same on both.
STEPS TO REPRODUCE
The command timeout and connect timeout are ignored when defined directly in a task (they stay at 30 s). They only take effect when defined in ansible.cfg under the [persistent_connection] section.
- name: check configuration compatibility to new image
  cisco.nxos.nxos_command:
    commands: "show incompatibility-all nxos bootflash:/{{ update_image_name_new }}"
    wait_for: result[0] contains No incompatibility configurations
    retries: 1
  vars:
    - ansible_command_timeout: 600
    - ansible_connect_timeout: 600
  register: _incompatibility
EXPECTED RESULTS
https://docs.ansible.com/ansible/latest/network/user_guide/network_debug_troubleshooting.html#command-timeout
ACTUAL RESULTS
Hello,
I am facing the same issue with the nxos_install_os module, and I can't set the timeouts globally.
Is there anything that can be done to work around the bug and set the timeout as a variable for a specific task?
EXPECTED RESULT
TASK [1 Install new OS] ******************************
changed: [myhost]

TASK [2 Wait For Device To Come Back Up] ***************************************************************************
ok: [myhost]

TASK [3 Prompt install_output] *************************************************************************************
ok: [myhost] => {
    "msg": [
        {
            "changed": true,
            "failed": false,
            "install_state": [
                "Some truncated details on the installation"
            ]
        }
    ]
}
ACTUAL RESULTS
TASK [1 Install new OS] ******************************
ok: [myhost]

TASK [2 Wait For Device To Come Back Up] ***************************************************************************
[ERROR]: Traceback (most recent call last):
  File "$ENV/lib64/python3.6/site-packages/paramiko/channel.py", line 699, in recv
    out = self.in_buffer.read(nbytes, self.timeout)
  File "$ENV/lib64/python3.6/site-packages/paramiko/buffered_pipe.py", line 164, in read
    raise PipeTimeout()
paramiko.buffered_pipe.PipeTimeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "$ENV/lib64/python3.6/site-packages/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 963, in send
    command, prompt, answer, newline, prompt_retry_check, check_all
  File "$ENV/lib64/python3.6/site-packages/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 919, in receive
    check_all,
  File "$ENV/lib64/python3.6/site-packages/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 727, in receive_paramiko
    data = self._ssh_shell.recv(256)
  File "$ENV/lib64/python3.6/site-packages/paramiko/channel.py", line 701, in recv
    raise socket.timeout()
socket.timeout
ok: [myhost]

TASK [3 Prompt install_output] *************************************************************************************
ok: [myhost] => {
    "msg": [
        {
            "changed": false,
            "failed": false,
            "install_state": []
        }
    ]
}
Note: task 1 stops exactly after the timeout defined in ansible.cfg or in the environment variable, not after the value set in ansible_command_timeout.
I've found that a workaround is to use meta: reset_connection before and after the task you'd like to increase the timeout for.
My example tasks, which seemed to work:
- name: Workaround to bump timeout
  meta: reset_connection

- name: Find any required upgrades for modules
  register: epld_upgrade_required
  vars:
    ansible_command_timeout: 90
  ansible.netcommon.cli_command:
    command: "show install all impact epld bootflash:{{ epld_file }} | json"

- name: Workaround, back to default timeout
  meta: reset_connection
Before using this workaround, the command would time out after 30 seconds even though I had set the timeout to 90 seconds for this task. With the workaround, the command correctly waits longer and no longer fails.
The problem is that network_cli.py only reads the timeout variable when a new SSH connection is established. If the task is not the first one in the playbook, and an SSH connection therefore already exists, the plugin does not update command_timeout and keeps using the value that was in effect when the session was first established.
https://github.com/ansible-collections/ansible.netcommon/blob/1169d48faab1ec937d945b947c45ba40de9597f9/plugins/connection/network_cli.py#L583
https://github.com/ansible-collections/ansible.netcommon/blob/1169d48faab1ec937d945b947c45ba40de9597f9/plugins/connection/network_cli.py#L587
If you kill/reset the connection before your task, then the variable is read in and used when it establishes the new connection. Then you can kill/reset after so that the rest of your tasks still use the global timeout.
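To make the behaviour described above easier to picture, here is a toy model in Python. It is not the network_cli plugin code: the class ToyPersistentConnection and its methods are made up for illustration, and the only things taken from this thread are the 30-second default and the assumption that the timeout is read only when a new session is opened.

```python
class ToyPersistentConnection:
    """Hypothetical stand-in for a persistent connection; NOT the real plugin."""

    DEFAULT_TIMEOUT = 30  # default value reported in this issue

    def __init__(self):
        self._connected = False
        self.command_timeout = None

    def run_task(self, task_vars):
        # Assumption: the timeout is read only while opening a new session.
        if not self._connected:
            self.command_timeout = task_vars.get(
                "ansible_command_timeout", self.DEFAULT_TIMEOUT
            )
            self._connected = True
        # An existing session keeps the value it was opened with.
        return self.command_timeout

    def reset(self):
        # Models "meta: reset_connection": the next task opens a new session.
        self._connected = False


conn = ToyPersistentConnection()
first = conn.run_task({})                                # opens the session
ignored = conn.run_task({"ansible_command_timeout": 90}) # session reused
conn.reset()
honored = conn.run_task({"ansible_command_timeout": 90}) # new session
conn.reset()
restored = conn.run_task({})                             # back to the default
print(first, ignored, honored, restored)
```

In this sketch the task-level value only has an effect on the task that runs right after a reset, which mirrors why the reset is needed both before and after the long-running task.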