ansible.netcommon

connect and command timeout ignored

Open maxrainer opened this issue 3 years ago • 2 comments

SUMMARY

Timeouts defined as vars directly under tasks are ignored. All timeouts must be defined in ansible.cfg.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

Tested with the latest cisco.nxos collection, using nxos_command and nxos_install_os. The bug might affect all collections and all modules that use netcommon.

ANSIBLE VERSION
ansible 2.10.8
  config file = /Users/mrainer/Documents/dev/ansible/networking/roles/network-update/ansible.cfg
  configured module search path = ['/Users/mrainer/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/mrainer/venv/network/lib/python3.8/site-packages/ansible
  executable location = /Users/mrainer/venv/network/bin/ansible
  python version = 3.8.5 (default, Jul 21 2020, 10:48:26) [Clang 11.0.3 (clang-1103.0.32.62)]
CONFIGURATION
HOST_KEY_CHECKING(/Users/mrainer/Documents/dev/ansible/networking/roles/network-update/ansible.cfg) = False
PERSISTENT_COMMAND_TIMEOUT(/Users/mrainer/Documents/dev/ansible/networking/roles/network-update/ansible.cfg) = 600
PERSISTENT_CONNECT_TIMEOUT(/Users/mrainer/Documents/dev/ansible/networking/roles/network-update/ansible.cfg) = 600
SHOW_CUSTOM_STATS(/Users/mrainer/Documents/dev/ansible/networking/roles/network-update/ansible.cfg) = True
OS / ENVIRONMENT

Tested on Tower RHEL 8.3 and macOS. The problem is the same on both.

STEPS TO REPRODUCE

The command timeout and connect timeout are ignored when defined directly in a task (the timeout stays at 30s). They only take effect when defined in ansible.cfg under the [persistent_connection] section.

  - name: check configuration compatibility to new image
    cisco.nxos.nxos_command:
      commands: "show incompatibility-all nxos bootflash:/{{ update_image_name_new }}"
      wait_for: result[0] contains No incompatibility configurations
      retries: 1
    vars:
      - ansible_command_timeout: 600
      - ansible_connect_timeout: 600
    register: _incompatibility
EXPECTED RESULTS

https://docs.ansible.com/ansible/latest/network/user_guide/network_debug_troubleshooting.html#command-timeout

ACTUAL RESULTS

maxrainer, Apr 26 '21 08:04

Hello,

I am facing the same issue with the nxos_install_os module, and I can't set the timeouts globally.

Is there anything that can be done to work around the bug and set the timeout as a variable for just the given task?

EXPECTED RESULTS

TASK [1 Install new OS] ******************************
changed: [myhost]

TASK [2 Wait For Device To Come Back Up] ***************************************************************************
ok: [myhost]

TASK [3 Prompt install_output] *************************************************************************************
ok: [myhost] => {
    "msg": [
        {
            "changed": true,
            "failed": false,
            "install_state": [
                "Some truncated details on the installation"
            ]
        }
    ]
}

ACTUAL RESULTS

TASK [1 Install new OS] ******************************
ok: [myhost]

TASK [2 Wait For Device To Come Back Up] ***************************************************************************
[ERROR]: Traceback (most recent call last):
  File "$ENV/lib64/python3.6/site-packages/paramiko/channel.py", line 699, in recv
    out = self.in_buffer.read(nbytes, self.timeout)
  File "$ENV/lib64/python3.6/site-packages/paramiko/buffered_pipe.py", line 164, in read
    raise PipeTimeout()
paramiko.buffered_pipe.PipeTimeout

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "$ENV/lib64/python3.6/site-packages/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 963, in send
    command, prompt, answer, newline, prompt_retry_check, check_all
  File "$ENV/lib64/python3.6/site-packages/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 919, in receive
    check_all,
  File "$ENV/lib64/python3.6/site-packages/ansible_collections/ansible/netcommon/plugins/connection/network_cli.py", line 727, in receive_paramiko
    data = self._ssh_shell.recv(256)
  File "$ENV/lib64/python3.6/site-packages/paramiko/channel.py", line 701, in recv
    raise socket.timeout()
socket.timeout
ok: [myhost]

TASK [3 Prompt install_output] *************************************************************************************
ok: [myhost] => {
    "msg": [
        {
            "changed": false,
            "failed": false,
            "install_state": []
        }
    ]
}

Note: task 1 stops exactly after the timeout defined in ansible.cfg or in the environment variable, not the one set in ansible_command_timeout.

tin-ot, Nov 05 '21 14:11

I've found that a workaround is to use meta: reset_connection before and after the task you'd like to increase the timeout for.

My example tasks, which seemed to work:

- name: Workaround to bump timeout
  meta: reset_connection

- name: Find any required upgrades for modules
  ansible.netcommon.cli_command:
    command: "show install all impact epld bootflash:{{ epld_file }} | json"
  # Task-level override; only honored because the connection was reset above
  vars:
    ansible_command_timeout: 90
  register: epld_upgrade_required

- name: Workaround, back to default timeout
  meta: reset_connection

Before using this workaround, the command would time out at 30 seconds even though I had set the timeout for this task to 90 seconds. With the workaround, the command correctly waits longer and no longer fails.

The problem is that network_cli.py only reads the timeout variable when a new SSH connection is created. If the task is not the first one in the playbook, and an SSH connection therefore already exists, the plugin does not update the command_timeout variable and keeps using the value that was in effect when the session was first established.

https://github.com/ansible-collections/ansible.netcommon/blob/1169d48faab1ec937d945b947c45ba40de9597f9/plugins/connection/network_cli.py#L583
https://github.com/ansible-collections/ansible.netcommon/blob/1169d48faab1ec937d945b947c45ba40de9597f9/plugins/connection/network_cli.py#L587

If you kill/reset the connection before your task, then the variable is read in and used when it establishes the new connection. Then you can kill/reset after so that the rest of your tasks still use the global timeout.
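
To make the behavior described above concrete, here is a toy Python model of it. This is only an illustrative sketch, not the actual network_cli.py code: the class `ToyPersistentConnection` and its methods are hypothetical names invented for this example, and the 30-second default is taken from the timeouts reported in this thread.

```python
class ToyPersistentConnection:
    """Hypothetical stand-in for a persistent connection (not the real plugin)."""

    def __init__(self):
        self.connected = False
        self.command_timeout = None

    def connect(self, task_vars, default_timeout=30):
        # The timeout is read only when a new session is opened;
        # an existing session keeps whatever value it started with.
        if not self.connected:
            self.command_timeout = task_vars.get(
                "ansible_command_timeout", default_timeout
            )
            self.connected = True
        return self.command_timeout

    def reset(self):
        # Rough analogue of `meta: reset_connection`: drop the session
        # so the next task has to open a new one.
        self.connected = False
        self.command_timeout = None


conn = ToyPersistentConnection()

# An earlier task opens the session without any override.
first = conn.connect({})

# A later task sets a task-level timeout, but the session is reused.
ignored = conn.connect({"ansible_command_timeout": 90})

# Resetting first forces a new session, so the override is read.
conn.reset()
honored = conn.connect({"ansible_command_timeout": 90})

# Resetting again afterwards returns later tasks to the default.
conn.reset()
restored = conn.connect({})

print(first, ignored, honored, restored)
```

In this model the second call returns the stale value because the session already exists, which mirrors the 30-second timeout seen above; only the calls that follow a reset pick up the value passed in for that task.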

jknight-netscout, Apr 19 '22 18:04