ansible.netcommon

network_cli doesn't respect user-specified timeout values for all tasks

Status: Open. rudimocnik opened this issue 4 years ago (6 comments).

SUMMARY

network_cli doesn't respect ansible_command_timeout on one of my tasks.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

network_cli

ANSIBLE VERSION
ansible 2.10.6
config file = /home/rudimocnik/ansible/dvp/ansible.cfg
configured module search path = ['/home/rudimocnik/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/rudimocnik/virtualenv/py3-ansible/lib/python3.8/site-packages/ansible
executable location = /home/rudimocnik/virtualenv/py3-ansible/bin/ansible
python version = 3.8.6 (default, Sep 25 2020, 00:00:00) [GCC 10.2.1 20200723 (Red Hat 10.2.1-1)]
CONFIGURATION
ACTION_WARNINGS(/home/rudimocnik/ansible/dvp/ansible.cfg) = False
DEFAULT_FORKS(/home/rudimocnik/ansible/dvp/ansible.cfg) = 40
DEFAULT_HOST_LIST(/home/rudimocnik/ansible/dvp/ansible.cfg) = ['/home/rudimocnik/ansible/dvp/inv.yml']
DEPRECATION_WARNINGS(/home/rudimocnik/ansible/dvp/ansible.cfg) = False
HOST_KEY_CHECKING(/home/rudimocnik/ansible/dvp/ansible.cfg) = False
PERSISTENT_COMMAND_TIMEOUT(/home/rudimocnik/ansible/dvp/ansible.cfg) = 30
PERSISTENT_CONNECT_TIMEOUT(/home/rudimocnik/ansible/dvp/ansible.cfg) = 90
OS / ENVIRONMENT

Cisco cat9300 running 16.12.3a

STEPS TO REPRODUCE

Run the playbook below against a Catalyst 9300 switch; the install_upgrade role it includes runs install_ios.yml.

---
# This playbook upgrades Cisco devices

# Find 9300 stack

- name: UPGRADE c9300 & ASR-920
  hosts: all
  #strategy: free
  connection: network_cli
  gather_facts: false

  tasks:
    - name: Scan the network
      ios_facts:
        gather_subset: all

#### Upgrade IOS on c9300 stack

    # This task is skipped for non-C9300 devices and when the image is already compliant.
    - name: UPGRADE C9300 stack
      include_role:
        name: install_upgrade
      when: (ansible_net_model == "C9300-24P" or ansible_net_model == "C9300-24T") and (ansible_net_version != c9300_upgrade_ios_version)

##### my install_upgrade role #####

########### main.yml ##########
---
# tasks file for ./roles/ios_image_upgrade

- include_tasks: version-check.yml

- include_tasks: file_transfer.yml

- include_tasks: install_ios.yml

- include_tasks: save_config.yml

- include_tasks: reload.yml

- include_tasks: version-check.yml
  
- include_tasks: cleanup.yml


########### file_transfer.yml ##########
---
# Transfer file to Cisco device

- name: Copy image to target device 
  cisco.ios.ios_command:
    commands:
    - command: "copy {{ c9300_file_source }}{{ c9300_file_name }} flash:"
      prompt: "Destination filename [{{ c9300_file_name }}]?"
      answer: "\r"
  vars:
    ansible_command_timeout: 300
	

########### install_ios.yml ##########
---
# Install image file to Cisco device

- name: Install new image
  cisco.ios.ios_command:
    commands: "install add file flash:{{ c9300_file_name }} activate commit prompt-level none"
  vars:
    ansible_command_timeout: 900
EXPECTED RESULTS

I expect the 900-second timeout to be respected by the install_ios.yml task inside my role, just as the 300-second timeout is respected in file_transfer.yml.

ACTUAL RESULTS

The file transfer succeeds, while the install_ios task fails with a 30-second timeout error (attachment: output-playbook).

Also, I am not sure where the 'b' in the error message b'install add file ....' comes from.

timeout value 30 seconds reached while trying to send command

rudimocnik, Mar 09 '21 19:03

@rudimocnik, could you please share detailed Ansible logs for the same play? You can enable them by running the following in your terminal:

export ANSIBLE_LOG_PATH=ansible_logs.log
export ANSIBLE_PERSISTENT_LOG_MESSAGES=TRUE
export ANSIBLE_DEBUG=TRUE

rohitthakur2590, Mar 26 '21 11:03

@rohitthakur2590 Excuse my late reply. Here is the output you requested.

ansible_logs.log

rudimocnik, Mar 30 '21 10:03

@rohitthakur2590 I am having the same issue as @rudimocnik

ansible-playbook 2.10.7
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/deployer/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.6/site-packages/ansible
executable location = /usr/bin/ansible-playbook
python version = 3.6.7 (default, Dec  6 2018, 11:09:34) [GCC 4.4.7 20120313 (Red Hat 4.4.7-23)]

Ansible collection list

ansible-galaxy collection list
 24539 1617127198.50085: starting run
/usr/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.4) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
 24539 1617127202.65879: Validate TLS certificates for https://galaxy.ansible.com: True

# /home/deployer/.ansible/collections/ansible_collections
Collection        Version
----------------- -------
ansible.netcommon 2.0.0
ansible.posix     1.2.0
ansible.utils     2.0.1
cisco.ios         2.0.0

When I run the command directly on the device, it succeeds but takes almost nine minutes (17:34:51 to 17:43:35):

begnt-vdb4-lab-023#install add file flash:cat9k_lite_iosxe.16.12.05.SPA.bin
install_add: START Tue Mar 30 17:34:51 UTC 2021
Mar 30 17:34:54.588 %INSTALL-5-INSTALL_START_INFO: R0/0: install_engine: Started install add flash:cat9k_lite_iosxe.16.12.05.SPA.bin
install_add: Adding PACKAGE
install_add: Checking whether new add is allowed ....

--- Starting initial file syncing ---
Info: Finished copying flash:cat9k_lite_iosxe.16.12.05.SPA.bin to the selected switch(es)
Finished initial file syncing

--- Starting Add ---
Performing Add on all members
  [1] Add package(s) on switch 1
  [1] Finished Add on switch 1
Checking status of Add on [1]
Add: Passed on [1]
Finished Add

Image added. Version: 16.12.5.0.5625
SUCCESS: install_add  Tue Mar 30 17:43:35 UTC 2021
Mar 30 17:43:37.045 %INSTALL-5-INSTALL_COMPLETED_INFO: R0/0: install_engine: Completed install add PACKAGE flash:cat9k_lite_iosxe.16.12.05.SPA.bin

Ansible task:

- name: ios-xe_installing | MODE INSTALL | install | Add image
  vars:
    ansible_command_timeout: 1800
  when: not image_installed
  register: upgrade_results
  cisco.ios.ios_command:
    commands:
      - command: "install add file {{ ansible_net_filesystems if (ansible_net_filesystems|length > 1) else ansible_net_filesystems[0] }}{{ required_ios_binary }}"

But when I run it from the Ansible role, ansible_command_timeout has no effect; see the (partial) logs in ansible_logs.log.

lorephoenix, Mar 30 '21 18:03

@rohitthakur2590 I have been testing different scenarios and couldn't make the "Install new image" task work inside the role. However, I was able to use include_playbook and put just the install task in that separate playbook. Strangely, this worked, and I have no explanation for why the timeout did not trigger this time. Furthermore, if I add more tasks to the "Install new image2" play, the timeout problems reappear.

This is the Install new image2 playbook:

- name: Install new image2
  hosts: all
  connection: network_cli
  gather_facts: false

  tasks:
    - name: Install new image2
      cisco.ios.ios_command:
        commands: "install add file flash:cat9k_iosxe.17.03.03.SPA.bin activate commit prompt-level none"
      when: (ansible_net_model == "C9300-24P" or ansible_net_model == "C9300-24T") and ansible_net_version != c9300_upgrade_ios_version
      vars:
        ansible_command_timeout: 1620

rudimocnik, Apr 15 '21 07:04

@rohitthakur2590 I also did some tests to reduce the number of tasks as far as possible. I first ran the 'test1' playbook without the Gathering Facts task, and 'install add file ...' completed without any timeout issue: ansible_without_ios_facts.log

When I run the same playbook with the task 'test | Gathering Facts' added, I hit the timeout: ansible_using_ios_facts.log

Update 2021-04-29: instead of the default Paramiko connection plugin, I tried the new LibSSH connection plugin, and I no longer have this timeout issue. https://www.ansible.com/blog/new-libssh-connection-plugin-for-ansible-network

[persistent_connection]
ssh_type = libssh
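If you would rather not change ansible.cfg globally, the same transport choice can likely be made per group in the inventory. This is a minimal sketch that assumes the network_cli plugin in your netcommon release exposes its ssh_type option through the ansible_network_cli_ssh_type variable, and that the ansible-pylibssh package is installed on the controller; the file path is hypothetical.

```yaml
# group_vars/NETWORK.yml (hypothetical path; NETWORK is the group targeted by the playbook below)
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: cisco.ios.ios
# Assumed variable for network_cli's ssh_type option: use libssh instead of paramiko.
# Requires the ansible-pylibssh Python package on the Ansible controller.
ansible_network_cli_ssh_type: libssh
```

Scoping it to one group lets you compare libssh and paramiko on a few devices before switching everything over.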

ANSIBLE PLAYBOOK

---
- name: Cisco IOS-XE upgrade
  hosts: NETWORK
  gather_facts: no
  roles:
  - role: test1

ANSIBLE ROLE 'test1' - tasks/main.yml

---
# tasks file for test1
- name: test | Gathering Facts
  cisco.ios.ios_facts:
    gather_subset: hardware
  tags:
    - installing

- name: test | Define dictionary
  ansible.builtin.set_fact:
    device_findfile_info: "{{ device_findfile_info|default({}) | 
        combine( { 'flash:' : { 'filename' : 'cat9k_lite_iosxe.16.12.05b.SPA.bin' }}) }}"
  tags:
    - installing

- name: test | debug
  debug: var=device_findfile_info
  tags:
    - installing
    
- name: "test | Add image "
  vars:
    ansible_command_timeout: 1800
  register: command_results
  cisco.ios.ios_command:
    commands:
    - command: "install add file {{ item }}{{ device_findfile_info[item]['filename'] }}\n\n"
  with_items: "{{ device_findfile_info.keys() | list }}"
  tags:
    - installing

ANSIBLE VERSION
ansible 2.10.7
config file = /etc/ansible/ansible.cfg
configured module search path = ['/home/deployer/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3.6/site-packages/ansible
executable location = /usr/bin/ansible
python version = 3.6.7 (default, Dec 6 2018, 11:09:34) [GCC 4.4.7 20120313 (Red Hat 4.4.7-23)]

Collection        Version
----------------- -------
ansible.netcommon 2.0.1
ansible.posix     1.2.0
ansible.utils     2.0.1
cisco.ios         2.0.0

lorephoenix, Apr 28 '21 13:04

I found a workaround which I posted into a different issue https://github.com/ansible-collections/ansible.netcommon/issues/269#issuecomment-1102979029

You can use meta: reset_connection before and after the task you'd like to increase the timeout for.

My example tasks, which seemed to work:

- name: Workaround to bump timeout
  meta: reset_connection

- name: Find any required upgrades for modules
  register: epld_upgrade_required
  vars:
    ansible_command_timeout: 90
  ansible.netcommon.cli_command:
    command: "show install all impact epld bootflash:{{ epld_file }} | json"

- name: Workaround, back to default timeout
  meta: reset_connection

Before I used this workaround, the command would time out at 30 seconds even though I had set the timeout for this task to 90 seconds. With the workaround, the command correctly waits longer and no longer fails.
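The same workaround can be packaged as a reusable task file so that each slow command runs on its own fresh connection. This is a minimal sketch, not part of the original comment: the file name long_command.yml and the variables long_command and long_timeout are hypothetical, and it relies on the same assumption as the example above, namely that meta: reset_connection makes the next task open a new connection that picks up the per-task timeout.

```yaml
# long_command.yml (hypothetical file): run one slow CLI command with a longer timeout.
# long_command and long_timeout are hypothetical variables supplied by the caller.

- name: Reset connection so the next task opens a new one
  meta: reset_connection

- name: Run slow command with extended timeout
  ansible.netcommon.cli_command:
    command: "{{ long_command }}"
  vars:
    # Read when the new connection is established (per the workaround above)
    ansible_command_timeout: "{{ long_timeout | default(900) }}"
  register: long_command_result

- name: Reset connection again so later tasks go back to the default timeout
  meta: reset_connection
```

A caller would include it with include_tasks: long_command.yml and pass long_command and long_timeout under the include's vars; the result is then available as long_command_result.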

The problem is that network_cli.py only reads the timeout variable when it opens a new SSH connection. If the task is not the first one in the playbook, an SSH connection already exists, so the plugin does not update command_timeout and keeps using the value from when the session was first established.
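If that explanation holds, another option is to set ansible_command_timeout at play level (or in group_vars), so the value is already in place when the first task opens the connection. This is a sketch that reuses names from the original report (c9300_file_name and the two tasks); it is not a confirmed fix, and the trade-off is that every task on that connection, including quick ones, can wait up to 900 seconds before failing.

```yaml
# Sketch: set the long timeout before the persistent connection is first opened
- name: UPGRADE c9300 (play-level timeout sketch)
  hosts: all
  connection: network_cli
  gather_facts: false
  vars:
    # Present from the first task onward, so the connection is created with it
    ansible_command_timeout: 900

  tasks:
    - name: Scan the network
      cisco.ios.ios_facts:
        gather_subset: all

    - name: Install new image
      cisco.ios.ios_command:
        commands: "install add file flash:{{ c9300_file_name }} activate commit prompt-level none"
```

Combining this with the reset_connection workaround is possible if only a few tasks need the long value.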

jknight-netscout, Apr 27 '22 13:04