
Proxmox LXC Create Container runs into timeout

Open · opened by miccico · 8 comments

Summary

When running against a fairly loaded Proxmox cluster, I sometimes encounter the following error:

fatal: [vihwaf03]: FAILED! => {"changed": false, "msg": "Pre-creation checks of lxc VM 893 failed with exception: HTTPSConnectionPool(host='xxxxxxxxxxxxx', port=8006): Read timed out. (read timeout=5)"}

After some debugging, I noticed that proxmoxer internally uses a 5-second timeout for these operations, which can be insufficient for container creation under load. Looking at the code, the timeout value provided to the Ansible task is not handed over to the proxmoxer class. I would ask that the timeout (or a share of the timeout) provided to the Ansible task be passed along to the proxmoxer class.
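
For illustration, a minimal sketch of what this would amount to, assuming proxmoxer's HTTPS backend accepts a timeout keyword; the connect helper and its arguments are illustrative, not the module's actual code:

from proxmoxer import ProxmoxAPI

def connect(api_host, api_user, api_password, timeout=30):
    # Forward the caller-supplied timeout to the HTTPS backend instead of
    # relying on proxmoxer's 5-second default; each API call may then take
    # up to `timeout` seconds before raising a read timeout.
    return ProxmoxAPI(
        api_host,
        user=api_user,
        password=api_password,
        verify_ssl=False,  # match whatever validate_certs is set to in the task
        timeout=timeout,
    )

# Example: proxmox = connect("pve.example.com", "root@pam", "secret", timeout=30)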

Issue Type

Bug Report

Component Name

proxmox

Ansible Version

$ ansible --version
ansible [core 2.14.4]
  config file = /opt/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Community.general Version

$ ansible-galaxy collection list community.general

# /root/.ansible/collections/ansible_collections
Collection        Version
----------------- -------
community.general 7.1.0

Configuration

$ ansible-config dump --only-changed

CONFIG_FILE() = /opt/ansible/ansible.cfg
DEFAULT_HOST_LIST(/opt/ansible/ansible.cfg) = ['/opt/ansible/inventory.yml']

OS / Environment

Debian 11.7 running in LXC container

Steps to Reproduce

Have a very sad and slow Proxmox cluster and try to create a container while having bad luck:

- name: Create container
  throttle: 1
  proxmox:
    vmid: "{{ vmid }}"
    node: "{{ proxmox_node if proxmox_node is defined else omit }}"
    api_host: "{{ proxmox_api_host }}"
    api_user: "{{ proxmox_api_user }}"
    api_password: "{{ proxmox_api_password }}"
    password: "{{ rootpassword }}"
    hostname: "{{ inventory_hostname }}"
    ostemplate: "{{ lxc_template }}"
    unprivileged: "{{ lxc_unprivileged if lxc_unprivileged is defined else true }}"
    pool: AnsibleManaged
    description: Created by ansible playbook
    tags: ansible
    #mounts: '{"mp0":"local:8,mp=/mnt/test/"}'
    netif:
      net0: "{{ lookup('template', 'netif0.j2') if lxc_net0_ip is defined else omit }}"
      net1: "{{ lookup('template', 'netif1.j2') if lxc_net1_ip is defined else omit }}"
      net2: "{{ lookup('template', 'netif2.j2') if lxc_net2_ip is defined else omit }}"
    cores: "{{ lxc_cores if lxc_cores is defined else '2' }}"
    #order: "{{ lxc_startuporder if lxc_startuporder is defined else omit }}"
    cpus: "{{ lxc_cpus if lxc_cpus is defined else omit }}"
    storage: "{{ lxc_storage if lxc_storage is defined else 'local' }}"
    disk: "{{ lxc_storage if lxc_storage is defined else 'local' }}:{{ lxc_storage_size if lxc_storage_size is defined else '2' }}"
    onboot: yes
    memory: "{{ lxc_memory if lxc_memory is defined else '128' }}"
    swap: "{{ lxc_swap if lxc_swap is defined else '128' }}"
    pubkey: '{{ lookup("ansible.builtin.file", "~/.ssh/id_rsa.pub") }}'
    #state: started
    timeout: 500

Expected Results

The container is created without error.

Actual Results

fatal: [vihwaf03]: FAILED! => {"changed": false, "msg": "Pre-creation checks of lxc VM 893 failed with exception: HTTPSConnectionPool(host='xxxxxxxxxxxxx', port=8006): Read timed out. (read timeout=5)"}

Code of Conduct

  • [X] I agree to follow the Ansible Code of Conduct

miccico · Jul 03 '23 17:07

Files identified in the description:

  • plugins/modules/proxmox

If these files are incorrect, please update the component name section of the description or use the !component bot command.

click here for bot help

ansibullbot · Jul 03 '23 17:07

cc @Ajpantuso @Thulium-Drake @UnderGreen @joshainglis @karmab @tleguern click here for bot help

ansibullbot · Jul 03 '23 17:07

Hi @miccico, thanks for the bug report!

I am working on a PR here; a couple of disclaimers:

  • The code that interacts with proxmoxer is shared amongst all the proxmox modules
  • Not all of those modules have a timeout parameter; the ones that do are: proxmox_disk, proxmox_kvm, proxmox_snap, proxmox_template, and proxmox

russoz · Jul 06 '23 20:07

cc @krauthosting click here for bot help

ansibullbot · Feb 16 '24 00:02

@l00ptr Could you review this? It looks good to us; simply allowing a longer timeout to be set through the module args should not have any adverse effect. Checking the proxmoxer source code mentioned in the original issue shows that it already sets a default timeout of 5 s for the HTTP backend, and that the parameter is ignored by the other backends: https://github.com/proxmoxer/proxmoxer/blob/d7419ab03628aa7cbcf24e90f981446996b91100/proxmoxer/backends/https.py#L261

@russoz What do you mean by polling the API here? At least in @UnderGreen's example, the new timeout should only affect the duration of the individual API calls themselves (such as self.api_task_ok) while leaving the overall polling mechanism intact. Please still add a changelog fragment and merge, so we can move the proxmox modules to the new community.proxmox collection asap :hugs:
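
For clarity, a rough sketch of that distinction (the helper and its parameter names are illustrative, not the module's real code): the backend's read timeout bounds each individual API call, while a module-level timeout would bound the overall polling loop.

import time

def wait_for_task(proxmox, node, upid, timeout=500, poll_interval=1):
    # `timeout` caps how long we keep polling for task completion; each
    # .status.get() call is separately bounded by the HTTP backend's own
    # read timeout.
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = proxmox.nodes(node).tasks(upid).status.get()
        if status.get("status") == "stopped":
            return status.get("exitstatus") == "OK"
        time.sleep(poll_interval)
    return False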

krauthosting · Feb 22 '24 17:02

Hi @krauthosting, that was quite a while ago and I have not returned to this issue since. I think it is safe to ignore my comments from back then and move forward with a new solution.

russoz · Feb 26 '24 20:02

@russoz Thanks for confirming and please reopen in the community.proxmox collection :hugs: @felixfontein Can you please close this issue?

krauthosting · Feb 27 '24 09:02

@krauthosting why close + reopen instead of transferring the issue?

felixfontein · Feb 28 '24 06:02