community.general
`proxmox_kvm` - race condition when creating multiple VMs at the same time
Summary
When using community.general.proxmox_kvm to create multiple VMs at once, only one VM reliably succeeds; the others may randomly fail because the same vmid is used more than once:
creation of qemu VM ansible-vm-test4 with vmid 108 failed with exception=500 Internal Server Error: unable to create VM 108 - VM 108 already exists on node '[REDACTED]'
Issue Type
Bug Report
Component Name
proxmox_kvm
Ansible Version
$ ansible --version
ansible [core 2.15.11]
config file = None
configured module search path = ['/tmp/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
ansible collection location = /tmp/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.9.18 (main, Jan 4 2024, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (/usr/bin/python3)
jinja version = 3.1.3
libyaml = True
Community.general Version
$ ansible-galaxy collection list community.general
# /usr/share/ansible/collections/ansible_collections
Collection Version
----------------- -------
community.general 8.6.0
Configuration
$ ansible-config dump --only-changed
OS / Environment
Red Hat UBI 9.3 container as execution environment, Proxmox 8.1.4 as target
Steps to Reproduce
When trying to create multiple Proxmox KVM VMs at the same time, only the creation of one VM reliably succeeds. In most cases all subsequent VMs fail; sometimes a second or third one can be created.
hosts.yml
systems:
  vars:
    systems_proxmox_host: some-proxmox-host:8006
    systems_proxmox_node: some-cluster-node
    systems_proxmox_username: some-local-user@pam
    systems_proxmox_password: Hunter2
  hosts:
    ansible-vm-test1:
    ansible-vm-test2:
    ansible-vm-test3:
    ansible-vm-test4:
playbook.yml
---
- name: Manage Workstation VMs
  hosts: systems
  # will be done explicitly once the VM exists
  gather_facts: false
  tasks:
    - name: Get information about existing VM
      community.general.proxmox_vm_info:
        api_host: "{{ systems_proxmox_host }}"
        # FIXME: use a proper/trusted cert
        validate_certs: false
        api_user: "{{ systems_proxmox_username }}"
        api_password: "{{ systems_proxmox_password }}"
        name: "{{ inventory_hostname_short }}"
      register: vm_info
      delegate_to: localhost
    - name: Show VM information
      ansible.builtin.debug:
        var: vm_info.proxmox_vms
      delegate_to: localhost
    - name: Stop processing hosts, where there is a non-unique match of VMs (same name exists multiple times)
      ansible.builtin.assert:
        that:
          - vm_info.proxmox_vms | length < 2
        fail_msg: Not processing VM '{{ inventory_hostname_short }}', since one or more duplicates were found and there's no way to distinguish them
        success_msg: Continuing to process VM '{{ inventory_hostname_short }}', no duplicates were found that could cause uniqueness issues
      delegate_to: localhost
    - name: Create a VM
      community.general.proxmox_kvm:
        api_host: "{{ systems_proxmox_host }}"
        # FIXME: use a proper/trusted cert
        validate_certs: false
        api_user: "{{ systems_proxmox_username }}"
        api_password: "{{ systems_proxmox_password }}"
        name: "{{ inventory_hostname_short }}"
        machine: q35
        memory: 10000
        ostype: l26
        # TODO: make this dynamic
        node: "{{ systems_proxmox_node }}"
      delegate_to: localhost
Execution: ansible-playbook -vv -i hosts.yml playbook.yml
Expected Results
I expected community.general.proxmox_kvm to be able to create multiple VMs at once without failing.
Actual Results
This is caused by:
- community.general.proxmox_kvm having to provide the vmid to Proxmox VE, because PVE can't determine it on its own during VM creation; it's a required parameter
- community.general.proxmox_kvm determining the next available vmid at the same time as the other instances of community.general.proxmox_kvm processing this task
TASK [Create a VM] ****************************************************************************************************************************************
task path: /runner/playbook_systems.yml:41
fatal: [ansible-vm-test4 -> localhost]: FAILED! => {"changed": false, "msg": "creation of qemu VM ansible-vm-test4 with vmid 108 failed with exception=500 Internal Server Error: unable to create VM 108 - VM 108 already exists on node '[REDACTED]'", "vmid": "108"}
fatal: [ansible-vm-test1 -> localhost]: FAILED! => {"changed": false, "msg": "creation of qemu VM ansible-vm-test1 with vmid 108 failed with exception=500 Internal Server Error: unable to create VM 108 - VM 108 already exists on node '[REDACTED]'", "vmid": "108"}
changed: [ansible-vm-test2 -> localhost] => {"changed": true, "devices": {}, "mac": {}, "msg": "VM ansible-vm-test2 with vmid 108 deployed", "vmid": 108}
changed: [ansible-vm-test3 -> localhost] => {"changed": true, "devices": {}, "mac": {}, "msg": "VM ansible-vm-test3 with vmid 109 deployed", "vmid": 109}
Since vmid is a required parameter of the PVE API, it needs to be determined by the API client (in contrast to, e.g., the vSphere VMOMI API, where the vmoid is generated by the server upon VM creation). This introduces a race condition in our scenario: multiple processes creating VMs at the same time compete within the window between "determine vmid by querying the PVE API for the next available one" and "create VM using the vmid", and only the first one to create the VM succeeds.
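The window described above can be reproduced without any concurrency at all. The following is a minimal sketch, assuming a hypothetical in-memory `FakePVE` class that stands in for the PVE API (it is not the real client and not the module's code): two workers both ask for the next free vmid before either of them creates a VM, so the second create collides.

```python
class FakePVE:
    """Hypothetical in-memory stand-in for the PVE API (illustration only)."""

    def __init__(self, first_free=108):
        self.vms = {}
        self.first_free = first_free

    def next_vmid(self):
        # Returns the lowest unused vmid but does NOT reserve it.
        vmid = self.first_free
        while vmid in self.vms:
            vmid += 1
        return vmid

    def create(self, vmid, name):
        # The server rejects a vmid that is already taken.
        if vmid in self.vms:
            raise RuntimeError(
                f"unable to create VM {vmid} - VM {vmid} already exists"
            )
        self.vms[vmid] = name


pve = FakePVE()

# Step 1: both workers query the next free vmid before either creates a VM.
vmid_a = pve.next_vmid()
vmid_b = pve.next_vmid()

# Step 2: the first create wins, the second one hits the collision.
pve.create(vmid_a, "ansible-vm-test2")
try:
    pve.create(vmid_b, "ansible-vm-test4")
    collided = False
    error = ""
except RuntimeError as exc:
    collided = True
    error = str(exc)

print(vmid_a, vmid_b, collided, error)
```

Because the query does not reserve the vmid, any interleaving in which two queries happen before the first create produces the same failure as in the task output above.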
The only reasonable fix is to catch the API response and retry on failure with another vmid until the VM creation succeeds.
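A rough sketch of what such a retry loop could look like, under stated assumptions: `FakePVE`, `VMExistsError`, `create_with_retry`, and the `competing_creates` knob are all hypothetical names made up for this illustration, and the in-memory class only mimics the "already exists" behaviour seen in the error message. A real implementation in the module would additionally have to tell this specific failure apart from other 500 errors and only apply when the user did not pass an explicit vmid.

```python
class VMExistsError(Exception):
    """Hypothetical error for 'VM <vmid> already exists'."""


class FakePVE:
    """Hypothetical in-memory stand-in for the PVE API (illustration only)."""

    def __init__(self, first_free=108, competing_creates=0):
        self.vms = {}
        self.first_free = first_free
        # Number of times another worker wins the race for our vmid.
        self.competing_creates = competing_creates

    def next_vmid(self):
        vmid = self.first_free
        while vmid in self.vms:
            vmid += 1
        return vmid

    def create(self, vmid, name):
        if self.competing_creates > 0:
            # Simulate a competitor creating a VM with this vmid first.
            self.competing_creates -= 1
            self.vms.setdefault(vmid, "competitor")
        if vmid in self.vms:
            raise VMExistsError(f"VM {vmid} already exists")
        self.vms[vmid] = name


def create_with_retry(pve, name, max_attempts=5):
    """Re-query the next free vmid and retry while the create collides."""
    for _ in range(max_attempts):
        vmid = pve.next_vmid()
        try:
            pve.create(vmid, name)
        except VMExistsError:
            continue  # someone else took this vmid; ask for a new one
        return vmid
    raise RuntimeError(f"could not create {name} after {max_attempts} attempts")


pve = FakePVE(competing_creates=2)
assigned = create_with_retry(pve, "ansible-vm-test4")
print(assigned)
```

Bounding the number of attempts keeps a persistent failure from looping forever; the limit of 5 is an arbitrary choice for the sketch.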
For now, users can work around this by setting throttle: 1 on the corresponding task, which prevents multiple processes from running it at the same time:
- name: Create a VM
  community.general.proxmox_kvm:
    api_host: "{{ systems_proxmox_host }}"
    # FIXME: use a proper/trusted cert
    validate_certs: false
    api_user: "{{ systems_proxmox_username }}"
    api_password: "{{ systems_proxmox_password }}"
    name: "{{ inventory_hostname_short }}"
    machine: q35
    memory: 10000
    ostype: l26
    # TODO: make this dynamic
    node: "{{ systems_proxmox_node }}"
  delegate_to: localhost
  throttle: 1
Code of Conduct
- [X] I agree to follow the Ansible Code of Conduct
Files identified in the description:
If these files are incorrect, please update the component name section of the description or use the !component bot command.
cc @Ajpantuso @Thulium-Drake @UnderGreen @helldorado @joshainglis @karmab @krauthosting
(We discussed this in #devel:ansible.com on Matrix before @eliasp created this issue; if someone is interested in the discussion, the logs for that room are public IIRC.)