aws_ec2 inventory running against the same host multiple times
Summary
We recently bumped the amazon.aws collection from 4.0.0 to 4.1.0 and started experiencing failures caused by process locks and similar conflicts. We found that the inventory was being built incorrectly: instead of processing each host once, the plugin processes each host once per matching hostname (public and private DNS names), in addition to the usual tag we use to name hosts.
Issue Type
Bug Report
Component Name
aws_ec2
Ansible Version
$ ansible --version
ansible [core 2.12.7]
config file = /etc/ansible/ansible-amer.cfg
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
ansible collection location = /usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
jinja version = 3.1.2
libyaml = False
Collection Versions
+ ansible-galaxy collection list
# /usr/share/ansible/collections/ansible_collections
Collection Version
--------------------- -------
amazon.aws 4.1.0
ansible.netcommon 3.1.0
ansible.posix 1.4.0
ansible.utils 2.6.1
ansible.windows 1.10.0
chocolatey.chocolatey 1.3.0
cisco.ios 3.3.0
cisco.meraki 2.10.1
commscope.icx 1.0.5
community.aws 4.0.0
community.crypto 2.4.0
community.general 5.4.0
community.network 4.0.1
community.windows 1.10.0
fortinet.fortios 2.1.6
# /usr/local/lib/python3.10/dist-packages/ansible_collections
Collection Version
----------------------------- -------
amazon.aws 2.3.0
ansible.netcommon 2.6.1
ansible.posix 1.4.0
ansible.utils 2.6.1
ansible.windows 1.10.0
arista.eos 3.1.0
awx.awx 19.4.0
azure.azcollection 1.13.0
check_point.mgmt 2.3.0
chocolatey.chocolatey 1.2.0
cisco.aci 2.2.0
cisco.asa 2.1.0
cisco.dnac 6.4.0
cisco.intersight 1.0.19
cisco.ios 2.8.1
cisco.iosxr 2.9.0
cisco.ise 1.2.1
cisco.meraki 2.6.2
cisco.mso 1.4.0
cisco.nso 1.0.3
cisco.nxos 2.9.1
cisco.ucs 1.8.0
cloud.common 2.1.1
cloudscale_ch.cloud 2.2.2
community.aws 2.5.0
community.azure 1.1.0
community.ciscosmb 1.0.5
community.crypto 2.3.2
community.digitalocean 1.19.0
community.dns 2.2.0
community.docker 2.6.0
community.fortios 1.0.0
community.general 4.8.2
community.google 1.0.0
community.grafana 1.4.0
community.hashi_vault 2.5.0
community.hrobot 1.4.0
community.kubernetes 2.0.1
community.kubevirt 1.0.0
community.libvirt 1.1.0
community.mongodb 1.4.0
community.mysql 2.3.8
community.network 3.3.0
community.okd 2.2.0
community.postgresql 1.7.4
community.proxysql 1.4.0
community.rabbitmq 1.2.1
community.routeros 2.1.0
community.sap 1.0.0
community.sap_libs 1.1.0
community.skydive 1.0.0
community.sops 1.2.2
community.vmware 1.18.0
community.windows 1.10.0
community.zabbix 1.7.0
containers.podman 1.9.3
cyberark.conjur 1.1.0
cyberark.pas 1.0.14
dellemc.enterprise_sonic 1.1.1
dellemc.openmanage 4.4.0
dellemc.os10 1.1.1
dellemc.os6 1.0.7
dellemc.os9 1.0.4
f5networks.f5_modules 1.17.0
fortinet.fortimanager 2.1.5
fortinet.fortios 2.1.6
frr.frr 1.0.4
gluster.gluster 1.0.2
google.cloud 1.0.2
hetzner.hcloud 1.6.0
hpe.nimble 1.1.4
ibm.qradar 1.0.3
infinidat.infinibox 1.3.3
infoblox.nios_modules 1.2.2
inspur.sm 1.3.0
junipernetworks.junos 2.10.0
kubernetes.core 2.3.1
mellanox.onyx 1.0.0
netapp.aws 21.7.0
netapp.azure 21.10.0
netapp.cloudmanager 21.17.0
netapp.elementsw 21.7.0
netapp.ontap 21.19.1
netapp.storagegrid 21.10.0
netapp.um_info 21.8.0
netapp_eseries.santricity 1.3.0
netbox.netbox 3.7.1
ngine_io.cloudstack 2.2.4
ngine_io.exoscale 1.0.0
ngine_io.vultr 1.1.1
openstack.cloud 1.8.0
openvswitch.openvswitch 2.1.0
ovirt.ovirt 1.6.6
purestorage.flasharray 1.13.0
purestorage.flashblade 1.9.0
sensu.sensu_go 1.13.1
servicenow.servicenow 1.0.6
splunk.es 1.0.2
t_systems_mms.icinga_director 1.29.0
theforeman.foreman 2.2.0
vmware.vmware_rest 2.1.5
vyos.vyos 2.8.0
wti.remote 1.0.3
AWS SDK versions
I originally thought this was related to boto*, as those versions also changed in our deployment at the same time as the collection. I pinned them to the versions below before rolling back the amazon.aws collection.
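The pin itself was plain pip, matching the versions shown below (invocation reconstructed from those versions, not copied from our logs):
+ pip install boto==2.49.0 boto3==1.24.42 botocore==1.27.42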
+ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires:
Required-by:
---
Name: boto3
Version: 1.24.42
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: botocore, jmespath, s3transfer
Required-by:
---
Name: botocore
Version: 1.27.42
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: awscli, boto3, s3transfer
Once the collection was rolled back and the issue was resolved, I unpinned boto* to validate. These are the versions after unpinning:
+ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires:
Required-by:
---
Name: boto3
Version: 1.24.44
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: botocore, jmespath, s3transfer
Required-by:
---
Name: botocore
Version: 1.27.44
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: awscli, boto3, s3transfer
TL;DR: 4.1.0 remains broken with both sets of boto versions shown above.
Configuration
+ ansible-config dump --only-changed
ANSIBLE_NOCOWS(/etc/ansible/ansible-amer.cfg) = True
ANSIBLE_PIPELINING(/etc/ansible/ansible-amer.cfg) = True
CACHE_PLUGIN(/etc/ansible/ansible-amer.cfg) = redis
CACHE_PLUGIN_CONNECTION(/etc/ansible/ansible-amer.cfg) = production-a-us-west-2-ansible-redis.shared.xxxx.cloud:6379:0
CACHE_PLUGIN_TIMEOUT(/etc/ansible/ansible-amer.cfg) = 86400
CALLBACKS_ENABLED(/etc/ansible/ansible-amer.cfg) = ['datadog_callback']
COLLECTIONS_PATHS(/etc/ansible/ansible-amer.cfg) = ['/usr/share/ansible/collections']
DEFAULT_FORKS(/etc/ansible/ansible-amer.cfg) = 50
DEFAULT_GATHER_TIMEOUT(/etc/ansible/ansible-amer.cfg) = 20
DEFAULT_POLL_INTERVAL(/etc/ansible/ansible-amer.cfg) = 5
DEFAULT_REMOTE_USER(/etc/ansible/ansible-amer.cfg) = ansible
DEFAULT_ROLES_PATH(/etc/ansible/ansible-amer.cfg) = ['/etc/ansible/roles']
DEFAULT_STRATEGY_PLUGIN_PATH(/etc/ansible/ansible-amer.cfg) = ['/usr/local/lib/python3.10/dist-packages/ansible_mitogen/plugins/strategy']
DEFAULT_TIMEOUT(/etc/ansible/ansible-amer.cfg) = 20
DISPLAY_SKIPPED_HOSTS(/etc/ansible/ansible-amer.cfg) = False
HOST_KEY_CHECKING(/etc/ansible/ansible-amer.cfg) = False
PERSISTENT_COMMAND_TIMEOUT(/etc/ansible/ansible-amer.cfg) = 30
PERSISTENT_CONNECT_TIMEOUT(/etc/ansible/ansible-amer.cfg) = 40
RETRY_FILES_ENABLED(/etc/ansible/ansible-amer.cfg) = False
OS / Environment
+ uname -a
Linux 565837f08633 5.11.0-1028-aws #31~20.04.1-Ubuntu SMP Fri Jan 14 14:37:50 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
+ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Steps to Reproduce
We can reproduce this issue by requesting version 4.1.0 of the collection in requirements.yml:
collections:
- name: amazon.aws
version: 4.1.0
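Installing from that file is enough to reproduce, for example:
+ ansible-galaxy collection install -r requirements.yml --force
(--force is required when a different version of the collection is already installed.)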
Expected Results
How it looked in 4.0.0:
TASK [Gathering Facts] *********************************************************
ok: [production-a-activedirectory-aws2-us-west-2-10.2x2.1xx.x4]
ok: [production-a-activedirectory-aws1-us-west-2-10.xx.xx.xx]
Actual Results
How it looks in 4.1.0:
TASK [Gathering Facts] *********************************************************
ok: [production-a-activedirectory-aws2-us-west-2-10.2x2.1xx.x4]
ok: [ec2-35-x65-x0-xx3.us-west-2.compute.amazonaws.com]
ok: [ip-10-2x2-1xx-x4.us-west-2.compute.internal]
ok: [ec2-34-x16-x5-9x.us-west-2.compute.amazonaws.com]
ok: [ip-10-2x2-1xx-x45.us-west-2.compute.internal]
ok: [production-a-activedirectory-aws1-us-west-2-10.xx.xx.xx]
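Our working theory, as an illustrative sketch only (this is not the actual plugin code): hostname selection seems to have gone from first-match-wins to keep-all-matches.
instance = {
    "tag:ExtendedName": "production-a-activedirectory-aws2-us-west-2-10.2x2.1xx.x4",
    "dns-name": "ec2-35-x65-x0-xx3.us-west-2.compute.amazonaws.com",
    "private-dns-name": "ip-10-2x2-1xx-x4.us-west-2.compute.internal",
}
preferences = ["tag:ExtendedName", "dns-name", "private-dns-name"]

# 4.0.0 behaviour: the first preference that resolves names the host
old_hosts = [next(instance[p] for p in preferences if p in instance)]

# what 4.1.0 appears to do: every preference that resolves adds an entry
new_hosts = [instance[p] for p in preferences if instance.get(p)]

print(len(old_hosts))  # 1 -> one inventory entry per instance
print(len(new_hosts))  # 3 -> the same instance appears three times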
This results in process locks, because each task now runs on the same host three times. Tasks that require an exclusive lock fail like the example below:
FAILED! => {"changed": true, "cmd": "Export-NpsConfiguration -Path C:\\NPS\\NPSConfig_2022-08-02.xml", "delta": "0:00:03.209783", "end": "2022-08-02 18:32:04.252882", "msg": "non-zero return code", "rc": 1, "start": "2022-08-02 18:32:01.043099", "stderr": "Export-NpsConfiguration : The process cannot access the file because it is being used by another process. (Exception \r\nfrom HRESULT: 0x80070020)\r\nAt line:1 char:65\r\n+ ... ing $false; Export-NpsConfiguration -Path C:\\NPS\\NPSConfig_2022-08-02 ...\r\n+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n + CategoryInfo : NotSpecified: (Microsoft.Nps.C...gurationCommand:ExportNpsConfigurationCommand) [Export- \r\n NpsConfiguration], FileLoadException\r\n + FullyQualifiedErrorId : Export-NpsConfiguration.FileLoadException,Microsoft.Nps.Commands.ExportNpsConfigurationC \r\n ommand", "stderr_lines": ["Export-NpsConfiguration : The process cannot access the file because it is being used by another process. (Exception ", "from HRESULT: 0x80070020)", "At line:1 char:65", "+ ... ing $false; Export-NpsConfiguration -Path C:\\NPS\\NPSConfig_2022-08-02 ...", "+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~", " + CategoryInfo : NotSpecified: (Microsoft.Nps.C...gurationCommand:ExportNpsConfigurationCommand) [Export- ", " NpsConfiguration], FileLoadException", " + FullyQualifiedErrorId : Export-NpsConfiguration.FileLoadException,Microsoft.Nps.Commands.ExportNpsConfigurationC ", " ommand"], "stdout": "", "stdout_lines": []}
Code of Conduct
- [X] I agree to follow the Ansible Code of Conduct
Possibly relevant, here is our aws_ec2.yml:
---
plugin: amazon.aws.aws_ec2
regions:
- us-west-2
- us-east-1
- us-east-2
filters:
instance-state-name: running
hostnames:
- tag:ExtendedName
- dns-name
- private-dns-name
strict: False
keyed_groups:
- key: tags.region
prefix: tag_region
- key: tags.Business
prefix: tag_Business
- key: tags.Environment
prefix: tag_Environment
- key: tags.ExtendedName
prefix: tag_extendedname
- key: tags.Service
prefix: tag_Service
- key: tags.Component
prefix: tag_Component
- key: tags.ADDomain
prefix: tag_ADDomain
- key: tags.DNS
prefix: tag_DNS
compose:
ansible_host: private_ip_address
...
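For anyone wanting to reproduce without running a full play, dumping the inventory directly should show the duplicate host entries (assuming AWS credentials are available in the environment):
+ ansible-inventory -i aws_ec2.yml --graph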
We are also seeing this exact issue.
This PR should resolve the problem: https://github.com/ansible-collections/amazon.aws/pull/1026
It introduces a new configuration key called allow_duplicated_hosts, which defaults to False. False gives the old behaviour; True, the new one.
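With the patch applied, keeping the old single-entry behaviour explicitly would look like this in aws_ec2.yml (the exact placement is my assumption until the option is documented):
plugin: amazon.aws.aws_ec2
allow_duplicated_hosts: False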
Can you please give it a try?
The PR is still marked as WIP (work in progress) because I need to refresh the functional tests.