amazon.aws

aws_ec2 inventory running against the same host multiple times

Open yamjoepobuda opened this issue 3 years ago • 2 comments

Summary

We recently upgraded the amazon.aws collection from 4.0.0 to 4.1.0 and started seeing errors caused by process locks, along with similar failures. We traced this to the inventory being built incorrectly: instead of adding each host once, the aws_ec2 plugin now adds it once for each of its DNS names (public and private), in addition to the tag we normally use to name hosts.
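One way to confirm the duplication is to group the output of `ansible-inventory -i aws_ec2.yml --list` by EC2 instance ID. The sketch below assumes the usual `_meta.hostvars` layout of that JSON and that every aws_ec2 host carries an `instance_id` hostvar; `find_duplicate_hosts` and the script's file-argument handling are hypothetical helpers, not part of the collection.

```python
import json
import sys
from collections import defaultdict


def find_duplicate_hosts(inventory: dict) -> dict[str, list[str]]:
    """Return {instance_id: [hostnames]} for instances listed under 2+ names."""
    names_by_instance: dict[str, list[str]] = defaultdict(list)
    hostvars = inventory.get("_meta", {}).get("hostvars", {})
    for hostname, hvars in hostvars.items():
        # aws_ec2 hosts carry instance_id; hosts from other sources may not.
        instance_id = hvars.get("instance_id")
        if instance_id:
            names_by_instance[instance_id].append(hostname)
    # Keep only instances that appear under more than one inventory name.
    return {
        instance_id: sorted(names)
        for instance_id, names in names_by_instance.items()
        if len(names) > 1
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Expects a file written by: ansible-inventory -i aws_ec2.yml --list
    with open(sys.argv[1]) as fh:
        duplicates = find_duplicate_hosts(json.load(fh))
    for instance_id, names in sorted(duplicates.items()):
        print(f"{instance_id}: {', '.join(names)}")
```

For example, run `ansible-inventory -i aws_ec2.yml --list > inventory.json` and then pass `inventory.json` to the script. Each output line lists one instance ID with every inventory name that resolves to it; with one name per instance (the 4.0.0 behaviour) we would expect no output.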

Issue Type

Bug Report

Component Name

aws_ec2

Ansible Version

$ ansible --version
ansible [core 2.12.7]
  config file = /etc/ansible/ansible-amer.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.10/dist-packages/ansible
  ansible collection location = /usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
  jinja version = 3.1.2
  libyaml = False

Collection Versions

+ ansible-galaxy collection list

# /usr/share/ansible/collections/ansible_collections
Collection            Version
--------------------- -------
amazon.aws            4.1.0  
ansible.netcommon     3.1.0  
ansible.posix         1.4.0  
ansible.utils         2.6.1  
ansible.windows       1.10.0 
chocolatey.chocolatey 1.3.0  
cisco.ios             3.3.0  
cisco.meraki          2.10.1 
commscope.icx         1.0.5  
community.aws         4.0.0  
community.crypto      2.4.0  
community.general     5.4.0  
community.network     4.0.1  
community.windows     1.10.0 
fortinet.fortios      2.1.6  

# /usr/local/lib/python3.10/dist-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    2.3.0  
ansible.netcommon             2.6.1  
ansible.posix                 1.4.0  
ansible.utils                 2.6.1  
ansible.windows               1.10.0 
arista.eos                    3.1.0  
awx.awx                       19.4.0 
azure.azcollection            1.13.0 
check_point.mgmt              2.3.0  
chocolatey.chocolatey         1.2.0  
cisco.aci                     2.2.0  
cisco.asa                     2.1.0  
cisco.dnac                    6.4.0  
cisco.intersight              1.0.19 
cisco.ios                     2.8.1  
cisco.iosxr                   2.9.0  
cisco.ise                     1.2.1  
cisco.meraki                  2.6.2  
cisco.mso                     1.4.0  
cisco.nso                     1.0.3  
cisco.nxos                    2.9.1  
cisco.ucs                     1.8.0  
cloud.common                  2.1.1  
cloudscale_ch.cloud           2.2.2  
community.aws                 2.5.0  
community.azure               1.1.0  
community.ciscosmb            1.0.5  
community.crypto              2.3.2  
community.digitalocean        1.19.0 
community.dns                 2.2.0  
community.docker              2.6.0  
community.fortios             1.0.0  
community.general             4.8.2  
community.google              1.0.0  
community.grafana             1.4.0  
community.hashi_vault         2.5.0  
community.hrobot              1.4.0  
community.kubernetes          2.0.1  
community.kubevirt            1.0.0  
community.libvirt             1.1.0  
community.mongodb             1.4.0  
community.mysql               2.3.8  
community.network             3.3.0  
community.okd                 2.2.0  
community.postgresql          1.7.4  
community.proxysql            1.4.0  
community.rabbitmq            1.2.1  
community.routeros            2.1.0  
community.sap                 1.0.0  
community.sap_libs            1.1.0  
community.skydive             1.0.0  
community.sops                1.2.2  
community.vmware              1.18.0 
community.windows             1.10.0 
community.zabbix              1.7.0  
containers.podman             1.9.3  
cyberark.conjur               1.1.0  
cyberark.pas                  1.0.14 
dellemc.enterprise_sonic      1.1.1  
dellemc.openmanage            4.4.0  
dellemc.os10                  1.1.1  
dellemc.os6                   1.0.7  
dellemc.os9                   1.0.4  
f5networks.f5_modules         1.17.0 
fortinet.fortimanager         2.1.5  
fortinet.fortios              2.1.6  
frr.frr                       1.0.4  
gluster.gluster               1.0.2  
google.cloud                  1.0.2  
hetzner.hcloud                1.6.0  
hpe.nimble                    1.1.4  
ibm.qradar                    1.0.3  
infinidat.infinibox           1.3.3  
infoblox.nios_modules         1.2.2  
inspur.sm                     1.3.0  
junipernetworks.junos         2.10.0 
kubernetes.core               2.3.1  
mellanox.onyx                 1.0.0  
netapp.aws                    21.7.0 
netapp.azure                  21.10.0
netapp.cloudmanager           21.17.0
netapp.elementsw              21.7.0 
netapp.ontap                  21.19.1
netapp.storagegrid            21.10.0
netapp.um_info                21.8.0 
netapp_eseries.santricity     1.3.0  
netbox.netbox                 3.7.1  
ngine_io.cloudstack           2.2.4  
ngine_io.exoscale             1.0.0  
ngine_io.vultr                1.1.1  
openstack.cloud               1.8.0  
openvswitch.openvswitch       2.1.0  
ovirt.ovirt                   1.6.6  
purestorage.flasharray        1.13.0 
purestorage.flashblade        1.9.0  
sensu.sensu_go                1.13.1 
servicenow.servicenow         1.0.6  
splunk.es                     1.0.2  
t_systems_mms.icinga_director 1.29.0 
theforeman.foreman            2.2.0  
vmware.vmware_rest            2.1.5  
vyos.vyos                     2.8.0  
wti.remote                    1.0.3  

AWS SDK versions

I originally suspected boto*, since those versions changed in our deployment at the same time as the collection. I pinned the versions below before rolling back the amazon.aws collection.

+ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires: 
Required-by: 
---
Name: boto3
Version: 1.24.42
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: botocore, jmespath, s3transfer
Required-by: 
---
Name: botocore
Version: 1.27.42
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: awscli, boto3, s3transfer

Once the collection was rolled back and the issue was resolved, I unpinned boto* to validate. This is after unpinning:

+ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: [email protected]
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires: 
Required-by: 
---
Name: boto3
Version: 1.24.44
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: botocore, jmespath, s3transfer
Required-by: 
---
Name: botocore
Version: 1.27.44
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: awscli, boto3, s3transfer

TL;DR: 4.1.0 remains broken with both sets of boto versions above.

Configuration

+ ansible-config dump --only-changed
ANSIBLE_NOCOWS(/etc/ansible/ansible-amer.cfg) = True
ANSIBLE_PIPELINING(/etc/ansible/ansible-amer.cfg) = True
CACHE_PLUGIN(/etc/ansible/ansible-amer.cfg) = redis
CACHE_PLUGIN_CONNECTION(/etc/ansible/ansible-amer.cfg) = production-a-us-west-2-ansible-redis.shared.xxxx.cloud:6379:0
CACHE_PLUGIN_TIMEOUT(/etc/ansible/ansible-amer.cfg) = 86400
CALLBACKS_ENABLED(/etc/ansible/ansible-amer.cfg) = ['datadog_callback']
COLLECTIONS_PATHS(/etc/ansible/ansible-amer.cfg) = ['/usr/share/ansible/collections']
DEFAULT_FORKS(/etc/ansible/ansible-amer.cfg) = 50
DEFAULT_GATHER_TIMEOUT(/etc/ansible/ansible-amer.cfg) = 20
DEFAULT_POLL_INTERVAL(/etc/ansible/ansible-amer.cfg) = 5
DEFAULT_REMOTE_USER(/etc/ansible/ansible-amer.cfg) = ansible
DEFAULT_ROLES_PATH(/etc/ansible/ansible-amer.cfg) = ['/etc/ansible/roles']
DEFAULT_STRATEGY_PLUGIN_PATH(/etc/ansible/ansible-amer.cfg) = ['/usr/local/lib/python3.10/dist-packages/ansible_mitogen/plugins/strategy']
DEFAULT_TIMEOUT(/etc/ansible/ansible-amer.cfg) = 20
DISPLAY_SKIPPED_HOSTS(/etc/ansible/ansible-amer.cfg) = False
HOST_KEY_CHECKING(/etc/ansible/ansible-amer.cfg) = False
PERSISTENT_COMMAND_TIMEOUT(/etc/ansible/ansible-amer.cfg) = 30
PERSISTENT_CONNECT_TIMEOUT(/etc/ansible/ansible-amer.cfg) = 40
RETRY_FILES_ENABLED(/etc/ansible/ansible-amer.cfg) = False

OS / Environment

+ uname -a
Linux 565837f08633 5.11.0-1028-aws #31~20.04.1-Ubuntu SMP Fri Jan 14 14:37:50 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
+ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Steps to Reproduce

We can reproduce this issue by pinning version 4.1.0 of the collection in requirements.yml:

collections:
  - name: amazon.aws
    version: 4.1.0

Expected Results

How it looked in 4.0.0:

TASK [Gathering Facts] *********************************************************
ok: [production-a-activedirectory-aws2-us-west-2-10.2x2.1xx.x4]
ok: [production-a-activedirectory-aws1-us-west-2-10.xx.xx.xx]

Actual Results

How it looks in 4.1.0:

TASK [Gathering Facts] *********************************************************
ok: [production-a-activedirectory-aws2-us-west-2-10.2x2.1xx.x4]
ok: [ec2-35-x65-x0-xx3.us-west-2.compute.amazonaws.com]
ok: [ip-10-2x2-1xx-x4.us-west-2.compute.internal]
ok: [ec2-34-x16-x5-9x.us-west-2.compute.amazonaws.com]
ok: [ip-10-2x2-1xx-x45.us-west-2.compute.internal]
ok: [production-a-activedirectory-aws1-us-west-2-10.xx.xx.xx]

Because each task now runs against the same host three times (once per inventory name), the runs contend for the same resources and hit process locks. Tasks that need exclusive access to a resource fail like the example below:

 FAILED! => {"changed": true, "cmd": "Export-NpsConfiguration -Path C:\\NPS\\NPSConfig_2022-08-02.xml", "delta": "0:00:03.209783", "end": "2022-08-02 18:32:04.252882", "msg": "non-zero return code", "rc": 1, "start": "2022-08-02 18:32:01.043099", "stderr": "Export-NpsConfiguration : The process cannot access the file because it is being used by another process. (Exception \r\nfrom HRESULT: 0x80070020)\r\nAt line:1 char:65\r\n+ ... ing $false; Export-NpsConfiguration -Path C:\\NPS\\NPSConfig_2022-08-02 ...\r\n+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : NotSpecified: (Microsoft.Nps.C...gurationCommand:ExportNpsConfigurationCommand) [Export- \r\n   NpsConfiguration], FileLoadException\r\n    + FullyQualifiedErrorId : Export-NpsConfiguration.FileLoadException,Microsoft.Nps.Commands.ExportNpsConfigurationC \r\n   ommand", "stderr_lines": ["Export-NpsConfiguration : The process cannot access the file because it is being used by another process. (Exception ", "from HRESULT: 0x80070020)", "At line:1 char:65", "+ ... ing $false; Export-NpsConfiguration -Path C:\\NPS\\NPSConfig_2022-08-02 ...", "+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~", "    + CategoryInfo          : NotSpecified: (Microsoft.Nps.C...gurationCommand:ExportNpsConfigurationCommand) [Export- ", "   NpsConfiguration], FileLoadException", "    + FullyQualifiedErrorId : Export-NpsConfiguration.FileLoadException,Microsoft.Nps.Commands.ExportNpsConfigurationC ", "   ommand"], "stdout": "", "stdout_lines": []}

Code of Conduct

  • [X] I agree to follow the Ansible Code of Conduct

yamjoepobuda, Aug 03 '22 15:08

Possibly relevant, here is our aws_ec2.yml:

---
plugin: amazon.aws.aws_ec2
regions:
  - us-west-2
  - us-east-1
  - us-east-2
filters:
  instance-state-name: running
hostnames:
  - tag:ExtendedName
  - dns-name
  - private-dns-name
strict: False
keyed_groups:
  - key: tags.region
    prefix: tag_region
  - key: tags.Business
    prefix: tag_Business
  - key: tags.Environment
    prefix: tag_Environment
  - key: tags.ExtendedName
    prefix: tag_extendedname
  - key: tags.Service
    prefix: tag_Service
  - key: tags.Component
    prefix: tag_Component
  - key: tags.ADDomain
    prefix: tag_ADDomain
  - key: tags.DNS
    prefix: tag_DNS
compose:
  ansible_host: private_ip_address
...

yamjoepobuda, Aug 03 '22 15:08

We are also seeing this exact issue.

gregharvey, Aug 10 '22 15:08

This PR should resolve the problem: https://github.com/ansible-collections/amazon.aws/pull/1026. It introduces a new configuration key, allow_duplicated_hosts, that defaults to False. False restores the old behaviour; True enables the new one. Can you please give it a try?

The PR is still marked as WIP (work in progress) because I need to refresh the functional tests.
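A simplified sketch of what the flag controls, assuming (consistent with the 4.0.0 and 4.1.0 results above) that `False` keeps only the first matching `hostnames` entry and `True` registers the instance under every matching entry. This is not the plugin's code: `lookup`, `resolve_hostnames`, and the flat `instance` dict (a `tags` mapping plus `dns-name` and `private-dns-name` keys standing in for the EC2 API response) are hypothetical, and the DNS names are placeholders.

```python
from typing import Optional


def lookup(instance: dict, preference: str) -> Optional[str]:
    """Resolve one `hostnames` entry against a simplified instance record."""
    if preference.startswith("tag:"):
        return instance.get("tags", {}).get(preference[len("tag:"):])
    return instance.get(preference)


def resolve_hostnames(instance: dict, hostnames: list[str],
                      allow_duplicated_hosts: bool = False) -> list[str]:
    """Return the inventory name(s) one instance would be registered under."""
    names: list[str] = []
    for preference in hostnames:
        value = lookup(instance, preference)
        if not value:  # missing tag or empty DNS name: try the next entry
            continue
        if not allow_duplicated_hosts:
            return [value]  # first match only: one inventory entry per instance
        if value not in names:
            names.append(value)  # every match: one entry per distinct name
    return names


# Hypothetical stand-in for one instance matched by the aws_ec2.yml above.
instance = {
    "tags": {"ExtendedName": "production-a-activedirectory-aws1"},
    "dns-name": "ec2-203-0-113-10.us-west-2.compute.amazonaws.com",
    "private-dns-name": "ip-10-0-0-10.us-west-2.compute.internal",
}
prefs = ["tag:ExtendedName", "dns-name", "private-dns-name"]
print(resolve_hostnames(instance, prefs))        # one name
print(resolve_hostnames(instance, prefs, True))  # three names
```

With the `hostnames` list from the aws_ec2.yml shared earlier in this thread, the `True` case yields the ExtendedName tag plus both DNS names, matching the three inventory entries per instance seen in the 4.1.0 results.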

goneri, Sep 16 '22 19:09