community.aws

ansible_connection: aws_ssm fails when KMS encryption is enabled for SSM transport general prefs.

Open bedge opened this issue 2 years ago • 7 comments

Summary

With KMS encryption disabled in the AWS Systems Manager (Session Manager) preferences, the setting:

    ansible_connection: aws_ssm

works.

With KMS encryption enabled, the connection fails during the session handshake.

Issue Type

Bug Report

Component Name

ec2_ssm

Ansible Version

ansible [core 2.11.3]
  config file = /Users/edgeb1/git/xxx/operations.edgeb1/ansible/playbooks-test/ansible.cfg
  configured module search path = ['/Users/edgeb1/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages/ansible
  ansible collection location = /Users/edgeb1/.ansible/collections:/usr/share/ansible/collections
  executable location = /Users/edgeb1/.pyenv/versions/3.9.0/bin/ansible
  python version = 3.9.0 (default, Dec  9 2020, 10:07:40) [Clang 12.0.0 (clang-1200.0.32.27)]
  jinja version = 3.0.1
  libyaml = True

Collection Versions

➜ ansible-galaxy collection list

# /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages/ansible_collections
Collection                    Version
----------------------------- -------
amazon.aws                    1.5.0
ansible.netcommon             2.2.0
ansible.posix                 1.2.0
ansible.utils                 2.3.0
ansible.windows               1.7.0
arista.eos                    2.2.0
awx.awx                       19.2.2
azure.azcollection            1.7.0
check_point.mgmt              2.0.0
chocolatey.chocolatey         1.1.0
cisco.aci                     2.0.0
cisco.asa                     2.0.2
cisco.intersight              1.0.15
cisco.ios                     2.3.0
cisco.iosxr                   2.3.0
cisco.meraki                  2.4.2
cisco.mso                     1.2.0
cisco.nso                     1.0.3
cisco.nxos                    2.4.0
cisco.ucs                     1.6.0
cloudscale_ch.cloud           2.2.0
community.aws                 1.5.0
community.azure               1.0.0
community.crypto              1.7.1
community.digitalocean        1.8.0
community.docker              1.8.0
community.fortios             1.0.0
community.general             3.4.0
community.google              1.0.0
community.grafana             1.2.1
community.hashi_vault         1.3.2
community.hrobot              1.1.1
community.kubernetes          1.2.1
community.kubevirt            1.0.0
community.libvirt             1.0.1
community.mongodb             1.2.1
community.mysql               2.1.0
community.network             3.0.0
community.okd                 1.1.2
community.postgresql          1.4.0
community.proxysql            1.0.0
community.rabbitmq            1.0.3
community.routeros            1.2.0
community.skydive             1.0.0
community.sops                1.1.0
community.vmware              1.12.0
community.windows             1.5.0
community.zabbix              1.4.0
containers.podman             1.6.1
cyberark.conjur               1.1.0
cyberark.pas                  1.0.7
dellemc.enterprise_sonic      1.1.0
dellemc.openmanage            3.5.0
dellemc.os10                  1.1.1
dellemc.os6                   1.0.7
dellemc.os9                   1.0.4
f5networks.f5_modules         1.10.1
fortinet.fortimanager         2.1.3
fortinet.fortios              2.1.2
frr.frr                       1.0.3
gluster.gluster               1.0.1
google.cloud                  1.0.2
hetzner.hcloud                1.4.4
hpe.nimble                    1.1.3
ibm.qradar                    1.0.3
infinidat.infinibox           1.2.4
inspur.sm                     1.2.0
junipernetworks.junos         2.3.0
kubernetes.core               1.2.1
mellanox.onyx                 1.0.0
netapp.aws                    21.6.0
netapp.azure                  21.8.1
netapp.cloudmanager           21.8.0
netapp.elementsw              21.6.1
netapp.ontap                  21.8.1
netapp.um_info                21.7.0
netapp_eseries.santricity     1.2.13
netbox.netbox                 3.1.1
ngine_io.cloudstack           2.1.0
ngine_io.exoscale             1.0.0
ngine_io.vultr                1.1.0
openstack.cloud               1.5.0
openvswitch.openvswitch       2.0.0
ovirt.ovirt                   1.5.3
purestorage.flasharray        1.9.0
purestorage.flashblade        1.6.0
sensu.sensu_go                1.11.1
servicenow.servicenow         1.0.6
splunk.es                     1.0.2
t_systems_mms.icinga_director 1.20.0
theforeman.foreman            2.1.2
vyos.vyos                     2.4.0
wti.remote                    1.0.1

# /Users/edgeb1/.ansible/collections/ansible_collections
Collection    Version
------------- -------
amazon.aws    1.4.1
community.aws 1.4.0

AWS SDK versions

➜ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: [email protected]
License: MIT
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires:
Required-by:
---
Name: boto3
Version: 1.18.14
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires: jmespath, s3transfer, botocore
Required-by: navify-aws-sso-login, aws-ssm-copy
---
Name: botocore
Version: 1.21.14
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires: jmespath, urllib3, python-dateutil
Required-by: s3transfer, boto3

Configuration

 ➜  ansible-config dump --only-changed

HOST_KEY_CHECKING(/Users/edgeb1/git/xxx/operations.edgeb1/ansible/playbooks-test/ansible.cfg) = False
INVENTORY_ENABLED(/Users/edgeb1/git/xxx/operations.edgeb1/ansible/playbooks-test/ansible.cfg) = ['aws_ec2']

OS / Environment

macOS Catalina 10.15.7 (19H1323)

Steps to Reproduce

---
- name: Test command
  gather_facts: false
  hosts: all
  vars:
#    ansible_connection: ssh
    ansible_connection: aws_ssm
    ansible_aws_ssm_region: eu-central-1
    ansible_aws_ssm_bucket_name: nghc-sbox2-s3
    ansible_python_interpreter: /opt/venv/root/bin/python


  tasks:
    - name: test
      command:
        cmd: hostname

Expected Results

[I] ➜ ansible-playbook -i inventory_aws_ec2.yml --limit nghc-sbox2-bastion test.yml -v
Using /Users/edgeb1/git/xxx/operations.edgeb1/ansible/playbooks-test/ansible.cfg as config file

PLAY [Test command] ************************************************************

TASK [test] ********************************************************************
changed: [nghc-sbox2-bastion] => {"changed": true, "cmd": ["hostname"], "delta": "0:00:00.002350", "end": "2021-08-11 16:29:45.231283", "rc": 0, "start": "2021-08-11 16:29:45.228933", "stderr": "", "stderr_lines": [], "stdout": "nghc-sbox2-bastion", "stdout_lines": ["nghc-sbox2-bastion"]}

PLAY RECAP *********************************************************************
nghc-sbox2-bastion : ok=1 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

Actual Results

<i-0c208bc6d31fa6bf1> EXEC stdout line:
<i-0c208bc6d31fa6bf1> EXEC stdout line: Starting session with SessionId: [email protected]
<i-0c208bc6d31fa6bf1> EXEC remaining: 60
<i-0c208bc6d31fa6bf1> EXEC remaining: 59
<i-0c208bc6d31fa6bf1> EXEC stdout line:
<i-0c208bc6d31fa6bf1> EXEC stdout line:
<i-0c208bc6d31fa6bf1> EXEC stdout line: SessionId: [email protected] :
<i-0c208bc6d31fa6bf1> EXEC stdout line: ----------ERROR-------
<i-0c208bc6d31fa6bf1> EXEC stdout line: Encountered error while initiating handshake. Fetching data key failed: Unable to retrieve data key, Error when decrypting data key AccessDeniedException: The ciphertext refers to a customer master key that does not exist, does not exist in this region, or you are not allowed to access.
<i-0c208bc6d31fa6bf1> EXEC stdout line:         status code: 400, request id: 58bbffdd-0094-48aa-93cd-be23a3b831ee
<i-0c208bc6d31fa6bf1> EXEC stdout line:
<i-0c208bc6d31fa6bf1> EXEC stdout line:
<i-0c208bc6d31fa6bf1> ssm_retry: attempt: 0, caught exception(local variable 'returncode' referenced before assignment) from cmd (echo ~...), pausing for 0 seconds
<i-0c208bc6d31fa6bf1> CLOSING SSM CONNECTION TO: i-0c208bc6d31fa6bf1
<i-0c208bc6d31fa6bf1> TERMINATE SSM SESSION: [email protected]
<i-0c208bc6d31fa6bf1> ESTABLISH SSM CONNECTION TO: i-0c208bc6d31fa6bf1
<i-0c208bc6d31fa6bf1> SSM COMMAND: ['/usr/local/bin/session-manager-plugin', '{"SessionId": "[email protected]", "TokenValue": "AAEAARDh8M+i84KEitQgO7pZJfHRh
DXqcZRSggoX0JKknSdkAAAAAGET/1E7DBcbgdPSh4ResepBVh32nlZADVLLlyxsu/LuIjrrZ+5b+eYquv8dU3treK4QQfREd6gPaeU0hPSfRDsVTnz3CakOcLBOcku4oQ4glZE+pRIlhggAB+ozaJSp9rBlGSvDlGkRxeVuulP3HHseObp
BKMecV6GvPmtbqH9FLcXYALS0rqLPrEVpzHBWH9Tds2fzF1buQSTdTBQKRTchxSvEq/BKm0qdGU743Gpox5nXJ6eBVoZ67fH4hesI9LVG67av7oFZJrqpngKBctTeZKgcfi2X4XZDgKhMo9iHTlygf6mvgETDAUe09yVc/+Ww3R077bt/t
JNlKiBxfRbsY9w9rb9vycziX03SzLHFZDZUBAgWw66+jHp+0epTagTn44g=", "StreamUrl": "wss://ssmmessages.eu-central-1.amazonaws.com/v1/data-channel/[email protected]?ro
le=publish_subscribe", "ResponseMetadata": {"RequestId": "dd282e11-3b94-4ba6-81d3-ea1d5169fb95", "HTTPStatusCode": 200, "HTTPHeaders": {"server": "Server", "date": "Wed, 11 Aug 2
021 16:48:17 GMT", "content-type": "application/x-amz-json-1.1", "content-length": "651", "connection": "keep-alive", "x-amzn-requestid": "dd282e11-3b94-4ba6-81d3-ea1d5169fb95"},
 "RetryAttempts": 0}}', 'eu-central-1', 'StartSession', '', '{"Target": "i-0c208bc6d31fa6bf1"}', 'https://ssm.eu-central-1.amazonaws.com']
<i-0c208bc6d31fa6bf1> SSM CONNECTION ID: [email protected]
<i-0c208bc6d31fa6bf1> EXEC echo ~
<i-0c208bc6d31fa6bf1> _wrap_command: 'echo lHlPljXCIRJbmvvsKCJOQqdtWT
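
Editor's note: the `local variable 'returncode' referenced before assignment` message in the `ssm_retry` line above is the text of a Python `UnboundLocalError`. It suggests that the real KMS handshake error is masked by a secondary bug: the session output ended before the code that sets `returncode` ran. The sketch below is a simplified, hypothetical illustration of that pattern and of a more defensive variant. The function names and the `MARK` end-marker format are assumptions made for this example, not the plugin's actual code.

```python
def read_until_marker(lines, marker):
    """Collect output until `marker` appears; return (returncode, output).

    Hypothetical illustration only: `returncode` is bound solely when the
    marker line is seen, so an early end of the stream leaves it unbound.
    """
    output = []
    for line in lines:
        if line.startswith(marker):
            # The exit status follows the marker, e.g. "MARK 0".
            returncode = int(line.split()[1])
            break
        output.append(line)
    # Raises UnboundLocalError if the marker never appeared.
    return returncode, output


def read_until_marker_safe(lines, marker):
    """Same loop, but report a missing marker together with the output seen."""
    output = []
    returncode = None
    for line in lines:
        if line.startswith(marker):
            returncode = int(line.split()[1])
            break
        output.append(line)
    if returncode is None:
        # Surface what the session actually printed (e.g. the KMS error).
        raise RuntimeError("session ended before end marker: " + " | ".join(output))
    return returncode, output
```

With the defensive variant, a session that prints only `----------ERROR-------` and the KMS message would fail with that text in the exception rather than with an unrelated variable error.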

SSM agent log (/var/log/amazon/ssm/amazon-ssm-agent.log):

2021-08-10 21:39:15 INFO [ssm-agent-worker] [MessageGatewayService] Got job [email protected], starting worker
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] ssm-session-worker - v3.1.90.0
2021-08-10 21:39:15 INFO [ssm-agent-worker] [MessageGatewayService] [EngineProcessor] [BasicExecuter] [[email protected]] channel: [email protected]
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] document: [email protected] worker started
2021-08-10 21:39:15 INFO [ssm-agent-worker] [MessageGatewayService] [EngineProcessor] [BasicExecuter] [[email protected]] master listener started on path: /var
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] channel: [email protected] found
2021-08-10 21:39:15 INFO [ssm-agent-worker] [MessageGatewayService] [EngineProcessor] [BasicExecuter] [[email protected]] inter process communication started a
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] inter process communication started at /var/lib/amazon/ssm/i-0c208bc6d31fa6bf1/channels/bruce
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] worker listener started on path: /var/lib/amazon/ssm/i-0c208bc6d31fa6bf1/channels/bruce.edge@
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] [DataBackend] received plugin config message
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] [DataBackend] {"DocumentInformation":{"DocumentID":"[email protected]","Co
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] [DataBackend] Running plugin Standard_Stream Standard_Stream
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] [DataBackend] [pluginName=Standard_Stream] Setting up datachannel for session: bruce.edge@xxx
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] [DataBackend] [pluginName=Standard_Stream] Opening websocket connection to: wss://ssmmessages
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] [DataBackend] [pluginName=Standard_Stream] Successfully opened websocket connection to: wss:/
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] [DataBackend] [pluginName=Standard_Stream] Starting websocket pinger
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] [DataBackend] [pluginName=Standard_Stream] Starting websocket listener
2021-08-10 21:39:15 INFO [ssm-session-worker] [[email protected]] [DataBackend] [pluginName=Standard_Stream] Initiating Handshake
2021-08-10 21:39:17 ERROR [ssm-session-worker] [[email protected]] [DataBackend] [pluginName=Standard_Stream] Fetching data key failed: Unable to retrieve data
    status code: 400, request id: 7814ad26-119b-4123-b077-65bb7f24cdfa
2021-08-10 21:39:17 ERROR [ssm-session-worker] [[email protected]] [DataBackend] [pluginName=Standard_Stream] Encountered error while initiating handshake. Fet
    status code: 400, request id: 7814ad26-119b-4123-b077-65bb7f24cdfa

Both the Ansible runner user and the role of the instance being connected to have full access to the KMS key:

# aws kms describe-key --key-id d71201a3-5c82-466d-aa8e-e7f9eef3696e
{
    "KeyMetadata": {
        "AWSAccountId": "xxxxxx",
        "KeyId": "d71201a3-5c82-466d-aa8e-e7f9eef3696e",
        "Arn": "arn:aws:kms:eu-central-1:580867092569:key/d71201a3-5c82......",
        "CreationDate": "2021-08-11T16:45:35.805000+00:00",
        "Enabled": true,
        "Description": "Manually created key for SSM encryption",
        "KeyUsage": "ENCRYPT_DECRYPT",
        "KeyState": "Enabled",
        "Origin": "AWS_KMS",
        "KeyManager": "CUSTOMER",
        "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
        "EncryptionAlgorithms": [
            "SYMMETRIC_DEFAULT"
        ]
    }
}
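
Editor's note: `describe-key` shows that the key exists and is enabled, but not which principals may use it. As far as the AWS Session Manager documentation describes it, a KMS-encrypted session needs `kms:GenerateDataKey` for the identity that starts the session and `kms:Decrypt` for the instance profile role, and the key policy must also permit both. The sketch below builds the two identity policy documents under that assumption. `KEY_ARN` is a hypothetical placeholder, and the helper names are made up for this example.

```python
import json

# Hypothetical placeholder; substitute the ARN of the key selected in the
# Session Manager preferences.
KEY_ARN = "arn:aws:kms:eu-central-1:111122223333:key/EXAMPLE-KEY-ID"


def kms_statement(actions, key_arn):
    """Return one IAM policy statement allowing `actions` on `key_arn`."""
    return {"Effect": "Allow", "Action": list(actions), "Resource": key_arn}


def policy(*statements):
    """Wrap statements in an IAM policy document."""
    return {"Version": "2012-10-17", "Statement": list(statements)}


# Identity that starts the session (here: the Ansible runner user or role).
runner_policy = policy(kms_statement(["kms:GenerateDataKey"], KEY_ARN))

# Instance profile role of the target; the SSM agent decrypts the data key.
instance_policy = policy(kms_statement(["kms:Decrypt"], KEY_ARN))

if __name__ == "__main__":
    print(json.dumps(runner_policy, indent=2))
    print(json.dumps(instance_policy, indent=2))
```

The `Error when decrypting data key AccessDeniedException` line in the agent log comes from the instance side, so the instance role's `kms:Decrypt` permission and the key policy are the first things to compare against this sketch.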

Code of Conduct

  • [X] I agree to follow the Ansible Code of Conduct

Aug 11 '21 17:08 bedge

Files identified in the description: None

If these files are inaccurate, please update the component name section of the description or use the !component bot command.


Aug 11 '21 17:08 ansibullbot

This is the config that works:

[screenshot: ss-2021-32-11_09 32 23]

This does not:

[screenshot: Screen Shot 2021-08-11 at 10 11 05]

Aug 11 '21 17:08 bedge

It still fails after updating to the latest boto components:

[I] ➜ pip show boto boto3 botocore
Name: boto
Version: 2.49.0
Summary: Amazon Web Services Library
Home-page: https://github.com/boto/boto/
Author: Mitch Garnaat
Author-email: [email protected]
License: MIT
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires:
Required-by:
---
Name: boto3
Version: 1.18.18
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires: botocore, s3transfer, jmespath
Required-by: navify-aws-sso-login, aws-ssm-copy
---
Name: botocore
Version: 1.21.18
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email:
License: Apache License 2.0
Location: /Users/edgeb1/.pyenv/versions/3.9.0/lib/python3.9/site-packages
Requires: urllib3, python-dateutil, jmespath
Required-by: s3transfer, boto3

Aug 11 '21 17:08 bedge

While reproducing this, I also confirmed that the default shell on the target must NOT be dash.

Also, I don't understand why the S3 bucket config needs to exist. If the instance doesn't have R/W permissions on the configured bucket, the connection also fails, even though nothing has been written to the bucket:

---
- name: Test command
  gather_facts: false
  hosts: all
  vars:
    ansible_connection: aws_ssm
    ansible_aws_ssm_region: eu-central-1
    ansible_aws_ssm_bucket_name: nghc-sbox2-s3    # <-- Why is this needed?
    ansible_python_interpreter: /opt/venv/root/bin/python


  tasks:
    - name: test
      command:
        cmd: hostname

Aug 11 '21 23:08 bedge

ansible_aws_ssm_bucket_name: nghc-sbox2-s3    <-------- Why is this needed ?

I guess it is because Ansible uploads the files for its plays to the bucket, from where the AWS SSM agent can download them.

Aug 12 '21 04:08 markuman

I found this doc, which could explain the KMS issue:

https://aws.amazon.com/premiumsupport/knowledge-center/ssm-session-manager-failures/

If I get time I'll try this setup.

I'm still trying to sort out exactly which S3 permissions are needed.

Aug 13 '21 20:08 bedge

I have the same or a similar issue, but my setup is a bit different:

I run the Ansible playbook with credentials for a "login-account". Ansible then assumes a role in the desired AWS target account by executing an assume-role task on localhost, and stores the access key, secret access key and session token at runtime in the plugin's reserved variables (access_key_id, ...).

This works fine without KMS encryption in Session Manager, but once it is enabled, this error occurs when running Ansible with -vvvvv:

Failed to process action KMSEncryption: Error calling KMS GenerateDataKey API: NotFoundException: Key 'arn:aws:kms:eu-central-1:[ACCOUNT-ID]:key/[KMS-Key-ID]' does not exist

The interesting part is that [ACCOUNT-ID] is the account ID of the "login-account", while [KMS-Key-ID] belongs to the correct target account, so the resulting ARN points to a key that does not exist.
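
Editor's note: the mismatch described above can be checked mechanically by splitting the key ARN from the error message and comparing its account field with the account that owns the key. The sketch below does only that. The function names and the two account IDs are hypothetical, and it does not explain why the login account's ID ends up in the ARN.

```python
def parse_kms_key_arn(arn):
    """Split a KMS key ARN into its region, account ID and key ID."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not a KMS ARN: {arn!r}")
    resource_type, _, key_id = parts[5].partition("/")
    if resource_type != "key" or not key_id:
        raise ValueError(f"not a KMS key ARN: {arn!r}")
    return {"region": parts[3], "account": parts[4], "key_id": key_id}


def key_in_account(arn, expected_account):
    """Return True if the ARN's account field matches `expected_account`."""
    return parse_kms_key_arn(arn)["account"] == expected_account


if __name__ == "__main__":
    # Hypothetical IDs: 111111111111 = login account, 222222222222 = target.
    arn_from_error = "arn:aws:kms:eu-central-1:111111111111:key/EXAMPLE-KEY-ID"
    print(key_in_account(arn_from_error, "222222222222"))
```

To see which account a given set of credentials resolves to, `boto3.client("sts").get_caller_identity()["Account"]` can be called once with the login credentials and once with the assumed-role credentials.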

Aug 15 '23 14:08 simon97k