
etcd3 lookup provider cannot connect to HTTPS etcd3 endpoint

Open eramnes opened this issue 3 years ago • 5 comments

SUMMARY

When using an etcd3 cluster that is configured to use HTTPS, the etcd3 lookup provider appears to be unable to connect to it. Specifying an endpoint of "https://<etcd3_host>:<port>" seems to strip the "https://" from the connection string, and using a host of "https://<etcd3_host>" seems to try to perform a DNS lookup that includes the "https://".

ISSUE TYPE
  • Bug Report
COMPONENT NAME

etcd3 lookup provider

ANSIBLE VERSION
ansible 2.9.16
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/var/go/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.6/site-packages/ansible
  executable location = /bin/ansible
  python version = 3.6.8 (default, Aug 18 2020, 08:33:21) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]

CONFIGURATION
No output was returned from ansible-config dump --only-changed.
OS / ENVIRONMENT

Operating System: Red Hat Enterprise Linux 8.3 (Ootpa)
CPE OS Name: cpe:/o:redhat:enterprise_linux:8.3:GA
Kernel: Linux 4.18.0-240.10.1.el8_3.x86_64
Architecture: x86-64

$ pip3 show etcd3
Name: etcd3
Version: 0.12.0

$ pip3 show grpcio
Name: grpcio
Version: 1.35.0

I have verified that the machine Ansible runs on is able to successfully connect to the etcd3 cluster:

$ curl https://<etcd3_host>:2379/v3
<a href="/v3/">Moved Permanently</a>.

The etcd3 cluster appears to be healthy and listening on HTTPS:

# etcdctl --user root member list --endpoints=https://<etcd3_host>:2379
Password: 
8e9e05c52164694d, started, 1717328e83e54c57b25e9fcaf348cc9a, https://0.0.0.0:2380, https://0.0.0.0:2379, false
STEPS TO REPRODUCE
#etcd_endpoints: https://<etcd3_host>:2379
etcd_host: https://<etcd3_host>
etcd_port: 2379
etcd_user: ansibleetcd
etcd_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
           123456789...

ad_fqdn: "ad.example.com"
vm_name: "example"

# is_ad_joined: "{{ lookup('community.general.etcd3', '/'+ad_fqdn+'/'+vm_name, endpoints=etcd_endpoints, user=etcd_user, password=etcd_password) }}"
is_ad_joined: "{{ lookup('community.general.etcd3', '/'+ad_fqdn+'/'+vm_name, host=etcd_host, port=etcd_port, user=etcd_user, password=etcd_password) }}"
EXPECTED RESULTS

The etcd3 lookup completes successfully, and returns the value assigned to the requested key

ACTUAL RESULTS

When run with the "endpoints" variable set:

ansible-playbook 2.9.16
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/var/go/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.6/site-packages/ansible
  executable location = /usr/bin/ansible-playbook
  python version = 3.6.8 (default, Aug 18 2020, 08:33:21) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]
Using /etc/ansible/ansible.cfg as config file
host_list declined parsing /var/lib/go-agent/pipelines/win-template-bootstrap/playbooks/inventory.py as it did not pass its verify_file() method
Parsed /var/lib/go-agent/pipelines/win-template-bootstrap/playbooks/inventory.py inventory source with script plugin
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'
Skipping callback 'actionable', as we already have a stdout callback.
Skipping callback 'counter_enabled', as we already have a stdout callback.
Skipping callback 'debug', as we already have a stdout callback.
Skipping callback 'dense', as we already have a stdout callback.
Skipping callback 'dense', as we already have a stdout callback.
Skipping callback 'full_skip', as we already have a stdout callback.
Skipping callback 'json', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'null', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
Skipping callback 'selective', as we already have a stdout callback.
Skipping callback 'skippy', as we already have a stdout callback.
Skipping callback 'stderr', as we already have a stdout callback.
Skipping callback 'unixy', as we already have a stdout callback.
Skipping callback 'yaml', as we already have a stdout callback.

PLAYBOOK: win-template-ad.yml *********************************************
1 plays in win-template-ad.yml
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'

PLAY [Join template to Active Directory] **********************************
META: ran handlers
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'

TASK [win-template-ad : Wait for system to become reachable] **************
task path: /var/lib/go-agent/pipelines/win-template-bootstrap/playbooks/roles/win-template-ad/tasks/main.yml:2
etcd3 connection parameters: {'host': '<etcd3_host>', 'port': '2379', 'timeout': 60, 'user': 'ansibleetcd', 'password': '<redacted>'}
fatal: [<vm_name>]: FAILED! => {
    "msg": "The conditional check 'is_ad_joined != \"yes\"' failed. The error was: An unhandled exception occurred while templating '{{ lookup('community.general.etcd3', '/'+ad_fqdn+'/'+vm_name, endpoints=etcd_endpoints, user=etcd_user, password=etcd_password) }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while running the lookup plugin 'community.general.etcd3'. Error was a <class 'ansible.errors.AnsibleLookupError'>, original message: Cannot connect to etcd cluster: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses\"\n\tdebug_error_string = \"{\"created\":\"@1611413404.759922782\",\"description\":\"Failed to pick subchannel\",\"file\":\"src/core/ext/filters/client_channel/client_channel.cc\",\"file_line\":5390,\"referenced_errors\":[{\"created\":\"@1611413404.759916232\",\"description\":\"failed to connect to all addresses\",\"file\":\"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc\",\"file_line\":397,\"grpc_status\":14}]}\"\n>\n\nThe error appears to be in '/var/lib/go-agent/pipelines/win-template-bootstrap/playbooks/roles/win-template-ad/tasks/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Wait for system to become reachable\n  ^ here\n"

On the etcd3 server, you can see that the lookup tried to connect, but appears to have used HTTP instead of HTTPS:

Jan 23 08:50:04 <etcd3_host> etcd[18814]: rejected connection from "<ansible_host_ip>:50626" (error "tls: first record does not look like a TLS handshake", ServerName "")

When run with the "host" variable set:

ansible-playbook 2.9.16
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/var/go/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.6/site-packages/ansible
  executable location = /usr/bin/ansible-playbook
  python version = 3.6.8 (default, Aug 18 2020, 08:33:21) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]
Using /etc/ansible/ansible.cfg as config file
host_list declined parsing /var/lib/go-agent/pipelines/win-template-bootstrap/playbooks/inventory.py as it did not pass its verify_file() method
Parsed /var/lib/go-agent/pipelines/win-template-bootstrap/playbooks/inventory.py inventory source with script plugin
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'
Skipping callback 'actionable', as we already have a stdout callback.
Skipping callback 'counter_enabled', as we already have a stdout callback.
Skipping callback 'debug', as we already have a stdout callback.
Skipping callback 'dense', as we already have a stdout callback.
Skipping callback 'dense', as we already have a stdout callback.
Skipping callback 'full_skip', as we already have a stdout callback.
Skipping callback 'json', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'null', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
Skipping callback 'selective', as we already have a stdout callback.
Skipping callback 'skippy', as we already have a stdout callback.
Skipping callback 'stderr', as we already have a stdout callback.
Skipping callback 'unixy', as we already have a stdout callback.
Skipping callback 'yaml', as we already have a stdout callback.

PLAYBOOK: win-template-ad.yml *********************************************
1 plays in win-template-ad.yml
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'

PLAY [Join template to Active Directory] **********************************
META: ran handlers
Read vars_file '../group_vars/template-creds.yml'
Read vars_file '../group_vars/etcd-creds.yml'

TASK [win-template-ad : Wait for system to become reachable] **************
task path: /var/lib/go-agent/pipelines/win-template-bootstrap/playbooks/roles/win-template-ad/tasks/main.yml:2
etcd3 connection parameters: {'host': 'https://<etcd3_host>', 'port': 2379, 'timeout': 60, 'user': 'ansibleetcd', 'password': '<redacted>'}
fatal: [vm_name]: FAILED! => {
    "msg": "The conditional check 'is_ad_joined != \"yes\"' failed. The error was: An unhandled exception occurred while templating '{{ lookup('community.general.etcd3', '/'+ad_fqdn+'/'+vm_name, host=etcd_host, port=etcd_port, user=etcd_user, password=etcd_password) }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while running the lookup plugin 'community.general.etcd3'. Error was a <class 'ansible.errors.AnsibleLookupError'>, original message: Cannot connect to etcd cluster: <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for service: https://<etcd3_host>:2379\"\n\tdebug_error_string = \"{\"created\":\"@1611413677.532513096\",\"description\":\"Resolver transient failure\",\"file\":\"src/core/ext/filters/client_channel/client_channel.cc\",\"file_line\":2140,\"referenced_errors\":[{\"created\":\"@1611413677.532511012\",\"description\":\"DNS resolution failed for service: https://<etcd3_host>:2379\",\"file\":\"src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc\",\"file_line\":370,\"grpc_status\":14,\"referenced_errors\":[{\"created\":\"@1611413677.532485854\",\"description\":\"C-ares status is not ARES_SUCCESS qtype=A name=https://<etcd3_host>:2379 is_balancer=0: Domain name not found\",\"file\":\"src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc\",\"file_line\":728}]}]}\"\n>\n\nThe error appears to be in '/var/lib/go-agent/pipelines/win-template-bootstrap/playbooks/roles/win-template-ad/tasks/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Wait for system to become reachable\n  ^ here\n"

It appears to me from the output that it's attempting the DNS lookup for the host, but including the "https://" instead of just looking at the actual FQDN.
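
To illustrate the distinction between the full endpoint URL and the bare FQDN that a DNS lookup needs, here is a minimal Python sketch using only the standard library. The helper name split_endpoint and the default port are assumptions made for illustration; this is not the plugin's actual code.

```python
from urllib.parse import urlsplit


def split_endpoint(endpoint, default_port=2379):
    """Return (scheme, host, port) for an endpoint string.

    Hypothetical helper for illustration only; not taken from the plugin.
    """
    # urlsplit only recognises the host part when "//" precedes it,
    # so bare "host:port" values get a "//" prefix first.
    parts = urlsplit(endpoint if "://" in endpoint else "//" + endpoint)
    return parts.scheme, parts.hostname, parts.port or default_port


# The scheme is kept separately, so the host is safe to hand to a resolver.
scheme, host, port = split_endpoint("https://etcd.example.com:2379")
```

With a split like this, the scheme could decide whether TLS is used, while only the host and port go into the connection target.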

I had to try to sanitize some of the playbook names. I think I got them all, but if anything looks inconsistent in the workflow it's probably my fault.

Thanks!

eramnes avatar Jan 23 '21 16:01 eramnes

Files identified in the description:

If these files are inaccurate, please update the component name section of the description or use the !component bot command.


ansibullbot avatar Jan 23 '21 16:01 ansibullbot

!component =plugins/lookup/etcd3.py

felixfontein avatar Jan 23 '21 17:01 felixfontein

Files identified in the description:

If these files are inaccurate, please update the component name section of the description or use the !component bot command.


ansibullbot avatar Jan 23 '21 17:01 ansibullbot

This is the expected behavior of the etcd3 library this plugin uses. (See here) In order for the client to use an HTTPS connection, ca_cert must be provided, with cert_cert and cert_key optionally included. Maybe just a doc update is appropriate for this issue.
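
As a rough sketch of what that means when calling the etcd3 Python library directly: the host is passed as a bare name, and TLS is requested by supplying ca_cert. The helper names tls_client_kwargs and connect, the host name, and the certificate path below are hypothetical. The rule that cert_cert and cert_key must be given together reflects my understanding of the library, so treat it as an assumption.

```python
def tls_client_kwargs(host, port, ca_cert, cert_cert=None, cert_key=None):
    """Build keyword arguments for etcd3.client() over TLS (hypothetical helper)."""
    # Assumption: the library wants the client cert and key as a pair,
    # or neither of them when only server verification is needed.
    if (cert_cert is None) != (cert_key is None):
        raise ValueError("cert_cert and cert_key must be provided together")
    kwargs = {"host": host, "port": port, "ca_cert": ca_cert}
    if cert_cert is not None:
        kwargs.update(cert_cert=cert_cert, cert_key=cert_key)
    return kwargs


def connect(**kwargs):
    """Open a client; requires the third-party etcd3 package."""
    import etcd3  # imported lazily so the helper above works without it

    return etcd3.client(**kwargs)


# Placeholder host and CA path; note there is no "https://" in the host.
params = tls_client_kwargs("etcd.example.com", 2379, "/path/to/ca.crt")
```

Calling connect(**params), plus user and password if authentication is enabled, would then attempt the TLS connection. The lookup plugin's ca_cert, cert_cert and cert_key options appear to map onto these same parameters.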

Ajpantuso avatar Apr 22 '21 15:04 Ajpantuso

cc @eric-belhomme

ansibullbot avatar Sep 28 '21 20:09 ansibullbot