ansible-freeipa

OTP Client enrollment: Permit task to supply IPA domain / realm, rather than solely relying on fragile auto-discovery

Open ewenmcneill opened this issue 3 years ago • 6 comments

There appears to be an undocumented requirement for the Ansible controller to itself be an IPA/IdM client (or maybe IPA/IdM server?) if the IPA Client OTP (One Time Password) enrollment is used. This results from the action_plugins/ipaclient_get_otp.py:

https://github.com/freeipa/ansible-freeipa/blob/d6eaf9122554314d5c7e5d7e8fc664eed30745f8/roles/ipaclient/action_plugins/ipaclient_get_otp.py#L166-L175

which runs on the Ansible controller (verified by having it create a flag file), before the relevant ipaclient_get_otp.py module gets run on the delegate_to host (in our case the IdM server). If the Ansible controller is not an IPA/IdM client, then the message "The host is not an IPA server" results. That message is also confusing in this context, as the "host" in it is neither the one being set up nor the IPA/IdM server delegate_to at that step, so it took me a while to realise that the Ansible controller not being an IPA/IdM client was the issue.

This same action_plugins/ipaclient_get_otp.py results in the requirement that the Ansible controller have kinit (which is documented), but the requirement for the Ansible controller to be an IdM client / server is not documented there.

AFAICT the action_plugins/ipaclient_get_otp.py is running the ipaclient_get_facts module on the Ansible controller itself to "auto discover" the domain and realm that the client should be enrolled into, despite the fact that a bunch of other related parameters are passed in to the ipaclient_get_otp.py module (and retrieved by the action_plugins/ipaclient_get_otp.py a little earlier on).

FWIW, I'm also surprised by this design of using an action_plugins to run IPA/IdM related code on the Ansible controller in the first place, rather than just doing all the work on the delegate_to IPA/IdM server. I'm 100% happy to have Ansible pass the admin credentials to the IPA/IdM server (because the server is well locked down, and already got set up with those credentials); I'm just trying to use the OTP process to avoid passing the admin credentials to every end user client. Simply doing all the work on the IPA/IdM server (ie, the delegate_to system) would seem a lot simpler design and remove the requirement for the Ansible controller to have kinit on it too (the IPA/IdM server will obviously have kinit on it :-) ).

FTR, this ipaclient_get_otp.py module, and its auto-associated (by Ansible) action_plugins/ipaclient_get_otp.py running on the Ansible controller, is called from this bit of the roles/ipaclient install task:

https://github.com/freeipa/ansible-freeipa/blob/d6eaf9122554314d5c7e5d7e8fc664eed30745f8/roles/ipaclient/tasks/install.yml#L101-L130

which gives it a bunch of parameters already. And domain and realm are both potentially available earlier in that task:

https://github.com/freeipa/ansible-freeipa/blob/d6eaf9122554314d5c7e5d7e8fc664eed30745f8/roles/ipaclient/tasks/install.yml#L33 https://github.com/freeipa/ansible-freeipa/blob/d6eaf9122554314d5c7e5d7e8fc664eed30745f8/roles/ipaclient/tasks/install.yml#L35

so could be passed in (if available), which would seem to entirely avoid the need for the action_plugins/ipaclient_get_otp.py to just assume that it can auto-discover those values from the Ansible controller itself. (Or maybe have the auto-discovery as "just a fallback if not passed in", with documentation that if they're omitted then the Ansible controller must itself already be enrolled in the same IPA/IdM domain/realm as the client.)

(Tested with ansible_freeipa 0.4.2 from Ansible Galaxy, but seems to still be true in HEAD in the git repo.)

ewenmcneill avatar Nov 23 '21 01:11 ewenmcneill

By (a) ensuring ipaclient_domain and ipaclient_realm are available (ipaclient_domain already was, added ipaclient_realm), and (b) using the following local patches, I appear to have been able to get ipaclient_get_otp.py to work on an Ansible controller that is not an IPA/IdM client. And hopefully not broken anything else in the process. Tested/diffed against ansible_freeipa 0.4.2 from Ansible Galaxy, but in theory something pretty much like this should also work on HEAD.

Of note, the modules/ipaclient_get_otp.py (and roles/ipaclient/library/ipaclient_get_otp.py but that seems unused in the Ansible Galaxy case due to the explicit path used to call ipaclient_get_otp.py) needs to accept two optional parameters (domain and realm) which are only there for the benefit of action_plugins/ipaclient_get_otp.py (ie, to get from the task magically through to the action plugin); I've just marked them as optional, but possibly they need documentation implying they're "needed unless the Ansible Controller is an IPA/IdM client itself".

plugins/action/ipaclient_get_otp.py and roles/ipaclient/action_plugins/ipaclient_get_otp.py differences (files are identical, so only one diff; the diff mostly looks long because of the Python-induced change in indent to avoid doing the discovery of domain and realm if they're already passed in):

--- plugins/action/ipaclient_get_otp.py-bkup-2021-11-23	2021-11-16 17:46:16.673652185 +1300
+++ plugins/action/ipaclient_get_otp.py	2021-11-23 14:14:38.223109329 +1300
@@ -152,6 +152,8 @@
         keytab = self._task.args.get('keytab', None)
         password = self._task.args.get('password', None)
         lifetime = self._task.args.get('lifetime', '1h')
+        domain = self._task.args.get('domain', None)
+        realm = self._task.args.get('realm', None)
 
         if (not keytab and not password):
             result['failed'] = True
@@ -163,16 +165,23 @@
             result['msg'] = "principal is required"
             return result
 
-        data = self._execute_module(module_name='ipaclient_get_facts',
-                                    module_args=dict(), task_vars=task_vars)
- 
-        try:
-            domain = data['ansible_facts']['ipa']['domain']
-            realm = data['ansible_facts']['ipa']['realm']
-        except KeyError:
-            result['failed'] = True
-            result['msg'] = "The host is not an IPA server"
-            return result
+        if not domain or not realm:
+            data = self._execute_module(module_name='ipaclient_get_facts',
+                                        module_args=dict(), task_vars=task_vars)
+
+            try:
+                domain = data['ansible_facts']['ipa']['domain']
+                realm = data['ansible_facts']['ipa']['realm']
+            except KeyError:
+                result['failed'] = True
+                result['msg'] = "The host is not an IPA server"
+                return result
 
         items = principal.split('@')
         if len(items) < 2:

modules/ipaclient_get_otp.py and roles/ipaclient/library/ipaclient_get_otp.py changes (identical files, so only one diff):

--- plugins/modules/ipaclient_get_otp.py-bkup-2021-11-23	2021-10-20 12:35:50.344901689 +1300
+++ plugins/modules/ipaclient_get_otp.py	2021-11-23 14:49:12.490150198 +1300
@@ -60,6 +60,12 @@
   state:
     description: The desired host state
     required: yes
+  domain:
+    description: IPA/IdM domain to enroll client into
+    required: no
+  realm:
+    description: IPA/IdM realm to enroll client into
+    required: no
 author:
     - "Florence Blanc-Renaud"
 '''
@@ -281,6 +287,8 @@
             ipaddress=dict(required=False),
             random=dict(default=False, type='bool'),
             state=dict(default='present', choices=['present', 'absent']),
+            domain=dict(required=False),
+            realm=dict(required=False),
         ),
         supports_check_mode=True,
     )

And the roles/ipaclient/tasks/install.yml difference to pass the domain and realm in if available:

--- roles/ipaclient/tasks/install.yml-bkup-2021-11-23	2021-10-20 12:35:50.124879379 +1300
+++ roles/ipaclient/tasks/install.yml	2021-11-23 14:29:33.893343996 +1300
@@ -119,6 +119,8 @@
         keytab: "{{ ipaadmin_keytab | default(omit) }}"
         fqdn: "{{ result_ipaclient_test.hostname }}"
         lifetime: "{{ ipaclient_lifetime | default(omit) }}"
+        domain: "{{ result_ipaclient_test.get('domain') | default(omit) }}"
+        realm: "{{ result_ipaclient_test.get('realm') | default(omit) }}"
         random: True
       register: result_ipaclient_get_otp
       # If the host is already enrolled, this command will exit on error

It'd be quite helpful if something like this could be incorporated upstream / into some later release, and I think preferable to documenting that the Ansible controller has to be an IdM/IPA client in the OTP client enrollment case :-)

Ewen

ewenmcneill avatar Nov 23 '21 02:11 ewenmcneill

ipaclient_get_facts is not supposed to be executed on the controller, but the server. Using OTP is working in our tests and the controller is not an IPA client.

t-woerner avatar Nov 23 '21 16:11 t-woerner

Which ansible-freeipa version are you using and do you have a log?

t-woerner avatar Nov 23 '21 16:11 t-woerner

ipaclient_get_facts is not supposed to be executed on the controller, but the server. Using OTP is working in our tests and the controller is not an IPA client. [...] Which ansible-freeipa version are you using and do you have a log?

With this issue, it's Ansible FreeIPA from Ansible Galaxy, currently pinned to 0.4.2. (The Ansible controller is on Ansible 2.9.6, the Ubuntu 20.04 LTS packaged version; which I know is old -- see below.)

It's interesting to know that OTP is working in your tests without the Ansible controller being an IPA client / server. That suggests maybe it was failing in a different way to what I had guessed from the symptoms I was seeing.

From Ansible's (brief) documentation of ActionPlugins, it looks like the Action Plugin (eg, action_plugins/ipaclient_get_otp.py) is expected to execute on the Ansible Controller, and to use self._execute_module() to run the module on the target (in this case the delegate_to target). A comment to the effect ("discover IPA realm/domain from the delegate_to Ansible server") in the action_plugins/ipaclient_get_otp.py code at the discovery step would help make it easier for those of us less familiar with the Ansible internals to understand what's happening there, when it breaks :-)

Possibly this is another victim of the (old) Ansible bug around detecting Python interpreters in delegate_to situations -- https://github.com/ansible/ansible/issues/63180. This client's setup with this problem has Ansible 2.9.6 (the one shipped with Ubuntu 20.04 LTS; yes I know it's old), which predates the upstream workaround for the delegate_to Python interpreter detection (always rediscover it on new target), and all of the Ansible Controller (Ubuntu 20.04 LTS), the IPA/IdM Server (RHEL 8), and the IPA/IdM client they were trying to install (CentOS 7) have the Python default interpreter (ie, the one Ansible needs to use) in different locations, which means any issues with python interpreter detection are most likely to break things :-(

Unfortunately I don't have good logs of this problem, because the affected task in the role runs with no_log: true as shipped, which suppresses the logs, and the next item logging only logs the "Not an IPA server" line which was... confusing. I focused on understanding the action_plugins/ipaclient_get_otp.py through code changes there before I remembered I could change no_log: to false temporarily to see more information. And then I implemented the workaround to avoid this discovery, and so the client system I was testing with successfully installed, which means I don't currently have a reproducible test case.

Given your additional information currently my guess is that the self._execute_module(module_name='ipaclient_get_facts', ...) failed to run on the delegate_to host because it couldn't find the Python interpreter (because of the old Ansible bug), and the logs of that failure were suppressed by no_log: true. Which meant the domain and realm were not discovered, and no_log: true resulted in only printing the result message ("The host is not an IPA server") rather than the underlying issue. Possibly that message should be something like "Failed to discover IPA domain / realm -- is the delegate_to host an IPA server?" or something else that highlights there could be other causes?
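To make the suggestion above concrete, here is a minimal sketch of how the discovery step could tell "the module never ran" apart from "the module ran but found no IPA facts". It is not the project's code: the helper name discover_domain_realm is hypothetical, and the result keys it checks (failed, rc, msg, module_stderr) are assumed from the failure output shown in this thread.

```python
def discover_domain_realm(data):
    """Return (domain, realm, error) from an ipaclient_get_facts-style result.

    Hypothetical helper: 'data' is assumed to be the dict returned by
    self._execute_module() in the action plugin.
    """
    # If the module itself failed to run (e.g. no Python interpreter found),
    # pass that failure on instead of hiding it behind a generic message.
    if data.get("failed") or data.get("rc", 0) != 0:
        detail = data.get("msg") or data.get("module_stderr") or "unknown error"
        return None, None, (
            "Failed to run ipaclient_get_facts on the delegate_to host: %s"
            % detail
        )

    try:
        ipa = data["ansible_facts"]["ipa"]
        return ipa["domain"], ipa["realm"], None
    except KeyError:
        # The module ran, but returned no IPA domain / realm facts.
        return None, None, (
            "Failed to discover IPA domain / realm -- "
            "is the delegate_to host an IPA server?"
        )
```

In the action plugin this would presumably replace the bare try/except KeyError, with result['msg'] set from the returned error string, so that the later "Report error for OTP generation" task prints the underlying cause.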

It'd also be helpful to merge something like the change I made in https://github.com/freeipa/ansible-freeipa/issues/688#issuecomment-976108705, to allow domain and realm to be passed in, as I imagine this isn't the only environment where they're trivially available to the ipaclient task running and that would entirely avoid the magical discovery step if they are provided.

Ewen

ewenmcneill avatar Nov 23 '21 21:11 ewenmcneill

Possibly this is another victim of the (old) Ansible bug around detecting Python interpreters in delegate_to situations -- ansible/ansible#63180.

Thanks to now knowing where to look, and the same client presenting me with another VM to install as an IPA client, I've managed to recreate the problem, by removing the hosts var that was hard coding the Python interpreter for the IPA/IdM server (which is RHEL 8, so doesn't have /usr/bin/python* on it, instead the Ansible Python interpreter is in /usr/libexec :-/ ).

So here's a log snippet (with no_log: false and maybe -v) that proves the python interpreter detection logic was the underlying cause. (I've edited it to change the hostnames for obvious reasons, and basically to make it easier to understand.)

The key bit is:

/bin/sh: /usr/bin/python: No such file or directory

which is expected given RHEL 8, because the actual Python used needs to be /usr/libexec/platform-python or maybe /usr/bin/python3 if it's been installed on the RHEL 8 system (Ansible normally uses /usr/libexec/platform-python when the detection logic is working properly AFAICT; but that detection logic fails in old Ansible in the face of multiple delegate_to to multiple different hosts.)
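For anyone wanting to check which interpreter path actually exists on a given host before hard coding it, a rough sketch follows. This is only an illustration and not Ansible's interpreter discovery logic; the function name and the candidate order are assumptions, with the paths taken from the ones mentioned above.

```python
import os

# Candidate interpreter paths mentioned in this thread; the order is an assumption.
CANDIDATES = [
    "/usr/libexec/platform-python",
    "/usr/bin/python3",
    "/usr/bin/python",
]


def first_existing_interpreter(candidates=CANDIDATES, exists=os.path.exists):
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    # Run on the host in question with whichever Python is available there.
    print(first_existing_interpreter())
```

The exists parameter is only there so the check can be exercised without touching the real filesystem.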

So this work around in the Ansible hosts file seems sufficient (all the IPA servers are in the group role-idm-servers in this setup):

[role-idm-servers:vars]
ansible_python_interpreter=/usr/libexec/platform-python

(If I use my work around to pass in domain and realm it fails on the second self._execute_module(); if I don't, it fails on the first self._execute_module(). But ultimately the root cause of the failure is the same, and it's just the returned error message "The host is not an IPA server" combined with no_log: true which confuses discovering that cause. Ansible's default log message here, "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error", is fairly helpful when you've seen this situation before, but obviously it is not logged with no_log: true, and the msg being returned is replaced by the action_plugins/ipaclient_get_otp.py with a more generic "not an IPA server" one.)

Thanks for the hints to look deeper,

Ewen

<ipaserver1.example.com> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ansible"' -o ConnectTimeout=10 -o ControlPath=/home/ewen/.ansible/cp/c2f85e917b ipaserver1.example.com '/bin/sh -c '"'"'rm -f -r /opt/ansible/.ansible/tmp/ansible-tmp-1637711433.3384302-59688000881043/ > /dev/null 2>&1 && sleep 0'"'"''
<ipaserver1.example.com> (0, b'', b'OpenSSH_8.2p1 Ubuntu-4ubuntu0.3, OpenSSL 1.1.1f  31 Mar 2020\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files\r\ndebug1: /etc/ssh/ssh_config line 21: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 696077\r\ndebug3: mux_client_request_session: session request sent\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n')
fatal: [newipaclient.example.com -> ipaserver1.example.com]: FAILED! => {
    "changed": false,
    "failed_when_result": true,
    "module_stderr": "OpenSSH_8.2p1 Ubuntu-4ubuntu0.3, OpenSSL 1.1.1f  31 Mar 2020\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files\r\ndebug1: /etc/ssh/ssh_config line 21: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 696077\r\ndebug3: mux_client_request_session: session request sent\r\n/bin/sh: /usr/bin/python: No such file or directory\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 127\r\n",
    "module_stdout": "",
    "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
    "rc": 127
}
...ignoring

For reference, this is what it looks like in the no_log: true case, with the Ansible-Galaxy 0.4.2 version of the code, without debug logging:

TASK [ipaclient : Install - Get One-Time Password for client enrollment] *******
fatal: [newipaclient.example.com -> ipaserver1.example.com]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
...ignoring

TASK [ipaclient : Install - Report error for OTP generation] *******************
fatal: [newipaclient.example.com ]: FAILED! => {
    "msg": "The host is not an IPA server"
}

and with debug logging:

<ipaserver1.example.com> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath=/home/colinb/.ansible/cp/c2f85e917b)
<ipaserver1.example.com> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ansible"' -o ConnectTimeout=10 -o ControlPath=/home/colinb/.ansible/cp/c2f85e917b ipaserver1.example.com '/bin/sh -c '"'"'sudo -H -S -n  -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-muiautslizquxzvthgkupaqkihgwwdlx ; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Escalation succeeded
<ipaserver1.example.com> rc=127, stdout and stderr censored due to no log
<ipaserver1.example.com> Failed to connect to the host via ssh: <error censored due to no log>
fatal: [ipaclient.example.com -> ipaserver1.example.com]: FAILED! => {
    "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result",
    "changed": false
}
...ignoring

TASK [ipaclient : Install - Report error for OTP generation] *******************
task path: /usr/local/ansible-galaxy/ansible_collections/freeipa/ansible_freeipa/roles/ipaclient/tasks/install.yml:132
fatal: [ipaclient.example.com]: FAILED! => {
    "msg": "The host is not an IPA server"
}

One basically has to recreate it with no_log: no in order to actually see what went wrong, which is very much not the error message of "The host is not an IPA server".... :-)

ewenmcneill avatar Nov 24 '21 00:11 ewenmcneill

PR https://github.com/freeipa/ansible-freeipa/pull/987 is changing the code for OTP. The action plugin is removed and the OTP is generated on the first entry in the server list returned by ipaclient_test. As the krb5 configuration of the server is used to generate the OTP, there should not be an issue any more.

t-woerner avatar Nov 23 '22 14:11 t-woerner