Templating broken when constructing value for `ansible_ssh_common_args`

Open dnmvisser opened this issue 3 years ago • 31 comments

Hi, I'm on ansible-core-2.12.2 (thx for all the work in getting that done) and mitogen v0.3.2.

We have some basic jinja inside one of our vars files:

---
# Use the correct jump host
ansible_ssh_common_args: >-
  -o ProxyCommand='ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p admin@{{ hostvars.jumphost.public_ip_address }}'

This causes errors:

TASK [Waiting for connection] *********************************************************************************************************************************************************
task path: /Users/dick.visser/git/deploy_dick/data/acc/site.yml:417
[WARNING]: Unhandled error in Python interpreter discovery for host acc_proxy1: EOF on stream; last 100 lines received: ssh: Could not resolve hostname {{: nodename nor servname
provided, or not known  kex_exchange_identification: Connection closed by remote host

If I hardcode it like this:

---
# Use the correct jump host
ansible_ssh_common_args: >-
  -o ProxyCommand='ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p [email protected]'

then things work....

The jinja inside the inventory works fine with ansible v3.4.0 (ansible-base 2.10.x)

Any thoughts?
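
For anyone wondering where the odd hostname `{{` in the error comes from, here is a minimal sketch using only Python's standard library. It assumes the untemplated string reaches ssh unchanged and that the shell tokenizes the ProxyCommand roughly the way `shlex.split` does; it is an illustration, not Mitogen's actual code path.

```python
import shlex

# The untemplated value from the vars file, assumed to be passed on as-is.
common_args = (
    "-o ProxyCommand='ssh -o UserKnownHostsFile=/dev/null "
    "-o StrictHostKeyChecking=no -W %h:%p "
    "admin@{{ hostvars.jumphost.public_ip_address }}'"
)

# First split: the option string becomes two ssh arguments.
outer = shlex.split(common_args)
proxy_command = outer[1].split("=", 1)[1]

# Second split: approximately what the shell does when ssh runs the ProxyCommand.
inner = shlex.split(proxy_command)

# The first token containing "@" is what the inner ssh treats as user@host.
destination = next(tok for tok in inner if "@" in tok)
user, host = destination.split("@", 1)
print(user, host)
```

The whitespace inside the Jinja2 expression splits it into several words, so the inner ssh only sees `admin@{{` as its destination, which matches the "Could not resolve hostname {{" message.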

dnmvisser avatar Feb 21 '22 21:02 dnmvisser

I have the same issue with ansible core 2.11.8, mitogen 0.3.2

hungpr0 avatar Feb 24 '22 07:02 hungpr0

same issue here, Ansible Core 2.10.17, mitogen 0.3.2 & 0.3.1

Zocker1999NET avatar Feb 28 '22 10:02 Zocker1999NET

Occurred here too with Ansible 2.11.9 and mitogen 0.3.2 (well, master, more specifically) due to this fella https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubespray-defaults/defaults/main.yaml#L4

Tristan971 avatar Mar 03 '22 12:03 Tristan971

With a small test inventory, Ansible 2.10 and git bisect, I found that commit c61c063b4f9b2b63dcaa86443631a268c9f72870 introduced this bug. Unfortunately this commit is quite big, so I am now trying to narrow down the exact cause manually, to maybe find a bugfix / workaround …

Zocker1999NET avatar Mar 07 '22 13:03 Zocker1999NET

It may be ansible_mitogen/transport_config.py

hungpr0 avatar Mar 07 '22 13:03 hungpr0

You're right, no other change in this commit affects the outcome of my tests. However, an internal change in Ansible may (also) contribute to this bug; I'm not quite sure:

While trying to find the smallest partial revert of commit c61c063b4f9b2b63dcaa86443631a268c9f72870, I noticed that the result of my small ping test depends on the version of Ansible used.

Starting from tag v0.3.2 and applying the diff at the end (which partially reverts the commit), running ansible -m ping host with a small test inventory works as expected with Ansible 2.10, but fails with Ansible 5.4.0 (core 2.12.3) with the same error message:

host | UNREACHABLE! => {
    "changed": false,
    "msg": "EOF on stream; last 100 lines received:\nssh: Could not resolve hostname {%: Name or service not known\r",
    "unreachable": true
}

So partially reverting this change works for older Ansible versions (~ 2.10) but not for newer ones (~ 5.4.0 / 2.12.3).

This is the diff mentioned above:

diff --git a/ansible_mitogen/transport_config.py b/ansible_mitogen/transport_config.py
index 4babbde3..344c3d84 100644
--- a/ansible_mitogen/transport_config.py
+++ b/ansible_mitogen/transport_config.py
@@ -467,9 +467,9 @@ class PlayContextSpec(Spec):
         return [
             mitogen.core.to_text(term)
             for s in (
-                C.config.get_config_value("ssh_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {})),
-                C.config.get_config_value("ssh_common_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {})),
-                C.config.get_config_value("ssh_extra_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {}))
+                getattr(self._play_context, 'ssh_args', ''),
+                getattr(self._play_context, 'ssh_common_args', ''),
+                getattr(self._play_context, 'ssh_extra_args', '')
             )
             for term in ansible.utils.shlex.shlex_split(s or '')
         ]
@@ -696,9 +696,22 @@ class MitogenViaSpec(Spec):
         return [
             mitogen.core.to_text(term)
             for s in (
-                C.config.get_config_value("ssh_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {})),
-                C.config.get_config_value("ssh_common_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {})),
-                C.config.get_config_value("ssh_extra_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("vars", {}))
+                (
+                    self._host_vars.get('ansible_ssh_args') or
+                    getattr(C, 'ANSIBLE_SSH_ARGS', None) or
+                    os.environ.get('ANSIBLE_SSH_ARGS')
+                    # TODO: ini entry. older versions.
+                ),
+                (
+                    self._host_vars.get('ansible_ssh_common_args') or
+                    os.environ.get('ANSIBLE_SSH_COMMON_ARGS')
+                    # TODO: ini entry.
+                ),
+                (
+                    self._host_vars.get('ansible_ssh_extra_args') or
+                    os.environ.get('ANSIBLE_SSH_EXTRA_ARGS')
+                    # TODO: ini entry.
+                ),
             )
             for term in ansible.utils.shlex.shlex_split(s)
             if s

Zocker1999NET avatar Mar 07 '22 14:03 Zocker1999NET

Encountering the same issue

ansible 2.10.17
mitogen-0.3.2

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=3m -o ForwardAgent=yes
control_path = ~/.ssh/ansible-%%C

ansible_ssh_jumphost: "{{ hostvars[groups['jumphost_servers'][0]]['ansible_host'] }}"
ansible_ssh_common_args: '-o ProxyCommand="ssh -W %h:%p -q {{ ansible_ssh_user }}@{{ ansible_ssh_jumphost }}"'

kex_exchange_identification: Connection closed by remote host

guytet avatar Mar 12 '22 00:03 guytet

@dnmvisser Have you found a working solution for this issue? mitogen is a very useful part of our toolset, and we'd love to hear if there's a way to make this work. Thank you.

guytet avatar Mar 26 '22 12:03 guytet

Nope, I ended up creating files with hard coded IP addresses etc

dnmvisser avatar Apr 06 '22 08:04 dnmvisser

We finally got this to work with this combination of versions:

ansible --version
ansible 2.10.17

mitogen-0.3.0rc1

guytet avatar Apr 06 '22 19:04 guytet

I took the time to inspect further and found a difference in the calling of C.config.get_config_value between Ansible and Mitogen.

For getting the configuration of ssh_common_args, Mitogen calls:

https://github.com/mitogen-hq/mitogen/blob/89c0cc94d16218e2647bb8bb32b011231def0fd7/ansible_mitogen/transport_config.py#L478

Ansible plugins (here ssh) use a helper AnsiblePlugin.get_option which does (if GitHub does not render the code, click on the links):

https://github.com/ansible/ansible/blob/b104478f171a4030c0cd96ef4d99db65bf04dceb/lib/ansible/plugins/connection/ssh.py#L743-L744

https://github.com/ansible/ansible/blob/b104478f171a4030c0cd96ef4d99db65bf04dceb/lib/ansible/plugins/__init__.py#L55-L62

Intercepting these calls to get_config_value reveals that calls from the official ssh plugin set the argument variables to a dict containing all host variables already resolved (i.e. after Jinja2 templating, no longer in template form). Mitogen's connection plugin, however, sets the argument to a dict that probably contains the task variables unresolved (i.e. still in template form, before Jinja2).

Meaning in practice: Given these example host vars:

ansible_ssh_common_args: "{{ other_var }}"
other_var: "--my-option"

Then the argument variables of get_config_value looks like

  • {…, "ansible_ssh_common_args": "--my-option", …} if called from Ansible's ssh plugin
  • {…, "ansible_ssh_common_args": "{{ other_var }}", …} if called from Mitogen's connection plugin

I do not know Ansible's Python code well enough to fix this (probably by resolving the variables properly before passing them to get_config_value), but maybe this helps someone else.
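
To make the difference concrete, here is a small standard-library sketch. The `resolve` helper is a hypothetical regex-based stand-in for Jinja2 templating (it is not Ansible's Templar and only handles plain `{{ name }}` references); the variable names are the example host vars from above.

```python
import re
import shlex


def resolve(value, variables):
    """Hypothetical stand-in for Jinja2: replace {{ name }} with variables[name]."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(variables[match.group(1)]),
        value,
    )


host_vars = {
    "ansible_ssh_common_args": "{{ other_var }}",
    "other_var": "--my-option",
}

# What Mitogen's plugin appears to pass: values still in template form.
unresolved = dict(host_vars)
# What Ansible's ssh plugin appears to pass: values already templated.
resolved = {key: resolve(value, host_vars) for key, value in host_vars.items()}

# Both plugins then split the string into individual ssh arguments.
print(shlex.split(resolved["ansible_ssh_common_args"]))
print(shlex.split(unresolved["ansible_ssh_common_args"]))
```

With the resolved dict the split yields a single ssh option; with the unresolved dict the template itself is split into separate words and handed to ssh as if they were arguments.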

Zocker1999NET avatar May 10 '22 13:05 Zocker1999NET

Any updates?

maxpain avatar Jun 09 '22 02:06 maxpain

@moreati could you please look into this issue?

maxpain avatar Jun 09 '22 02:06 maxpain

Any workarounds maybe? I use this with kubespray.

maxpain avatar Jun 24 '22 17:06 maxpain

Same issue here while installing kubespray with mitogen 0.3.3.

After playing a little with the Python script and the responsible file (thx @Zocker1999NET), I found a way to fix it. However, I haven't yet taken the time to check whether the change has side effects or causes other issues; my guess is that hostvars is the view of vars for each host. Hope this is right!

The fix is to replace self._task_vars.get("vars", {}) with self._task_vars.get("hostvars", {}).get(self._inventory_name, {}) in PlayContextSpec, around line 483 (method ssh_args).

Result looks like:

    def ssh_args(self):
        return [
            mitogen.core.to_text(term)
            for s in (
                C.config.get_config_value("ssh_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("hostvars", {}).get(self._inventory_name, {})),
                C.config.get_config_value("ssh_common_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("hostvars", {}).get(self._inventory_name, {})),
                C.config.get_config_value("ssh_extra_args", plugin_type="connection", plugin_name="ssh", variables=self._task_vars.get("hostvars", {}).get(self._inventory_name, {}))
            )
            for term in ansible.utils.shlex.shlex_split(s or '')
        ]

I won't be able to verify the fix until August, but if someone can play with it, please share the result! Edit: meaning I was not able to run the playbook to the end to be sure it works, but it definitely fixes the blocking task.
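
The lookup itself can be sketched with plain dicts. This assumes, as the patch does, that the per-host entry under "hostvars" yields templated values while "vars" still holds the raw templates; the sample data and the `variables_for_host` helper are hypothetical and only illustrate the dictionary access, not Ansible's real HostVars object.

```python
def variables_for_host(task_vars, inventory_name):
    """Return the per-host view of variables, or an empty dict if missing."""
    return task_vars.get("hostvars", {}).get(inventory_name, {})


# Hypothetical task_vars: "vars" holds raw templates, "hostvars" the resolved view.
task_vars = {
    "vars": {
        "ansible_ssh_common_args": "{{ other_var }}",
        "other_var": "--my-option",
    },
    "hostvars": {
        "host1": {
            "ansible_ssh_common_args": "--my-option",
            "other_var": "--my-option",
        },
    },
}

before = task_vars.get("vars", {})["ansible_ssh_common_args"]
after = variables_for_host(task_vars, "host1")["ansible_ssh_common_args"]
print(before)
print(after)
```

The chained `.get(..., {})` calls mean an unknown host simply produces an empty dict, so get_config_value would fall back to its other sources instead of raising.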

momiji avatar Jul 07 '22 16:07 momiji

@momiji I think this can be a valid fix for this issue. I applied the change to both ssh_args methods in mitogen/ansible_mitogen/transport_config.py and ran a fairly large Ansible repo I maintain in check mode, and everything seemed fine. It could connect to all expected hosts even with templates in ansible_ssh_common_args and did not report any new diffs or errors. Can you create a PR with this patch so it can be reviewed?

Zocker1999NET avatar Jul 07 '22 20:07 Zocker1999NET

I tested @momiji's patch (much appreciated!) with a simple test and a more complex real-world playbook today and everything worked as expected.

We will probably be using this patched mitogen for our playbooks until an official fix comes out, so I'll report back here if we do run into any regressions or issues that may be related.

aidanbh avatar Jul 11 '22 23:07 aidanbh

Hello,

Thanks for the patch @momiji. It works for a bastion host with a template in ansible_ssh_common_args. Unfortunately, after applying the patch to both ssh_args methods in mitogen/ansible_mitogen/transport_config.py, another issue shows up with the ansible.posix.synchronize module (ansible.posix collection 1.2.0). When using use_ssh_args: true to rsync a folder, templating does not seem to work for synchronize. https://docs.ansible.com/ansible/latest/collections/ansible/posix/synchronize_module.html

I have playbook tasks:

  tasks:
    - name: Sync scripts
      ansible.posix.synchronize:
        src: ../roles/my_server/files/opt/scripts/
        dest: /opt/scripts/
        recursive: true
        use_ssh_args: true
        archive: false
        rsync_opts:
          - '--chmod=0750'
          - '-o'
          - '-g'
          - '--chown=root:mycustomgroup'

Playbook run error:

{
  "rc": 255,
  "cmd": "sshpass -d18 /usr/bin/rsync --delay-updates -F --compress --recursive --rsh=/usr/bin/ssh -S none -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ProxyCommand=\"ssh -W %h:%p {{ bastion_user }}@{{ bastion_hostname }} -i $BASTION_SSH_PRIVATE_KEY -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null\" --rsync-path=sudo rsync --chmod=0750 -o -g --chown=root:mycustomgroup --out-format=<<CHANGED>>%i %n%L /runner/project/roles/my_server/files/opt/scripts/ ansible@myserver:/opt/scripts/",
  "msg": "ssh: Could not resolve hostname {{: Name or service not known\r\nkex_exchange_identification: Connection closed by remote host\r\nrsync: connection unexpectedly closed (0 bytes received so far) [sender]\nrsync error: unexplained error (code 255) at io.c(226) [sender=3.1.3]\n",
  "invocation": {
    "module_args": {
      "src": "/runner/project/roles/my_server/files/opt/scripts/",
      "dest": "ansible@myserver:/opt/scripts/",
      "recursive": true,
      "archive": false,
      "rsync_opts": [
        "--chmod=0750",
        "-o",
        "-g",
        "--chown=root:mycustomgroup"
      ],
      "_local_rsync_path": "rsync",
      "_local_rsync_password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
      "private_key": null,
      "rsync_path": "sudo rsync",
      "ssh_args": "-o ProxyCommand=\"ssh -W %h:%p {{ bastion_user }}@{{ bastion_hostname }} -i $BASTION_SSH_PRIVATE_KEY -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null\"",
      "delete": false,
      "_substitute_controller": false,
      "checksum": false,
      "compress": true,
      "existing_only": false,
      "dirs": false,
      "copy_links": false,
      "set_remote_user": true,
      "rsync_timeout": 0,
      "ssh_connection_multiplexing": false,
      "partial": false,
      "verify_host": false,
      "mode": "push",
      "dest_port": null,
      "links": null,
      "perms": null,
      "times": null,
      "owner": null,
      "group": null,
      "link_dest": null
    }
  },
  "_ansible_no_log": false,
  "changed": false
}

hungpr0 avatar Jul 13 '22 03:07 hungpr0

I tried @momiji's patch (replacing self._task_vars.get("vars", {}) with self._task_vars.get("hostvars", {}).get(self._inventory_name, {})) with ansible 5.10.0 and mitogen-0.3.3, but it does not work for me. I still get the same error (Could not resolve hostname {{:).

dnmvisser avatar Aug 10 '22 09:08 dnmvisser

Hi all, after reading your remarks, I've been able to make some tests on my side:

  • applied patch to both ssh_args instead of only to the first one, as suggested
  • run kubespray 1.18.1 installation (using ansible 5.4.0) without any issues

I don't think it's using the bastion feature, so I can't help on the remaining issues.

momiji avatar Aug 11 '22 11:08 momiji

Hi, also achieved migration to kubespray 1.19.0 (using ansible 5.7.1) with no issues. Next is testing with another playbook (with a higher version of ansible), and if it goes well I'll prepare a PR for this.

momiji avatar Aug 11 '22 15:08 momiji

Hello, PR #956 sent.

momiji avatar Aug 22 '22 11:08 momiji

Just an update on the ansible.posix.synchronize error I reported above. With

  • ansible.posix 1.4.0
  • patch https://github.com/mitogen-hq/mitogen/pull/956 applied on the latest commit https://github.com/mitogen-hq/mitogen/commit/572636a9d3c5a4ac4e8591c42f29763cb56fe602

the error no longer occurs.

hungpr0 avatar Oct 03 '22 15:10 hungpr0

Any way to get this merged? I also need to apply the patch to get my setup working...

sebastianreloaded avatar Oct 06 '22 18:10 sebastianreloaded

I've tried once again today and still no luck with:

  • ansible v5 (several combinations tried using ansible-core-2.12.x)
  • any mitogen
  • jinja code in ansible_ssh_common_args

If there are people out there that do have a working setup using the above, please post which versions you use:

  • ansible --version
  • ansible-galaxy collection list
  • which mitogen version/commit, and what patch on top of that

If anyone is interested, we use a shell wrapper for ansible-playbook, which allowed me to work around this issue slightly more elegantly than just hardcoding the jumphost IP. We fetch the IP first using the aws cli and then pass it in the --ssh-common-args command line argument:

# Look up the public IP of the most recently launched jumphost
export JUMP_IP=$(aws ec2 describe-instances \
    --region "${AWS_DEFAULT_REGION}" \
    --filters "Name=tag:Name,Values=jumphost" \
    --query 'Reservations[0].Instances[]' | \
    jq -r 'sort_by(.LaunchTime)|reverse|.[0].PublicIpAddress' )

# Pass the IP to Ansible on the command line instead of templating it
ansible-playbook \
  --ssh-common-args="-o ProxyCommand='ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -W %h:%p admin@${JUMP_IP}'" \
  playbook.yml
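
For those who prefer a Python wrapper, the selection step done by jq above can be sketched as follows. The instance list is made-up sample data (documentation IP addresses) standing in for the aws cli output, and both function names are hypothetical.

```python
def newest_public_ip(instances):
    """Mirror jq's sort_by(.LaunchTime)|reverse|.[0].PublicIpAddress."""
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    newest = sorted(instances, key=lambda inst: inst["LaunchTime"])[-1]
    return newest["PublicIpAddress"]


def ssh_common_args(jump_ip):
    """Build the --ssh-common-args value with the jump host IP filled in."""
    return (
        "-o ProxyCommand='ssh -o UserKnownHostsFile=/dev/null "
        "-o StrictHostKeyChecking=no -W %h:%p admin@" + jump_ip + "'"
    )


# Made-up sample data in the shape of 'Reservations[0].Instances[]'.
instances = [
    {"LaunchTime": "2023-01-10T08:00:00+00:00", "PublicIpAddress": "203.0.113.10"},
    {"LaunchTime": "2023-02-20T09:30:00+00:00", "PublicIpAddress": "203.0.113.20"},
]

jump_ip = newest_public_ip(instances)
print(ssh_common_args(jump_ip))
```

Because the IP is resolved before Ansible starts, no Jinja2 is left in the argument for the connection plugin to mishandle.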

dnmvisser avatar Feb 27 '23 17:02 dnmvisser

I see something very similar with templates used in ansible_host, ansible-core 2.4.14, mitogen master branch

- hosts: rproxy0
  vars:
    subnet: "10.10.10.0"
    ansible_host: "{{ subnet | ansible.utils.ipmath(100) }}"
  tasks:
    - debug: var=ansible_host

PLAY [rproxy0] ********************************************************************************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************************************************
Tuesday 11 April 2023  13:44:10 +0100 (0:00:00.150)       0:00:00.150 *********
fatal: [rproxy0]: UNREACHABLE! => {"changed": false, "msg": "EOF on stream; last 100 lines received:\nssh: Could not resolve hostname {{ subnet | ansible.utils.ipmath(100) }}: nodename nor servname provided, or not known\r", "unreachable": true}
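
For reference, the value ssh should have received can be computed with Python's ipaddress module. This is only a sketch of what the ipmath filter does for this input (adding an integer offset to an address), not the filter's implementation.

```python
import ipaddress

subnet = "10.10.10.0"

# Adding an integer to an address object yields the address that many steps later.
ansible_host = str(ipaddress.ip_address(subnet) + 100)
print(ansible_host)  # 10.10.10.100
```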

jrosser avatar Apr 11 '23 12:04 jrosser

I experience a similar issue with a Jinja2 expression in ansible_ssh_user

mkobel avatar Jun 20 '23 08:06 mkobel

related to or duplicate of #599

moreati avatar Jul 26 '23 13:07 moreati

With the time to upgrade our ansible codebase, it seems we're still blocked by anything above mitogen-0.3.0rc1. When trying:

ansible==5.7.1
ansible-core==2.12.5

and

mitogen-0.3.2 or
mitogen-0.3.3 or
mitogen-0.3.4

Result:

EOF on stream; last 100 lines received:
kex_exchange_identification: Connection closed by remote host

I tried applying the patch suggested above to ansible_mitogen/transport_config.py, but no go.

If anyone subscribed here has found a way forward, please do share.

guytet avatar Sep 01 '23 22:09 guytet