ssh_connection retries seems not used
-
Which version of Ansible are you running? ansible==2.9.1 mitogen==0.2.9
-
Is your version of Ansible patched in any way? no
-
Are you running with any custom modules, or
module_utilsloaded? no -
Have you tried the latest master version from Git? no
-
Do you have some idea of what the underlying problem may be? https://mitogen.networkgenomics.com/ansible_detailed.html#common-problems has instructions to help figure out the likely cause and how to gather relevant logs.
-
Mention your host and target OS and versions ubuntu 16.04
-
Mention your host and target Python versions python 3.5
It seems mitogen does not retry ssh connection if a host is unreachable. Trying to reproduce issue with an unreachable host and ssh retries setup to 5:
ansible.cfg
[ssh_connection]
retries = 5
Play
- name: test
hosts: host_test
# strategy: mitogen_linear
strategy: linear
gather_facts: False
tasks:
- file:
path="/tmp/test"
state=touch
Logs
with strategy: linear
PLAY [test] *******************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
META: ran handlers
TASK [file] *******************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /tmp/test.yml:40
Thursday 28 November 2019 18:20:31 +0100 (0:00:00.041) 0:00:00.041 *****
Using module file /usr/lib/python3.5/site-packages/ansible/modules/files/file.py
Pipelining is enabled.
<host_test> ESTABLISH SSH CONNECTION FOR USER: root
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<host_test> ssh_retry: attempt: 1, ssh return code is 255. cmd ([b'ssh', b'-C', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="root"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/root/.ansible/cp/4b4c99a891', b'host_test', b"/bin/sh -c '/usr/bin/python3 && sleep 0'"]...), pausing for 0 seconds
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<host_test> ssh_retry: attempt: 2, ssh return code is 255. cmd ([b'ssh', b'-C', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="root"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/root/.ansible/cp/4b4c99a891', b'host_test', b"/bin/sh -c '/usr/bin/python3 && sleep 0'"]...), pausing for 1 seconds
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<host_test> ssh_retry: attempt: 3, ssh return code is 255. cmd ([b'ssh', b'-C', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="root"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/root/.ansible/cp/4b4c99a891', b'host_test', b"/bin/sh -c '/usr/bin/python3 && sleep 0'"]...), pausing for 3 seconds
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<host_test> ssh_retry: attempt: 4, ssh return code is 255. cmd ([b'ssh', b'-C', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="root"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/root/.ansible/cp/4b4c99a891', b'host_test', b"/bin/sh -c '/usr/bin/python3 && sleep 0'"]...), pausing for 7 seconds
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<host_test> ssh_retry: attempt: 5, ssh return code is 255. cmd ([b'ssh', b'-C', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="root"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/root/.ansible/cp/4b4c99a891', b'host_test', b"/bin/sh -c '/usr/bin/python3 && sleep 0'"]...), pausing for 15 seconds
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
fatal: [host_test]: UNREACHABLE! => changed=false
msg: |-
Data could not be sent to remote host "host_test". Make sure this host can be reached over ssh: ssh: connect to host host_test port 22: Connection refused
unreachable: true
with strategy: mitogen_linear
PLAYBOOK: test.yml **************************************************************************************************************************************************************************************************************************
1 plays in /tmp/test.yml
PLAY [test] *********************************************************************************************************************************************************************************************************************************
[top 4658] 18:18:40.276317 D ansible_mitogen.affinity: CPU mask for Ansible top-level process: 0x000002
[top 4658] 18:18:40.277332 D ansible_mitogen.process: inherited open file limits: soft=1024 hard=1048576
[top 4658] 18:18:40.277877 D ansible_mitogen.process: raised soft open file limit from 1024 to 1048576
[mux 4666] 18:18:40.284712 D ansible_mitogen.affinity: CPU mask for MuxProcess 0: 0x000001
[mux 4666] 18:18:40.301380 D mitogen.service: Pool(62b0, size=32, th='MainThread'): initialized
[mux 4666] 18:18:40.303930 D ansible_mitogen.process: Service pool configured: size=32
META: ran handlers
TASK [file] *********************************************************************************************************************************************************************************************************************************
task path: /tmp/test.yml:40
Thursday 28 November 2019 18:18:40 +0100 (0:00:00.098) 0:00:00.098 *****
[task 4700] 18:18:40.332803 D ansible_mitogen.affinity: CPU mask for WorkerProcess: 0x000004
[task 4700] 18:18:40.359046 D ansible_mitogen.process: will use multiplexer 0 (/tmp/mitogen_unix_peubdg0l.sock) to connect to "host_test"
[task 4700] 18:18:40.359485 D mitogen.unix: client: connecting to /tmp/mitogen_unix_peubdg0l.sock
[task 4700] 18:18:40.360498 D mitogen.unix: client: local ID is 1, remote is 0
[mux 4666] 18:18:40.360818 D mitogen.unix: listener: accepted connection from PID 4700: unix_client.4700
[mux 4666] 18:18:40.366643 D mitogen.parent: creating connection to context 2 using mitogen.ssh
[mux 4666] 18:18:40.519155 D mitogen.parent: command line for Connection(None): ssh -o "LogLevel ERROR" -l root -o "Compression yes" -o "ServerAliveInterval 30" -o "ServerAliveCountMax 10" -o "BatchMode yes" -o "StrictHostKeyChecking no" -o "UserKnownHostsFile /dev/null" -o "GlobalKnownHostsFile /dev/null" -C -o ControlMaster=auto -o ControlPersist=60s host_test /usr/bin/python3 -c "'import codecs,os,sys;_=codecs.decode;exec(_(_(\"eNqFkcFLwzAUxs/rX9FbEha6dJMOCgFlB/EgQhF30CHtkm7BLglp2jr/et86ce08eHs/3vfe9+Ulo2tu6sgqKzEJHO0GpMoQoDTuA5M0mEAtGjvHjMaMkQtndEgOuvGZt5WpJc6G4IawHkIHAIb1Eeyr3IPrIeQ8RCJ3ndIozLXom/JTbhufF5Xs27OmdrNC6Zk9+r3RCHJOrmRT3g+20tXK6Nd0seltpW6VA0Z32f0LQxs+HjtrACs8btAxThE+KG92UqfOGH+bi4PSi2hXmSKvIiHb0/saEZl2H2np05skSQgiAWzvnPISxxQ9Pjw/McbeNIJUWyPg+CRY8Xd8Or8wVmo4OnIFIpGTucDxks2XhKIvZWFTaflFt6aoK9DpR0r7Y7Dq6/OVr9Tdf+q/KeNxyt//m5Nvc4q5XA==\".encode(),\"base64\"),\"zip\"))'"
[mux 4666] 18:18:40.531862 D mitogen.parent: child for Connection(None) started: pid:4703 stdin:77 stdout:77 stderr:79
[mux 4666] 18:18:40.618116 D mitogen.parent: ssh.host_test: (unrecognized): ssh: connect to host host_test port 22: Connection refused
[mux 4666] 18:18:40.619316 D mitogen: <Side of ssh.host_test fd 79>: empty read, disconnecting
[mux 4666] 18:18:40.621030 D mitogen: SetupProtocol(ssh.host_test): disconnecting
[mux 4666] 18:18:40.622383 D mitogen: <Side of ssh.host_test fd 77>: empty read, disconnecting
[mux 4666] 18:18:40.623188 D mitogen.parent: failing connection ssh.host_test due to EofError('EOF on stream; last 100 lines received:\nssh: connect to host host_test port 22: Connection refused\r',)
[mux 4666] 18:18:40.623931 D mitogen.parent: PopenProcess ssh.host_test pid 4703: exited with return code 255
[mux 4666] 18:18:40.624371 D mitogen: BootstrapProtocol(ssh.host_test): disconnecting
[mux 4666] 18:18:40.625550 D mitogen: BootstrapProtocol(ssh.host_test): disconnecting
[task 4700] 18:18:40.627460 D mitogen: MitogenProtocol(unix_listener.4666): disconnecting
[mux 4666] 18:18:40.628798 D mitogen: <Side of unix_client.4700 fd 76>: empty read, disconnecting
[task 4700] 18:18:40.628844 D mitogen: Waker(fd=11/12): disconnecting
[mux 4666] 18:18:40.629390 D mitogen: MitogenProtocol(unix_client.4700): disconnecting
[task 4700] 18:18:40.630188 D mitogen: Router(Broker(9898)): stats: 0 module requests in 0 ms, 0 sent (0 ms minify time), 0 negative responses. Sent 0.0 kb total, 0.0 kb avg.
fatal: [host_test]: UNREACHABLE! => changed=false
msg: |-
EOF on stream; last 100 lines received:
ssh: connect to host host_test port 22: Connection refused
unreachable: true
I just ran into this, too. I'd really like to see support for SSH retries. I often run playbooks with tons of hosts, and often enough the first SSH connection to some of them fails (for whatever reason). Using Ansible's default SSH connection plugin with retries solves this as the second connection attempt works for those few hosts, but with the mitogen plugin no retries are made.
I'd be very grateful if support for ANSIBLE_SSH_RETRIES / [ssh_connection] retries=… was implemented.
Have there been any work arounds or updates on this issue? In our use case we have connection issues based on ping and would like to retry some of those hosts as well. Prior to using mitogen, we weren't experiencing this issue.
I'm seeing the same thing with none of this configuration not being used:
[ssh_connection]
ansible_ssh_timeout = 60s
ssh_args=-o PasswordAuthentication=no -o ConnectTimeout=60s -o ControlMaster=auto -o ControlPersist=60s -o ControlPath=/tmp/ansible-ssh-%h-%p-%r
Nevermind, that configuration does get used, but I get this error:
[mux 787] 03:25:53.163826 D mitogen.parent: failing connection ssh.importer.staging.rsrchx.local
due to TimeoutError('Failed to setup connection after 10.00 seconds')
How do you set the Connection.options.connect_timeout value that's used in Connection._on_timer_expired() as shown in the above error message?
Up
@zostrich
- What version fo Mitogen are you using?
- What version of Ansible?
- What version(s) of Python on the controller & target(s)?
1. What version fo Mitogen are you using? 2. What version of Ansible? 3. What version(s) of Python on the controller & target(s)?
Mitogen: 0.3.31 Ansible: 2.15.9 Python: 3.9.20
When initially attempting to establish an SSH connection, for example on a slow network, the following error may occur:
msg: |-
EOF on stream; last 100 lines received:
ssh: ssh_exchange_identification: read: Connection reset by peer
ssh: ssh_exchange_identification: Connection closed by remote host
unreachable: true
Retrying to establish an SSH connection seems like a much-needed feature.