mitogen icon indicating copy to clipboard operation
mitogen copied to clipboard

ssh_connection retries seems not used

Open jcosmao opened this issue 6 years ago • 6 comments

  • Which version of Ansible are you running? ansible==2.9.1 mitogen==0.2.9

  • Is your version of Ansible patched in any way? no

  • Are you running with any custom modules, or module_utils loaded? no

  • Have you tried the latest master version from Git? no

  • Do you have some idea of what the underlying problem may be? https://mitogen.networkgenomics.com/ansible_detailed.html#common-problems has instructions to help figure out the likely cause and how to gather relevant logs.

  • Mention your host and target OS and versions ubuntu 16.04

  • Mention your host and target Python versions python 3.5

It seems mitogen does not retry ssh connection if a host is unreachable. Trying to reproduce issue with an unreachable host and ssh retries setup to 5:

ansible.cfg

[ssh_connection]
retries = 5

Play

- name: test                        
   hosts: host_test
   # strategy: mitogen_linear        
   strategy: linear                  
   gather_facts: False               
   tasks:                            
     - file:                         
         path="/tmp/test"            
         state=touch  

Logs

with strategy: linear

PLAY [test] *******************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
META: ran handlers    
    
TASK [file] *******************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /tmp/test.yml:40    
Thursday 28 November 2019  18:20:31 +0100 (0:00:00.041)       0:00:00.041 *****     
Using module file /usr/lib/python3.5/site-packages/ansible/modules/files/file.py    
Pipelining is enabled.    
<host_test> ESTABLISH SSH CONNECTION FOR USER: root    
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<host_test> ssh_retry: attempt: 1, ssh return code is 255. cmd ([b'ssh', b'-C', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="root"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/root/.ansible/cp/4b4c99a891', b'host_test', b"/bin/sh -c '/usr/bin/python3 && sleep 0'"]...), pausing for 0 seconds
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<host_test> ssh_retry: attempt: 2, ssh return code is 255. cmd ([b'ssh', b'-C', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="root"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/root/.ansible/cp/4b4c99a891', b'host_test', b"/bin/sh -c '/usr/bin/python3 && sleep 0'"]...), pausing for 1 seconds
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<host_test> ssh_retry: attempt: 3, ssh return code is 255. cmd ([b'ssh', b'-C', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="root"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/root/.ansible/cp/4b4c99a891', b'host_test', b"/bin/sh -c '/usr/bin/python3 && sleep 0'"]...), pausing for 3 seconds
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<host_test> ssh_retry: attempt: 4, ssh return code is 255. cmd ([b'ssh', b'-C', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="root"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/root/.ansible/cp/4b4c99a891', b'host_test', b"/bin/sh -c '/usr/bin/python3 && sleep 0'"]...), pausing for 7 seconds
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<host_test> ssh_retry: attempt: 5, ssh return code is 255. cmd ([b'ssh', b'-C', b'-o', b'ControlMaster=auto', b'-o', b'ControlPersist=60s', b'-o', b'StrictHostKeyChecking=no', b'-o', b'KbdInteractiveAuthentication=no', b'-o', b'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', b'-o', b'PasswordAuthentication=no', b'-o', b'User="root"', b'-o', b'ConnectTimeout=10', b'-o', b'ControlPath=/root/.ansible/cp/4b4c99a891', b'host_test', b"/bin/sh -c '/usr/bin/python3 && sleep 0'"]...), pausing for 15 seconds
<host_test> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/4b4c99a891 host_test '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
fatal: [host_test]: UNREACHABLE! => changed=false     
  msg: |-    
    Data could not be sent to remote host "host_test". Make sure this host can be reached over ssh: ssh: connect to host host_test port 22: Connection refused    
  unreachable: true    

with strategy: mitogen_linear

PLAYBOOK: test.yml **************************************************************************************************************************************************************************************************************************    
1 plays in /tmp/test.yml    
    
PLAY [test] *********************************************************************************************************************************************************************************************************************************    
[top  4658] 18:18:40.276317 D ansible_mitogen.affinity: CPU mask for Ansible top-level process: 0x000002    
[top  4658] 18:18:40.277332 D ansible_mitogen.process: inherited open file limits: soft=1024 hard=1048576    
[top  4658] 18:18:40.277877 D ansible_mitogen.process: raised soft open file limit from 1024 to 1048576    
[mux  4666] 18:18:40.284712 D ansible_mitogen.affinity: CPU mask for MuxProcess 0: 0x000001    
[mux  4666] 18:18:40.301380 D mitogen.service: Pool(62b0, size=32, th='MainThread'): initialized    
[mux  4666] 18:18:40.303930 D ansible_mitogen.process: Service pool configured: size=32    
META: ran handlers    
    
TASK [file] *********************************************************************************************************************************************************************************************************************************    
task path: /tmp/test.yml:40    
Thursday 28 November 2019  18:18:40 +0100 (0:00:00.098)       0:00:00.098 *****    
[task 4700] 18:18:40.332803 D ansible_mitogen.affinity: CPU mask for WorkerProcess: 0x000004    
[task 4700] 18:18:40.359046 D ansible_mitogen.process: will use multiplexer 0 (/tmp/mitogen_unix_peubdg0l.sock) to connect to "host_test"    
[task 4700] 18:18:40.359485 D mitogen.unix: client: connecting to /tmp/mitogen_unix_peubdg0l.sock    
[task 4700] 18:18:40.360498 D mitogen.unix: client: local ID is 1, remote is 0    
[mux  4666] 18:18:40.360818 D mitogen.unix: listener: accepted connection from PID 4700: unix_client.4700    
[mux  4666] 18:18:40.366643 D mitogen.parent: creating connection to context 2 using mitogen.ssh    
[mux  4666] 18:18:40.519155 D mitogen.parent: command line for Connection(None): ssh -o "LogLevel ERROR" -l root -o "Compression yes" -o "ServerAliveInterval 30" -o "ServerAliveCountMax 10" -o "BatchMode yes" -o "StrictHostKeyChecking no" -o "UserKnownHostsFile /dev/null" -o "GlobalKnownHostsFile /dev/null" -C -o ControlMaster=auto -o ControlPersist=60s host_test /usr/bin/python3 -c "'import codecs,os,sys;_=codecs.decode;exec(_(_(\"eNqFkcFLwzAUxs/rX9FbEha6dJMOCgFlB/EgQhF30CHtkm7BLglp2jr/et86ce08eHs/3vfe9+Ulo2tu6sgqKzEJHO0GpMoQoDTuA5M0mEAtGjvHjMaMkQtndEgOuvGZt5WpJc6G4IawHkIHAIb1Eeyr3IPrIeQ8RCJ3ndIozLXom/JTbhufF5Xs27OmdrNC6Zk9+r3RCHJOrmRT3g+20tXK6Nd0seltpW6VA0Z32f0LQxs+HjtrACs8btAxThE+KG92UqfOGH+bi4PSi2hXmSKvIiHb0/saEZl2H2np05skSQgiAWzvnPISxxQ9Pjw/McbeNIJUWyPg+CRY8Xd8Or8wVmo4OnIFIpGTucDxks2XhKIvZWFTaflFt6aoK9DpR0r7Y7Dq6/OVr9Tdf+q/KeNxyt//m5Nvc4q5XA==\".encode(),\"base64\"),\"zip\"))'"
[mux  4666] 18:18:40.531862 D mitogen.parent: child for Connection(None) started: pid:4703 stdin:77 stdout:77 stderr:79    
[mux  4666] 18:18:40.618116 D mitogen.parent: ssh.host_test: (unrecognized): ssh: connect to host host_test port 22: Connection refused    
[mux  4666] 18:18:40.619316 D mitogen: <Side of ssh.host_test fd 79>: empty read, disconnecting    
[mux  4666] 18:18:40.621030 D mitogen: SetupProtocol(ssh.host_test): disconnecting    
[mux  4666] 18:18:40.622383 D mitogen: <Side of ssh.host_test fd 77>: empty read, disconnecting    
[mux  4666] 18:18:40.623188 D mitogen.parent: failing connection ssh.host_test due to EofError('EOF on stream; last 100 lines received:\nssh: connect to host host_test port 22: Connection refused\r',)    
[mux  4666] 18:18:40.623931 D mitogen.parent: PopenProcess ssh.host_test pid 4703: exited with return code 255    
[mux  4666] 18:18:40.624371 D mitogen: BootstrapProtocol(ssh.host_test): disconnecting    
[mux  4666] 18:18:40.625550 D mitogen: BootstrapProtocol(ssh.host_test): disconnecting    
[task 4700] 18:18:40.627460 D mitogen: MitogenProtocol(unix_listener.4666): disconnecting    
[mux  4666] 18:18:40.628798 D mitogen: <Side of unix_client.4700 fd 76>: empty read, disconnecting
[task 4700] 18:18:40.628844 D mitogen: Waker(fd=11/12): disconnecting
[mux  4666] 18:18:40.629390 D mitogen: MitogenProtocol(unix_client.4700): disconnecting
[task 4700] 18:18:40.630188 D mitogen: Router(Broker(9898)): stats: 0 module requests in 0 ms, 0 sent (0 ms minify time), 0 negative responses. Sent 0.0 kb total, 0.0 kb avg.
fatal: [host_test]: UNREACHABLE! => changed=false 
  msg: |-
    EOF on stream; last 100 lines received:
    ssh: connect to host host_test port 22: Connection refused
  unreachable: true

jcosmao avatar Nov 28 '19 19:11 jcosmao

I just ran into this, too. I'd really like to see support for SSH retries. I often run playbooks with tons of hosts, and often enough the first SSH connection to some of them fails (for whatever reason). Using Ansible's default SSH connection plugin with retries solves this as the second connection attempt works for those few hosts, but with the mitogen plugin no retries are made.

I'd be very grateful if support for ANSIBLE_SSH_RETRIES / [ssh_connection] retries=… was implemented.

mbunkus avatar Dec 19 '19 12:12 mbunkus

Have there been any work arounds or updates on this issue? In our use case we have connection issues based on ping and would like to retry some of those hosts as well. Prior to using mitogen, we weren't experiencing this issue.

lancefriday945 avatar Oct 26 '20 15:10 lancefriday945

I'm seeing the same thing with none of this configuration not being used:

[ssh_connection]
ansible_ssh_timeout = 60s
ssh_args=-o PasswordAuthentication=no -o ConnectTimeout=60s -o ControlMaster=auto -o ControlPersist=60s -o ControlPath=/tmp/ansible-ssh-%h-%p-%r

Nevermind, that configuration does get used, but I get this error:

[mux  787] 03:25:53.163826 D mitogen.parent: failing connection ssh.importer.staging.rsrchx.local
due to TimeoutError('Failed to setup connection after 10.00 seconds')

How do you set the Connection.options.connect_timeout value that's used in Connection._on_timer_expired() as shown in the above error message?

nhooey avatar Nov 05 '20 03:11 nhooey

Up

zostrich avatar Nov 11 '25 10:11 zostrich

@zostrich

  1. What version fo Mitogen are you using?
  2. What version of Ansible?
  3. What version(s) of Python on the controller & target(s)?

moreati avatar Nov 17 '25 08:11 moreati

@zostrich

1. What version fo Mitogen are you using?

2. What version of Ansible?

3. What version(s) of Python on the controller & target(s)?

Mitogen: 0.3.31 Ansible: 2.15.9 Python: 3.9.20

When initially attempting to establish an SSH connection, for example on a slow network, the following error may occur:

msg: |-
    EOF on stream; last 100 lines received:
    ssh: ssh_exchange_identification: read: Connection reset by peer
    ssh: ssh_exchange_identification: Connection closed by remote host
  unreachable: true

Retrying to establish an SSH connection seems like a much-needed feature.

zostrich avatar Dec 01 '25 15:12 zostrich