ansible
delegate_to with unreachable hosts breaks playbook
ISSUE TYPE
- Bug Report
COMPONENT NAME
delegate_to
ANSIBLE VERSION
ansible 2.4.2.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.12 (default, Nov 20 2017, 18:23:56) [GCC 5.4.0 20160609]
OS / ENVIRONMENT
Ubuntu 16.04.3 LTS
SUMMARY
Using delegate_to (a perfect example can be found in the documentation http://docs.ansible.com/ansible/latest/playbooks_delegation.html#delegated-facts): when a host passed to delegate_to is unreachable for whatever reason, the host executing the task is also marked as unreachable. A large error is printed (the output from all hosts passed to delegate_to?) and the playbook ends with something like:
PLAY RECAP *************************************************************************************************************************************************
foo.example.com : ok=0 changed=0 unreachable=1 failed=0
STEPS TO REPRODUCE
playbook.yml:
---
- hosts: app_servers
  tasks:
    - name: gather facts from db servers
      setup:
      delegate_to: "{{ item }}"
      delegate_facts: True
      with_items: "{{groups['dbservers']}}"

    - name: test
      debug:
        msg: "test"
inventory.yml:
---
all:
  children:
    app_servers:
      hosts:
        foo.example.com:
    dbservers:
      hosts:
        bar.example.com:
EXPECTED RESULTS
PLAY [app_servers] ***********************************************************************************************************************************
TASK [gather facts from db servers] ******************************************************************************************************************
failed: [foo.example.com] (item=bar.example.com) => {"item": "bar.example.com", "msg": "Failed to connect to the host via ssh: ssh: connect to host bar.example.com port 22: Connection timed out\r\n", "unreachable": true}
TASK [test] ************************************************************************************************************************************************
ok: [foo.example.com] => {
    "msg": "test"
}
ACTUAL RESULTS
PLAY [app_servers] ***********************************************************************************************************************************
TASK [gather facts from db servers] ******************************************************************************************************************
failed: [foo.example.com] (item=bar.example.com) => {"item": "bar.example.com", "msg": "Failed to connect to the host via ssh: ssh: connect to host bar.example.com port 22: Connection timed out\r\n", "unreachable": true}
fatal: [foo.example.com]: UNREACHABLE! => {"changed": false, "msg": "All items completed", "...
Updated the issue, as delegate_to alone is the cause; omitting delegate_facts has no effect on the bug.
Digging into this further.
When a task is executed using delegate_to, its results are stored against the executing host without any context about which host was actually unreachable. This happens in TaskExecutor.run(self) (lib/ansible/executor/task_executor.py).
When those results are passed back to WorkerProcess.run(self) (lib/ansible/executor/process/worker.py), the TaskResult object stores the results from the task and provides an is_unreachable(self) method to check whether the host is no longer reachable. Since there is no context about which host was unreachable apart from the loop's item property, it returns True, i.e. it reports the executing host as unreachable.
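A rough Python model of the behaviour described above may make the problem easier to see. `SimpleTaskResult` is a hypothetical, simplified stand-in, not Ansible's actual `TaskResult`; it only assumes, as the analysis states, that a looped result containing an unreachable item is treated as unreachable for the host the result is stored against. The result dictionary mirrors the ACTUAL RESULTS output in this report.

```python
from typing import Any, Dict


class SimpleTaskResult:
    """Hypothetical, simplified stand-in for Ansible's TaskResult."""

    def __init__(self, host: str, result: Dict[str, Any]) -> None:
        self.host = host  # the inventory_hostname the result is stored against
        self._result = result

    def is_unreachable(self) -> bool:
        # Looped task: a single unreachable item flags the whole result.
        # Nothing records *which* host (executing or delegated) failed.
        for item in self._result.get("results", []):
            if isinstance(item, dict) and item.get("unreachable", False):
                return True
        return bool(self._result.get("unreachable", False))


# Result shape modelled on the ACTUAL RESULTS output in the report.
looped = {
    "msg": "All items completed",
    "results": [
        {"item": "bar.example.com", "unreachable": True,
         "msg": "Failed to connect to the host via ssh"},
    ],
}

res = SimpleTaskResult("foo.example.com", looped)
print(res.host, "unreachable:", res.is_unreachable())
# foo.example.com unreachable: True
```

Even though only bar.example.com could not be reached, the result is stored against foo.example.com, so foo.example.com is the host that ends up flagged.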
I propose a couple of ideas for fixing this bug:
- Include a flag or metadata in the result to signify which host was unreachable.
- Flag that the result came from a delegate_to-based task, so that it is ignored/skipped: a task with delegate_to should not check whether the executing host is unreachable.
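Both proposals could look roughly like the sketch below. The `unreachable_host` and `delegated_to` keys are hypothetical names chosen for illustration, not existing Ansible result keys, and `executing_host_unreachable` is a hypothetical helper, not part of Ansible. When neither hint is present it falls back to the current behaviour.

```python
from typing import Any, Dict, List


def executing_host_unreachable(host: str, result: Dict[str, Any]) -> bool:
    """Hypothetical check: is the *executing* host the one that was unreachable?"""
    items: List[Dict[str, Any]] = result.get("results", [result])
    for item in items:
        if not item.get("unreachable", False):
            continue
        # Proposal 1: explicit metadata naming the unreachable host.
        target = item.get("unreachable_host")
        # Proposal 2: a delegated item blames the delegate, not the executing host.
        if target is None and item.get("delegated_to"):
            target = item["delegated_to"]
        # Without either hint, fall back to today's behaviour.
        if target is None or target == host:
            return True
    return False


delegated = {"results": [{"item": "bar.example.com", "unreachable": True,
                          "delegated_to": "bar.example.com"}]}
print(executing_host_unreachable("foo.example.com", delegated))  # False
print(executing_host_unreachable("foo.example.com", {"unreachable": True}))  # True
```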
I'll be happy to provide a PR to resolve this, provided that feedback is given on the proposals above.
Hello?
Can I please get some sort of confirmation from anyone who works on core development so that I may start to plan out a PR to resolve this bug?
I'm not sure it is needed; both block/rescue and meta can be used to clear such an error.
Bring this up in one of the IRC core meetings (https://github.com/ansible/community/issues?q=is%3Aopen+is%3Aissue+label%3Acore) to reach quorum and get a decision on this.
Also this seems to merit a proposal more than an issue: https://github.com/ansible/proposals
In Ansible 2.6.0 it does not enter rescue when the delegated host is unreachable. A workaround is needed.
So, after discussing this with the core team, we think the best approach is to mark the delegated host as 'unreachable', but still fail the task and mark the inventory_hostname as 'failed'. This should correct the status while letting the play flow continue as expected (and enable rescue/ignore_errors to work).
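A minimal sketch of that bookkeeping, assuming a hypothetical `HostStats` counter and a hypothetical `record_unreachable` helper (neither is Ansible's real API): the delegated host is counted as unreachable, while the inventory host receives an ordinary failure, which rescue and ignore_errors already know how to handle.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HostStats:
    """Hypothetical per-host counters, similar in spirit to a PLAY RECAP."""
    unreachable: Dict[str, int] = field(default_factory=dict)
    failed: Dict[str, int] = field(default_factory=dict)

    def bump(self, counter: str, host: str) -> None:
        bucket = getattr(self, counter)
        bucket[host] = bucket.get(host, 0) + 1


def record_unreachable(stats: HostStats, inventory_hostname: str,
                       delegated_host: Optional[str]) -> str:
    """Return the status the executing host should receive."""
    if delegated_host and delegated_host != inventory_hostname:
        # The delegate could not be reached: blame it, and give the
        # executing host a normal failure so rescue/ignore_errors apply.
        stats.bump("unreachable", delegated_host)
        stats.bump("failed", inventory_hostname)
        return "failed"
    # No delegation: the executing host itself is unreachable.
    stats.bump("unreachable", inventory_hostname)
    return "unreachable"


stats = HostStats()
status = record_unreachable(stats, "foo.example.com", "bar.example.com")
print(status, stats.failed, stats.unreachable)
# failed {'foo.example.com': 1} {'bar.example.com': 1}
```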
Is there any way to achieve this in the meantime?
Edit: I figured out that setting ignore_unreachable: yes on the block works, i.e., the delegated task that fails due to a connection timeout is shown as failed, but the play continues and the rescue gets triggered.
In Ansible 2.13.5 the problem is still present.
ignore_unreachable: yes works, but prints "failed" and "fatal" messages; failed_when: false does not help.
Any workaround? It's surprising that this issue is still not resolved in 2023!
I have changed my playbook to wait until the host is reachable before executing tasks on it:
- name: Wait for the instance to become reachable and SSH becomes accessible
  delegate_to: "{{ item.private_ip_address }}"
  loop: "{{ ec2.instances }}"
  wait_for_connection:
    delay: 5
    timeout: 60
If you still need to handle the unreachable error at every step, you can fail the task manually:
- name: Check if Python is installed
  raw: python --version
  register: python_check
  delegate_to: "{{ ec2.instances[0].private_ip_address }}"
  ignore_unreachable: yes

- name: Fail when unreachable
  fail:
    msg: "Delegated host is unreachable"
  # 'unreachable' is only set in the result when the connection failed
  when: python_check.unreachable | default(false)
This makes it very difficult to work with clusters where some nodes may be down.
Edit: in 2.16.5, even rescue: meta: clear_host_errors does nothing; the delegated task still hard-fails, regardless of whether the task and/or block has ignore_unreachable.
Edit 2: actually, it is just the debugger that is incorrectly invoked when debugging on failure is enabled. Without it, this seems to work.
@bluikko it does something; the problem is that it only affects hosts in the current play, which does not include delegated-to hosts.
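A simplified, hypothetical model of that scoping (not Ansible's actual strategy code) shows why clearing errors leaves the delegated host untouched: only names in the play's host list are cleared. The function name `clear_host_errors` here is illustrative and only mirrors the meta action's name.

```python
from typing import Iterable, Set


def clear_host_errors(play_hosts: Iterable[str],
                      failed_hosts: Set[str],
                      unreachable_hosts: Set[str]) -> None:
    """Hypothetical model of meta: clear_host_errors scoping.

    Only hosts that belong to the current play are cleared; a delegated
    host that is not in the play keeps its error state.
    """
    for host in play_hosts:
        failed_hosts.discard(host)
        unreachable_hosts.discard(host)


play_hosts = ["foo.example.com"]          # hosts: app_servers
failed = {"foo.example.com"}
unreachable = {"bar.example.com"}         # the delegate_to target

clear_host_errors(play_hosts, failed, unreachable)
print(failed, unreachable)
# set() {'bar.example.com'}
```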