community.docker
Commands on Docker host fail with "returned empty string"
SUMMARY
I'm unable to run commands on a docker host. This used to work.
ISSUE TYPE
- Bug Report
COMPONENT NAME
community.docker
ANSIBLE VERSION
ansible [core 2.15.0]
config file = /workspaces/project/ansible.cfg
configured module search path = ['/home/vscode/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/vscode/.local/lib/python3.10/site-packages/ansible
ansible collection location = /home/vscode/.ansible/collections:/usr/share/ansible/collections
executable location = /home/vscode/.local/bin/ansible
python version = 3.10.10 (main, Mar 23 2023, 17:28:09) [GCC 10.2.1 20210110] (/usr/local/bin/python)
jinja version = 3.1.2
libyaml = True
COLLECTION VERSION
$ ansible-galaxy collection list community.docker
# /home/vscode/.ansible/collections/ansible_collections
Collection Version
---------------- -------
community.docker 3.4.6
# /home/vscode/.local/lib/python3.10/site-packages/ansible_collections
Collection Version
---------------- -------
community.docker 3.4.6
# /usr/local/lib/python3.10/site-packages/ansible_collections
Collection Version
---------------- -------
community.docker 3.4.6
CONFIGURATION
CONFIG_FILE() = /workspaces/project/ansible.cfg
OS / ENVIRONMENT
Running Ansible in vscode dev container (mcr.microsoft.com/vscode/devcontainers/python:0-3.10). Docker version 24.0.1, build 6802122.
STEPS TO REPRODUCE
I have a role that includes the following tasks:
---
- name: Create temporary container
  community.docker.docker_container:
    name: tmp_data
    image: python:3.10-bullseye
    state: started
    auto_remove: yes
    command: /bin/bash
    interactive: yes
    tty: yes
    volumes_from:
      - "{{ container_name }}"

- name: Add container to inventory
  add_host:
    name: "{{ inventory_hostname }}+tmp_data"
    ansible_connection: community.docker.docker
    ansible_host: tmp_data
    ansible_user: root
    ansible_become: yes
    ansible_remote_tmp: /tmp/.ansible
    ansible_docker_extra_args: "-H=tcp://{{ ansible_host }}:2375"
    ansible_python_interpreter: python

- name: ping
  delegate_to: "{{ inventory_hostname }}+tmp_data"
  ansible.builtin.ping:
EXPECTED RESULTS
I expect this to work.
ACTUAL RESULTS
Running with -vvv, everything works up to the last task, which fails:
<tmp_data> ESTABLISH DOCKER CONNECTION FOR USER: root
<tmp_data> EXEC ['/usr/bin/docker', '-H=tcp://192.168.1.168:2375', b'exec', b'-u', 'root', b'-i', 'tmp_data', '/bin/sh', '-c', '/bin/sh -c \'( umask 77 && mkdir -p "` echo /tmp/.ansible `"&& mkdir "` echo /tmp/.ansible/ansible-tmp-1686177373.2244887-17139-101563472711239 `" && echo ansible-tmp-1686177373.2244887-17139-101563472711239="` echo /tmp/.ansible/ansible-tmp-1686177373.2244887-17139-101563472711239 `" ) && sleep 0\'']
fatal: [windshear-1 -> windshear-1+tmp_data]: FAILED! => {
"msg": "failed to resolve remote temporary directory from ansible-tmp-1686177373.2244887-17139-101563472711239: `( umask 77 && mkdir -p \"` echo /tmp/.ansible `\"&& mkdir \"` echo /tmp/.ansible/ansible-tmp-1686177373.2244887-17139-101563472711239 `\" && echo ansible-tmp-1686177373.2244887-17139-101563472711239=\"` echo /tmp/.ansible/ansible-tmp-1686177373.2244887-17139-101563472711239 `\" )` returned empty string"
}
Seems odd that the command has '/bin/sh', '-c' twice. Running the same docker command at the command line also doesn't work (it just hangs with no output), but running the command again without the duplicate /bin/sh -c works as I'd expect.
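For what it's worth, the doubled /bin/sh -c appears to be well-formed nesting rather than corruption: the outer shell receives the inner sh -c invocation as a single quoted string and hands it through intact. A minimal sketch of that shape (the inner command here is illustrative, not the actual Ansible mkdir command):

```python
import shlex
import subprocess

# Illustrative payload, standing in for Ansible's real mkdir command.
inner = "echo /tmp/.ansible && sleep 0"
wrapped = f"/bin/sh -c {shlex.quote(inner)}"
# This argv has the same shape as the one in the -vvv output:
# ['/bin/sh', '-c', "/bin/sh -c '...'"]
out = subprocess.run(["/bin/sh", "-c", wrapped],
                     capture_output=True, text=True)
print(out.stdout.strip())  # /tmp/.ansible
```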
This used to work.
Do you have a bit more precise information with which versions of community.docker and ansible-core this used to work?
I'm trying to reproduce this locally:
---
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Create temporary container
      community.docker.docker_container:
        name: test
        image: python:3.10-bullseye
        state: started
        auto_remove: true
        command: /bin/bash
        interactive: true
        tty: true
    - name: Add container to inventory
      add_host:
        name: "test"
        ansible_connection: community.docker.docker
        ansible_host: test
        ansible_user: root
        ansible_become: true
        ansible_remote_tmp: /tmp/.ansible
        ansible_python_interpreter: python
    - name: ping
      delegate_to: "test"
      ansible.builtin.ping:
But it works fine for me (I'm running it with ANSIBLE_PIPELINING=false, otherwise it runs other, simpler commands):
<test> ESTABLISH DOCKER CONNECTION FOR USER: root
<test> EXEC ['/usr/bin/docker', b'exec', b'-u', 'root', b'-i', 'test', '/bin/sh', '-c', '/bin/sh -c \'( umask 77 && mkdir -p "` echo /tmp/.ansible `"&& mkdir "` echo /tmp/.ansible/ansible-tmp-1686254840.6949074-18066-241769342060559 `" && echo ansible-tmp-1686254840.6949074-18066-241769342060559="` echo /tmp/.ansible/ansible-tmp-1686254840.6949074-18066-241769342060559 `" ) && sleep 0\'']
Using module file /path/to/ansible/modules/ping.py
<test> PUT /path/to/.ansible/tmp/ansible-local-18034qycxoiwk/tmpvzp9a6hh TO /tmp/.ansible/ansible-tmp-1686254840.6949074-18066-241769342060559/AnsiballZ_ping.py
<test> EXEC ['/usr/bin/docker', b'exec', b'-u', 'root', b'-i', 'test', '/bin/sh', '-c', "/bin/sh -c 'chmod u+x /tmp/.ansible/ansible-tmp-1686254840.6949074-18066-241769342060559/ /tmp/.ansible/ansible-tmp-1686254840.6949074-18066-241769342060559/AnsiballZ_ping.py && sleep 0'"]
<test> EXEC ['/usr/bin/docker', b'exec', b'-u', 'root', b'-i', 'test', '/bin/sh', '-c', "/bin/sh -c 'python /tmp/.ansible/ansible-tmp-1686254840.6949074-18066-241769342060559/AnsiballZ_ping.py && sleep 0'"]
<test> EXEC ['/usr/bin/docker', b'exec', b'-u', 'root', b'-i', 'test', '/bin/sh', '-c', "/bin/sh -c 'rm -f -r /tmp/.ansible/ansible-tmp-1686254840.6949074-18066-241769342060559/ > /dev/null 2>&1 && sleep 0'"]
ok: [localhost -> test] => {
"changed": false,
"invocation": {
"module_args": {
"data": "pong"
}
},
"ping": "pong"
}
This could also be related to using -H=tcp://192.168.1.168:2375, i.e. the Docker version. In any case, I cannot reproduce the problem you are having.
This used to work.
Do you have a bit more precise information with which versions of community.docker and ansible-core this used to work?
Sorry. No. I wish I did.
I'll do some more tests.
For what it's worth, the Docker versions are 24.0.1 locally and 24.0.2 on the host.
Confusingly, running that command directly at the command prompt works fine. I'm pretty sure it didn't yesterday. But the playbook still fails for me.
/usr/bin/docker -H=tcp://192.168.1.168:2375 exec -u root -i tmp_data /bin/sh -c '/bin/sh -c "( umask 77 && mkdir -p \"` echo /tmp/.ansible `\" && mkdir \"` echo /tmp/.ansible/ansible-tmp-1686177373.2244887-17139-101563472711239 `\" && echo ansible-tmp-1686177373.2244887-17139-101563472711239=\"` echo /tmp/.ansible/ansible-tmp-1686177373.2244887-17139-101563472711239 `\" ) && sleep 0"'
The playbook you posted does work on the machine hosting the docker container, but doesn't work from another machine. Even when adding ansible_docker_extra_args: "-H=tcp://192.168.1.168:2375" to the add_host task on either machine.
However, as I said, running that command directly from the other machine does work. Running community.docker.docker_container with the docker_host argument also works (see below). So I don't think it's just a Docker problem. What's different between the shell environment and the Ansible environment?
Also, I'm open to a better way of connecting to the remote Docker containers than -H=tcp://192.168.1.168:2375.
---
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Create temporary container
      community.docker.docker_container:
        docker_host: tcp://192.168.1.168:2375
        name: test
        image: python:3.10-bullseye
        state: started
        auto_remove: true
        command: /bin/bash
        interactive: true
        tty: true
    - name: Add container to inventory
      add_host:
        name: "test"
        ansible_connection: community.docker.docker
        ansible_host: test
        ansible_user: root
        ansible_become: true
        ansible_remote_tmp: /tmp/.ansible
        ansible_python_interpreter: python
        ansible_docker_extra_args: "-H=tcp://192.168.1.168:2375"
    - name: ping
      delegate_to: "test"
      ansible.builtin.ping:
The playbook you posted does work on the machine hosting the docker container, but doesn't work from another machine. Even when adding ansible_docker_extra_args: "-H=tcp://192.168.1.168:2375" to the add_host task on either machine.
What are the differences between these two machines with regard to Python version, ansible-core version, community.docker version, Docker version, ...?
I never figured this out, but I worked around it. Thanks for your help. I changed it to connect to the docker host via SSH instead:
ansible_docker_extra_args: "-H=ssh://{{ ansible_user }}@{{ ansible_host }}"
In fact, I think this is an all-round better solution anyway, since it's both more secure and easier. You're using the existing SSH connection, rather than configuring Docker to open a new port, and it uses SSH for security rather than allowing open access. I don't know why the Docker documentation doesn't recommend this approach more prominently for remote access (e.g. https://docs.docker.com/config/daemon/remote-access/). I don't know if it's worth adding something to the docs here.
I have the same problem when connecting to a remote docker engine. It happens because the exec call checks the command output too soon, before the command has finished executing.
I did an analysis of the packets with Wireshark, and the response from the docker engine says that the command is still running. Possibly it goes into error because the return code is null rather than zero; it would be enough to introduce a delay, or to repeat the call while the command is still running.
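The retry idea could be sketched as a small poll loop. Here `wait_for_exec` and the `inspect` callable are hypothetical stand-ins for repeatedly calling `GET /exec/<id>/json` until `Running` is false; the simulated responses replace a real engine so the sketch is self-contained:

```python
import time

def wait_for_exec(inspect, timeout=10.0, interval=0.2):
    """Poll an exec-inspect callable until the command stops running.

    `inspect` is a hypothetical stand-in for `GET /exec/<id>/json`;
    it must return a dict with "Running" and "ExitCode" keys.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = inspect()
        if not state["Running"]:
            return state["ExitCode"]
        time.sleep(interval)
    raise TimeoutError("exec did not finish in time")

# Simulated engine responses: still running twice, then finished.
responses = iter([
    {"Running": True, "ExitCode": None},
    {"Running": True, "ExitCode": None},
    {"Running": False, "ExitCode": 0},
])
exit_code = wait_for_exec(lambda: next(responses))
print(exit_code)  # 0
```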
POST /v1.41/containers/test/exec HTTP/1.1
Host: 10.0.10.17:2375
User-Agent: Docker-Client/20.10.25+azure-2 (linux)
Content-Length: 533
Content-Type: application/json

{"User":"","Privileged":false,"Tty":false,"AttachStdin":true,"AttachStderr":true,"AttachStdout":true,"Detach":false,"DetachKeys":"","Env":null,"WorkingDir":"","Cmd":["/bin/sh","-c","/bin/sh -c '( umask 77 \u0026\u0026 mkdir -p "echo ~/.ansible/tmp"\u0026\u0026 mkdir "echo ~/.ansible/tmp/ansible-tmp-1693931639.118705-877823-209799273796602" \u0026\u0026 echo ansible-tmp-1693931639.118705-877823-209799273796602="echo ~/.ansible/tmp/ansible-tmp-1693931639.118705-877823-209799273796602" ) \u0026\u0026 sleep 0'"]}

HTTP/1.1 201 Created
Api-Version: 1.42
Content-Type: application/json
Docker-Experimental: false
Ostype: linux
Server: Docker/23.0.6 (linux)
Date: Tue, 05 Sep 2023 16:33:59 GMT
Content-Length: 74

{"Id":"37620a287a45af02f0481d8ef439eee6b3d93483e07c208f2db841400545a8b5"}

GET /v1.41/exec/37620a287a45af02f0481d8ef439eee6b3d93483e07c208f2db841400545a8b5/json HTTP/1.1
Host: 10.0.10.17:2375
User-Agent: Docker-Client/20.10.25+azure-2 (linux)

HTTP/1.1 200 OK
Api-Version: 1.42
Content-Type: application/json
Docker-Experimental: false
Ostype: linux
Server: Docker/23.0.6 (linux)
Date: Tue, 05 Sep 2023 16:33:59 GMT
Content-Length: 682

{"ID":"37620a287a45af02f0481d8ef439eee6b3d93483e07c208f2db841400545a8b5","Running":true,"ExitCode":null,"ProcessConfig":{"tty":false,"entrypoint":"/bin/sh","arguments":["-c","/bin/sh -c '( umask 77 && mkdir -p "echo ~/.ansible/tmp"&& mkdir "echo ~/.ansible/tmp/ansible-tmp-1693931639.118705-877823-209799273796602" && echo ansible-tmp-1693931639.118705-877823-209799273796602="echo ~/.ansible/tmp/ansible-tmp-1693931639.118705-877823-209799273796602" ) && sleep 0'"],"privileged":false},"OpenStdin":true,"OpenStderr":true,"OpenStdout":true,"CanRemove":false,"ContainerID":"9f4c334008a42c6d101143ebc9274fc3fa58c8db1603c3f4524f62878f5cfed6","DetachKeys":"","Pid":0}
SUMMARY
When executing a task delegated to a host using the community.docker.docker connection, the task fails with "failed to resolve remote temporary directory from [...] returned empty string".
ISSUE TYPE
- Bug Report
COMPONENT NAME
community.docker
ANSIBLE VERSION
ansible [core 2.15.5]
config file = None
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.12/site-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible
python version = 3.12.0 (main, Oct 11 2023, 23:03:18) [GCC 12.2.0] (/usr/local/bin/python)
jinja version = 3.1.2
libyaml = True
DOCKER VERSION
Docker Host
Docker version 24.0.6, build ed223bc
Ansible Container
Docker version 20.10.24+dfsg1, build 297e128
I have tried to upgrade the binary in the Ansible container to version 24, but the issue still persists.
COLLECTION VERSION
ansible-galaxy collection list community.docker
# /usr/local/lib/python3.12/site-packages/ansible_collections
Collection Version
---------------- -------
community.docker 3.4.9
CONFIGURATION
CONFIG_FILE() = None
OS / ENVIRONMENT
Docker Desktop
Windows 11 Version 22H2 (OS Build 22621.2283)
Docker Desktop Version 4.24.1 (123237)
Engine: 24.0.6
Ansible Container
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Target Container
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.18.4
PRETTY_NAME="Alpine Linux v3.18"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"
STEPS TO REPRODUCE
Create the following 3 files in WSL.
/Dockerfile
FROM python:3.12-slim AS ansible
RUN set -eux; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        docker.io \
    ; \
    pip install --upgrade pip; \
    pip install --no-cache-dir \
        boto \
        boto3 \
        ansible \
    ;
/docker-compose.yaml
version: "3.8"
services:
  ansible:
    stdin_open: true
    tty: true
    build:
      target: ansible
    stop_signal: SIGKILL
    container_name: "ansible"
    volumes:
      - ./task.yaml:/task.yaml
  alpine:
    stdin_open: true
    tty: true
    image: "alpine:latest"
    container_name: "alpine"
    stop_signal: SIGKILL
/task.yaml
---
- hosts: localhost
  connection: local
  vars:
    docker_extra_args: "-H=tcp://host.docker.internal:2375"
    user: root
  tasks:
    - name: Add container to inventory
      ansible.builtin.add_host:
        name: alpine
        ansible_connection: community.docker.docker
        ansible_docker_extra_args: "{{docker_extra_args}}"
    - name: "Delegating command to docker container"
      delegate_to: alpine
      ansible.builtin.command: "echo 'test'"
Start the docker containers, then execute the task in the ansible container
$ docker compose up --detach
$ docker exec -u root -i ansible /bin/sh -c "ansible-playbook task.yaml"
EXPECTED RESULTS
The expected result would be the task completing successfully. Docker Desktop version 4.18.0 is the last version in which this works. Since the bug appeared after a Docker Desktop update, you might think the issue is related to Docker, but that's not the case; please see my analysis below.
ACTUAL RESULTS
ansible-playbook [core 2.15.5]
config file = None
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/local/lib/python3.12/site-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/local/bin/ansible-playbook
python version = 3.12.0 (main, Oct 11 2023, 23:03:18) [GCC 12.2.0] (/usr/local/bin/python)
jinja version = 3.1.2
libyaml = True
No config file found; using defaults
setting up inventory plugins
Loading collection ansible.builtin from
host_list declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
Skipping due to inventory source not existing or not being readable by the current user
script declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
auto declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
Skipping due to inventory source not existing or not being readable by the current user
yaml declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
Skipping due to inventory source not existing or not being readable by the current user
ini declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
Skipping due to inventory source not existing or not being readable by the current user
toml declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
Loading callback plugin default of type stdout, v2.0 from /usr/local/lib/python3.12/site-packages/ansible/plugins/callback/default.py
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
PLAYBOOK: task.yaml ****************************************************************************************************************************************************************************************************
Positional arguments: task.yaml
verbosity: 4
connection: smart
timeout: 10
become_method: sudo
tags: ('all',)
inventory: ('/etc/ansible/hosts',)
forks: 5
1 plays in task.yaml
PLAY [localhost] *******************************************************************************************************************************************************************************************************
TASK [Gathering Facts] *************************************************************************************************************************************************************************************************
task path: /task.yaml:2
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1697208460.66339-603-211097265800276 `" && echo ansible-tmp-1697208460.66339-603-211097265800276="` echo /root/.ansible/tmp/ansible-tmp-1697208460.66339-603-211097265800276 `" ) && sleep 0'
Using module file /usr/local/lib/python3.12/site-packages/ansible/modules/setup.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-599xz7g9c5e/tmp0b6q65ax TO /root/.ansible/tmp/ansible-tmp-1697208460.66339-603-211097265800276/AnsiballZ_setup.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1697208460.66339-603-211097265800276/ /root/.ansible/tmp/ansible-tmp-1697208460.66339-603-211097265800276/AnsiballZ_setup.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/local/bin/python /root/.ansible/tmp/ansible-tmp-1697208460.66339-603-211097265800276/AnsiballZ_setup.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1697208460.66339-603-211097265800276/ > /dev/null 2>&1 && sleep 0'
ok: [localhost]
TASK [Add container to inventory] **************************************************************************************************************************************************************************************
task path: /task.yaml:9
creating host via 'add_host': hostname=alpine
changed: [localhost] => {
"add_host": {
"groups": [],
"host_name": "alpine",
"host_vars": {
"ansible_connection": "community.docker.docker",
"ansible_docker_extra_args": "-H=tcp://host.docker.internal:2375"
}
},
"changed": true
}
TASK [Delegating command to docker container] **************************************************************************************************************************************************************************
task path: /task.yaml:15
Loading collection community.docker from /usr/local/lib/python3.12/site-packages/ansible_collections/community/docker
<alpine> ESTABLISH DOCKER CONNECTION FOR USER: root
<alpine> EXEC ['/usr/bin/docker', '-H=tcp://host.docker.internal:2375', b'exec', b'-u', 'root', b'-i', 'alpine', '/bin/sh', '-c', "/bin/sh -c 'echo ~root && sleep 0'"]
<alpine> EXEC ['/usr/bin/docker', '-H=tcp://host.docker.internal:2375', b'exec', b'-u', 'root', b'-i', 'alpine', '/bin/sh', '-c', '/bin/sh -c \'echo "`pwd`" && sleep 0\'']
<alpine> EXEC ['/usr/bin/docker', '-H=tcp://host.docker.internal:2375', b'exec', b'-u', 'root', b'-i', 'alpine', '/bin/sh', '-c', '/bin/sh -c \'( umask 77 && mkdir -p "` echo ~/.ansible/tmp `"&& mkdir "` echo ~/.ansible/tmp/ansible-tmp-1697208461.9177275-648-193093839347660 `" && echo ansible-tmp-1697208461.9177275-648-193093839347660="` echo ~/.ansible/tmp/ansible-tmp-1697208461.9177275-648-193093839347660 `" ) && sleep 0\'']
fatal: [localhost -> alpine]: FAILED! => {
"msg": "failed to resolve remote temporary directory from ansible-tmp-1697208461.9177275-648-193093839347660: `( umask 77 && mkdir -p \"` echo ~/.ansible/tmp `\"&& mkdir \"` echo ~/.ansible/tmp/ansible-tmp-1697208461.9177275-648-193093839347660 `\" && echo ansible-tmp-1697208461.9177275-648-193093839347660=\"` echo ~/.ansible/tmp/ansible-tmp-1697208461.9177275-648-193093839347660 `\" )` returned empty string"
}
PLAY RECAP *************************************************************************************************************************************************************************************************************
localhost : ok=2 changed=1 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
Debugging
@calillo had the idea of analysing the traffic exchanged between the docker binary and the docker engine. I did the same to try to figure out what is happening and discovered some additional key information.
Docker Binary Behavior
According to my findings, when using the docker exec command, 2 connections will be made to the Docker Engine.
First Connection
- Ping the engine
- Get information about the container
- Start the exec, get an Id
- Wait while the second connection runs, then wake up when it's done
- Get the summary of the exec
Second Connection
- Connect with the Id provided in step 3 of the first connection
- Upgrade the HTTP connection to TCP in order to stream the output of the executed task
- When the exec finishes, the connection is closed
Comparing against a functioning version
In order to find out what is happening, I've sniffed packets from three different scenarios:
- Docker Engine v1.43 (Docker Desktop 4.24.1) via Ansible
- Docker Engine v1.43 (Docker Desktop 4.24.1) via Bash
- Docker Engine v1.41 (Docker Desktop 4.18.0) via Ansible
We'll concentrate on the second connection where we are expecting some output.
Docker Engine v1.43 (Docker Desktop 4.24.1) via Ansible
When inspecting the packets of the second connection, something is weird: the FIN flag is initiated by the binary. How does it know that the connection needs to be closed?
- 172.26.0.2 Docker Binary in the ansible container
- 192.168.65.254 Docker Engine on the host
After the second connection closes, the first connection queries the API to get the summary of the exec command. We can see that it was prematurely closed since it's still running and the exit code is still null.
Docker Engine v1.43 (Docker Desktop 4.24.1) via Bash
That same command that fails in Ansible, if we run it in bash within the ansible container (so that it runs from the same environment), executes successfully with the expected output.
# From WSL
$ docker exec -i ansible bash
# Within Ansible container
$ docker -H=tcp://host.docker.internal:2375 exec -u root -i alpine /bin/sh -c "/bin/sh -c '( umask 77 && mkdir -p "` echo ~/.ansible/tmp `"&& mkdir "` echo ~/.ansible/tmp/ansible-tmp-1697223552.034312-223-168432577105606 `" && echo ansible-tmp-1697223552.034312-223-168432577105606="` echo ~/.ansible/tmp/ansible-tmp-1697223552.034312-223-168432577105606 `" ) && sleep 0'"
ansible-tmp-1697223552.034312-223-168432577105606=/root/.ansible/tmp/ansible-tmp-1697223552.034312-223-168432577105606
$
When inspecting the packets, there is a big difference: the FIN flag is initiated by the Engine once it finishes streaming the command. This is the behaviour I would expect.
- 172.26.0.2 Docker Binary in the ansible container
- 192.168.65.254 Docker Engine on the host
Docker Engine v1.41 (Docker Desktop 4.18.0) via Ansible
Why did it work with the previous docker engine? Let's inspect the packet flow. We get the same behaviour as with the newest engine, but luckily we get the output in between the FIN request and the FIN confirmation.
- 172.19.0.3 Docker Binary in the ansible container
- 192.168.65.2 Docker Engine on the host
Since we managed to get the expected response from the exec command, when the first connection queries the API about the summary of the command, running is false and the exit code is 0.
Conclusion
From my understanding, this is not a new bug. It's been dormant and the changes to the engine caused it to surface.
The biggest question is what causes Ansible to close the second connection prematurely? Maybe the handle where the socket is piped is closed, thus cascading upward?
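The "closed handle cascades upward" hypothesis can be illustrated locally. Below, `cat` stands in for a process that stays alive while its stdin is open (loosely analogous to how `docker exec -i` keeps the attach stream open); closing our end of the pipe is what makes it terminate. This is a sketch of the mechanism, not a claim about where exactly the close happens in Ansible:

```python
import subprocess

# `cat` with no arguments copies stdin to stdout and only exits once
# its stdin is closed -- a stand-in for `docker exec -i` keeping the
# attach stream open while stdin stays open.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, text=True)
proc.stdin.write("hello\n")
proc.stdin.close()   # closing our end of the pipe makes cat exit
out = proc.stdout.read()
proc.wait()
print(repr(out), proc.returncode)  # 'hello\n' 0
```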
I cannot reproduce this. I guess the main difference is that I use Linux, not WSL. My Docker client in the container is exactly the same as you have, and my Docker daemon also is 24.0.6, though a slightly different build:
Server:
Engine:
Version: 24.0.6
API version: 1.43 (minimum version 1.12)
Go version: go1.21.1
Git commit: 1a7969545d
Built: Sat Sep 30 15:48:58 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.7.7
GitCommit: 8c087663b0233f6e6e2f4515cee61d49f14746a8.m
runc:
Version: 1.1.9
GitCommit:
docker-init:
Version: 0.19.0
GitCommit: de40ad0
(I'm using Arch Linux with Docker from the system packages.)
In my case the docker engine is on a remote VM (not on localhost), and in this scenario it practically always happens. I can confirm @jfjauvin's observation: if I run the same command in bash, it works successfully. So, from my point of view, it's a session/connection management issue in community.docker.
I did now try this with a Docker daemon on a remote machine reached via TCP. Same result: it works flawlessly for me.
The connection plugin also isn't doing any magic (assuming you don't use become: true with a plugin that needs to handle a prompt): it will simply execute the docker command line with subprocess.Popen (https://github.com/ansible-collections/community.docker/blob/main/plugins/connection/docker.py#L300-L305) and then uses Popen.communicate to wait for it to complete (https://github.com/ansible-collections/community.docker/blob/main/plugins/connection/docker.py#L343).
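For anyone who wants to poke at this, the call shape boils down to something like the sketch below, with a plain /bin/sh argv substituting for the real docker command line so it runs anywhere. One detail worth noting: `Popen.communicate()` closes the child's stdin once it has finished sending input, which could matter for `docker exec -i`:

```python
import subprocess

# Minimal stand-in for the plugin's call pattern: build an argv list,
# start it with Popen, then block in communicate() until it finishes.
argv = ["/bin/sh", "-c", "echo marker && sleep 0"]
proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() sends the input, closes the child's stdin, and waits.
stdout, stderr = proc.communicate(input=b"")
print(proc.returncode, stdout.decode().strip())  # 0 marker
```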
I have also tested it on Archlinux, and like @felixfontein I find it works there. I took the opportunity to packet sniff the exchange, and we are getting the same weird behavior of Ansible (or the binary) sending the FIN flag as soon as the connection is upgraded.
- 172.22.0.2 Docker Binary in the ansible container
- 172.11.0.1 Docker Engine on the host
As @felixfontein mentioned, there's not a lot happening in the connection plugin, but the binary behaves differently than it does in a shell.
When I have some time later this week, I'll try to isolate running the docker binary in a simple python script using subprocess.Popen and see if we get the same behavior. In the past, I've seen some binaries behave differently when they aren't running in a terminal (TTY).
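As a starting point for that test, the terminal-vs-pipe difference is easy to observe from Python. The child here is a Python one-liner rather than the docker binary, purely to keep the sketch self-contained:

```python
import subprocess
import sys

# From an interactive shell, a child's stdin is a TTY; under Popen
# with stdin=PIPE it is not. Some binaries switch I/O behaviour on
# exactly this check, which could explain a shell-vs-Ansible gap.
code = "import sys; print(sys.stdin.isatty())"
res = subprocess.run([sys.executable, "-c", code],
                     stdin=subprocess.PIPE, capture_output=True, text=True)
print(res.stdout.strip())  # False
```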
@jfjauvin did you have a chance to try this out?