Clean up artifacts/N/ssh_key_data fifo on failure of containerization

Open john-westcott-iv opened this issue 2 years ago • 2 comments

If we are running in containerization mode we run a command like:

/usr/bin/podman run --rm --tty --interactive --workdir /runner/project -v /tmp/awx_974_6lpq08hx/:/runner/:Z -v /awx_devel/ansible-runner/ansible_runner/display_callback/callback/:/home/runner/.ansible/plugins/callback/:Z --env-file /tmp/awx_974_6lpq08hx/artifacts/974/env.list --quiet --name ansible_runner_974 --user=root --pull=always quay.io/ansible/junk:latest ssh-agent sh -c "trap 'rm -f /runner/artifacts/974/ssh_key_data' EXIT && ssh-add /runner/artifacts/974/ssh_key_data && rm -f /runner/artifacts/974/ssh_key_data && ansible-playbook -u ans1 -i /runner/inventory/hosts -e @/runner/env/extravars find_files.yml" ; rm -f /tmp/awx_974_6lpq08hx/artifacts/974/ssh_key_data

But if the podman command fails (in the case above, because the specified container image is invalid), the trap is never executed and the artifacts/N/ssh_key_data fifo pipe is not cleaned up. Leaving this fifo around can cause downstream issues because a read of the fifo will just hang.
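To illustrate the hang, here is a minimal sketch of how a downstream consumer could probe a leftover fifo without blocking. It assumes a POSIX system and that no writer is attached any more; the helper name read_fifo_nonblocking is hypothetical and not part of ansible-runner or AWX.

```python
import os
import tempfile


def read_fifo_nonblocking(path, size=65536):
    """Read from a FIFO without blocking (hypothetical helper).

    Returns the bytes that were available; b"" means there was nothing to read.
    """
    # O_NONBLOCK makes the open return at once instead of waiting for a writer.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        try:
            return os.read(fd, size)
        except BlockingIOError:
            # A writer is attached but has not written anything yet.
            return b""
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    # File name borrowed from the issue; the directory is a throwaway.
    fifo_path = os.path.join(tmp, "ssh_key_data")
    os.mkfifo(fifo_path, 0o600)
    # A plain open(fifo_path, "r") would block here: nobody is writing.
    leftover = read_fifo_nonblocking(fifo_path)
    print(repr(leftover))
```

This only shows why a naive read hangs and one way around it; it does not remove the fifo, which is what this issue is really asking for.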

AWX has a workaround https://github.com/ansible/awx/blob/3b6cd1828322f8e7c6b0a6824526c6b9604871a2/awx/main/tasks/callback.py#L189-L203 that can be cleaned up once this is fixed.

john-westcott-iv avatar Mar 02 '22 16:03 john-westcott-iv

Gotta admit, I'm confused by this. Hoping @shanemcd can provide some insight/solution here.

Shrews avatar Apr 05 '22 13:04 Shrews

Below is some code to replicate this issue. Note that running this code will create a directory in the same location as the script being executed. If you want to change this, modify line 61:

base_path = os.path.dirname(os.path.realpath(__file__))

The directory it creates is called ansible-runner-testing and is the private_data_dir we give to runner. Inside this directory it creates a project directory with a small sample playbook. You shouldn't have to create anything else for this code to run.

When the code executes, we get the following high-level sequence of events, which causes the problem:

  1. Runner starts
  2. Runner calls ansible_runner.utils.open_fifo_write to create the ssh_key_data fifo
  3. Runner attempts to start podman but there is a bad image name so podman fails
  4. Runner exits reporting a status of failed and a return code of 125, but leaves the fifo pipe in place (along with its thread, which is still blocked waiting for a read)

If we look at ansible_runner.utils.open_fifo_write we find this docstring:

    '''open_fifo_write opens the fifo named pipe in a new thread.
    This blocks the thread until an external process (such as ssh-agent)
    reads data from the pipe.
    '''
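The behaviour that docstring describes can be reproduced in isolation. The following is a simplified sketch of the pattern, not a copy of runner's source: open_fifo_write_sketch is a made-up name, and the main thread stands in for the reader (ssh-add in the real flow).

```python
import os
import stat
import tempfile
import threading


def open_fifo_write_sketch(path, data):
    """Create a FIFO and write data to it from a background thread."""
    os.mkfifo(path, stat.S_IRUSR | stat.S_IWUSR)

    def _write():
        # Opening for write blocks until something opens the FIFO for reading.
        with open(path, "wb") as fifo:
            fifo.write(data)

    thread = threading.Thread(target=_write)
    thread.start()
    return thread


with tempfile.TemporaryDirectory() as tmp:
    fifo_path = os.path.join(tmp, "ssh_key_data")
    writer = open_fifo_write_sketch(fifo_path, b"not-a-real-key\n")

    # No reader yet, so the writer thread is still stuck inside open().
    writer.join(timeout=0.5)
    blocked_before_read = writer.is_alive()

    # Reading the FIFO plays the role of ssh-add and releases the writer.
    with open(fifo_path, "rb") as fifo:
        received = fifo.read()

    writer.join(timeout=5)
    blocked_after_read = writer.is_alive()

print(blocked_before_read, received, blocked_after_read)
```

If the read in the middle never happens, which is the situation when podman fails to start, the writer thread stays alive and the interpreter cannot exit cleanly.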

However, ssh-agent runs inside of podman, which failed to launch because of an invalid container image name. In theory, if anything else prevented podman from running we would get the same issue. I used an invalid image name because it was a simple way to induce the issue.

Near the bottom of the sample code, on lines 100-105, you will see something critical:

        #
        # If we don't read from the FIFO pipe here the python process just hangs 
        #
        with open(ssh_key_data_file, 'r') as f:
            f.read()

As the comment states, if those lines are removed the Python interpreter will be unable to exit because of the blocked thread waiting for a read of the fifo. By adding these lines to our calling code we read the fifo, which unblocks the spawned thread. This is similar to what AWX is doing as a workaround for this issue. We should be able to fix this issue with something similar (here is some pseudocode):

run podman
if "the fifo read thread is still alive":
    with open(<my fifo pipe>, 'r') as f:
        f.read()
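For what it's worth, here is a runnable sketch of that idea. Everything in it is an assumption for illustration: drain_and_remove_fifo is a hypothetical helper, and it presumes the caller still holds a reference to the writer thread, which may not match how runner tracks it internally.

```python
import os
import stat
import tempfile
import threading


def drain_and_remove_fifo(path, writer_thread):
    """Unblock a writer stuck on the FIFO at path, then delete the FIFO.

    Hypothetical helper; runner does not expose anything like this today.
    """
    if writer_thread.is_alive():
        # Nobody consumed the key, so read it ourselves to release the writer.
        with open(path, "rb") as fifo:
            fifo.read()
        writer_thread.join()
    try:
        if stat.S_ISFIFO(os.stat(path).st_mode):
            os.remove(path)
    except FileNotFoundError:
        pass  # already cleaned up elsewhere


with tempfile.TemporaryDirectory() as tmp:
    fifo_path = os.path.join(tmp, "ssh_key_data")
    os.mkfifo(fifo_path, 0o600)

    def _write():
        # Blocks in open() until a reader shows up.
        with open(fifo_path, "wb") as fifo:
            fifo.write(b"not-a-real-key\n")

    writer = threading.Thread(target=_write)
    writer.start()

    # Pretend podman failed at this point, so no reader ever appeared.
    drain_and_remove_fifo(fifo_path, writer)

    writer_alive = writer.is_alive()
    fifo_exists = os.path.exists(fifo_path)

print(writer_alive, fifo_exists)
```

One caveat: the is_alive() check is only a heuristic. It is safe when the container never started, as in this issue, but if a reader had already consumed the key and the thread was merely about to exit, the blocking open could itself hang, so a real fix would probably want a timeout or a non-blocking read.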

Here is the replication code.

#!/usr/bin/env python
from ansible_runner.interface import run
import os, stat

playbook_contents = '''
---
- name: Run a debug task
  hosts: localhost
  connection: local
  gather_facts: False
  tasks:
    - debug:
        msg: "hello, my name is human"
'''

ssh_key = '''
-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAABlwAAAAdzc2gtcn
NhAAAAAwEAAQAAAYEA1DEZKIUft80Fys59XZLSWDgOaBHCRMQpbSFUWO5wXvIJ1mijdeN/
1+ncmfGrAaNMSTLfeLWoXiuO0Q7Zm0FH79N0hzCKVg26sPUOtfNkVykM5sLywciaj3M/Lv
XVdZfOP27sJRIXxhxyqOxLNOJOwDt9GdrLfTkS132mFO5sTZZC7fc1ZH3AA75JabrQa53W
njAck1aSz6y1zMyRO0m68rVgeuyBHc3Ayk6dZ7Ne741fd1LAFvZaYc1G52gXZ0zzslRXcZ
Isx6t2OzA1LDSDUTFirY9WFj1Ni5mQHfO6bofPRqIVtf2TlsLmR1oyDnM10j0KLah5wSgk
imK5Vg2eGuVGdhXIZs/NF9X8i5OGgJshKwwfIsGSZU28NaCqzdWdqwfNk3INh5qZDVbM0+
pLfaAplt2XIURvPSCBXSTDQP91PRPxHU200llDvwy74hz6iW6Py7lUNGib0vY13JQ4xNan
3cnTSvrweYIladof60gDjQI43JAkNR5OsoDWVxNdAAAFiP7UZcH+1GXBAAAAB3NzaC1yc2
EAAAGBANQxGSiFH7fNBcrOfV2S0lg4DmgRwkTEKW0hVFjucF7yCdZoo3Xjf9fp3JnxqwGj
TEky33i1qF4rjtEO2ZtBR+/TdIcwilYNurD1DrXzZFcpDObC8sHImo9zPy711XWXzj9u7C
USF8YccqjsSzTiTsA7fRnay305Etd9phTubE2WQu33NWR9wAO+SWm60Gud1p4wHJNWks+s
tczMkTtJuvK1YHrsgR3NwMpOnWezXu+NX3dSwBb2WmHNRudoF2dM87JUV3GSLMerdjswNS
w0g1ExYq2PVhY9TYuZkB3zum6Hz0aiFbX9k5bC5kdaMg5zNdI9Ci2oecEoJIpiuVYNnhrl
RnYVyGbPzRfV/IuThoCbISsMHyLBkmVNvDWgqs3VnasHzZNyDYeamQ1WzNPqS32gKZbdly
FEbz0ggV0kw0D/dT0T8R1NtNJZQ78Mu+Ic+oluj8u5VDRom9L2NdyUOMTWp93J00r68HmC
JWnaH+tIA40CONyQJDUeTrKA1lcTXQAAAAMBAAEAAAGAbM3kTZ+9fNz3XF/3brTrgOr1kC
2TBPnlGB1gB93z5uuLWdL8BmYqVseHI7UE7+kRI/OfHWFgvTDuoYpQ0Mhvn6048l1Ugf22
PijaazgrunnFMMrD+dPoVShOPME3vH2j92jkU7nsUNyjerT3d12X7gBTDJ/hegt6+t5l/B
OpmzsyhJ0fVrEYdZLsR/fQCW687w5SiMZ6r1yrOCGpUtJitzOEDmQZbPPlKJeZJOSYCJPg
YOogr2WRQNcmxV/dqE2PNMRYEQW/MM8kc7xKiwbCyDI1ufn01Qw21UkiQO0Iytw3iYARa6
N+XiDEBCjKPg1GeqOQ7e0UKxXdtHSgKFBzE9wjJStUSjxDIwvdPLdxeUhvBBf0iXqnChw3
BV6sWv+pfHNdXPEjUaq8oXaH85VJ7Hm64yGiTYt2ThbKtYDiEkGweVdaWyQQ2COdzxdeB7
fS+v6imamYDioRHQc/efYtnB+XAkXNriRgPVWyw4Lp52Eo/WqMaJucxegV7aafJCYtAAAA
wQCEshmTZs/ahR74uAflgKegE6dzQE+h2Wsi0M3aaDv9G+HkK40JkYkmFrm2tySDMYUBYX
b9xphUGTt37oCWoV2g7Uhjv37OXmDrufO1ePh3HI+L6oKWhvQLRD6owZ9Y2SALU3I3dknp
AapRJAqXkgnyiECpyjHbmSToplq2D8sVWoxvD/e4vttNhmnAxYd7S8cHbhyhkzZg0S8heH
iSpHKuBx3S3cQ/O3ZVAfn/mGv0quIl6FCouX7SpUKI/Sz/SZkAAADBAPoHBC0LjElTmzDW
JX37oNn9kVlPYeiretYtH/E+E/cusBWxOcdqVu0QePXrBEWspsokQhuKH6XRZnh8M/QcNJ
x4pR5Zq048FB7gyQY2dsQrzMiTVy2QGJFe95eJZ8ndeIGB2v2pIy5AbmJL5Uq46WlUKBeB
RqSKwb8Vp01nOBePBTmSP3Nw9AWs9lm237qbsUBiViaO89EEjSFkLBfvZfCRdz5lmTp749
0N1AEEd1yHrrZiQMK9pyCXazdIgLmbHwAAAMEA2UK1CH+GgSys4HZ9Ozo8Baw2OCZWE2bX
/eC4tkOY1LiIfunIXR6Nptyynmwi4GrtKMMXRT4IvwhFnO64iY8lR9sHbZRSYQbdTr4+U4
t35dyfHrfiEZVzDq4bLt0xSJN8aMiVOahW2EgOft113j69t4s3bvKY6Mt5a3JCR6i0SEVO
lgfooXD5mNwwXb8hiaHqhRZj+7gCy+CR8SOisES/61IQLdFFHlKG+tsk0+/zyjdbxh7Q/U
MzSgNrflq9D34DAAAAEmpvd2VzdGNvQGF3eC1kZXZlbA==
-----END OPENSSH PRIVATE KEY-----
'''


#
# Create directories and paths as needed
#
base_path = os.path.dirname(os.path.realpath(__file__))
data_dir = os.path.join(base_path, 'ansible-runner-testing')
project_dir = os.path.join(data_dir, 'project')
os.makedirs(project_dir, mode=0o777, exist_ok=True)

with open(os.path.join(project_dir, 'play.yml'), 'w') as f:
    f.write(playbook_contents)

identity = "4"
artifact_dir = os.path.join(data_dir, 'artifacts', identity)
if os.path.exists(artifact_dir):
    import shutil
    shutil.rmtree(artifact_dir)


#
# Run ansible-runner via interface
#
res = run(**{
    "ident": identity,
    "private_data_dir": data_dir,
    "playbook": "play.yml",
    "ssh_key": ssh_key,
    "process_isolation": True,
    "container_image": "quay.io/ansible/junk:latest",
    "settings": {
        "job_timeout": 3
    }
})

print(res.status)
print(res.rc)

#
# Check for the fifo pipe
#
ssh_key_data_file = os.path.join(artifact_dir, 'ssh_key_data')
try:
    if stat.S_ISFIFO(os.stat(ssh_key_data_file).st_mode):
        print("GOT THE ERROR: The file {} exists and is a FIFO PIPE".format(ssh_key_data_file))
        #
        # If we don't read from the FIFO pipe here the python process just hangs 
        #
        with open(ssh_key_data_file, 'r') as f:
            f.read()
    else:
        print("The file {} exists but is not a pipe".format(ssh_key_data_file))
except FileNotFoundError:
    print("The file is (correctly) not there")

john-westcott-iv avatar Apr 08 '22 13:04 john-westcott-iv