
Version 3 builder images cannot be used with ansible-navigator

Open cidrblock opened this issue 1 year ago • 9 comments

Using a builder built image:

---
version: 3

images:
  base_image:
    name: registry.fedoraproject.org/fedora:38  # vanilla image!

dependencies:
 
  ansible_core: 
    package_pip: ansible-core

  ansible_runner:  
    package_pip: ansible-runner

  galaxy:
    collections:
    - ansible.utils

  python:
  - ansible-pylibssh

When used with navigator:


(venv) x1 ➜  builder_test ansible-navigator run site.yml --eei test-ee:latest --mode stdout --pp never --ll debug --la false
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: error in 'jsonfile' cache plugin while trying to create cache dir
/runner/artifacts/0e1b05c3-b27c-410e-b72e-9ead558e4f40/fact_cache : b"[Errno
13] Permission denied: '/runner/artifacts'"
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match 'all'
ERROR! Invalid callback for stdout specified: awx_display
Please review the log for errors.
(venv) x1 ➜  builder_test 

I suspect the issue here is the Permission denied error: runner cannot copy its awx_display callback plugin into the artifact directory.

The dir being mounted is 700:

drwx------ 4 bthornto bthornto   80 May 19 15:38 ansible-navigator_qg5o8i3t

and from within the EE it is inaccessible:

(venv) x1 ➜  builder_test ansible-navigator exec --eei test-ee:latest --pp never
bash-5.2$ ls -l /runner
ls: cannot open directory '/runner': Permission denied
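
To illustrate why a mode-700 directory is unreachable for anyone but its owner, here is a minimal Python sketch. It relies on `tempfile.mkdtemp` creating directories with mode 0700; whether navigator creates its directory this way is an assumption, and `can_enter` is a hypothetical helper that models only the classic owner/group/other check (no root override, ACLs, capabilities, or user-namespace mapping).

```python
import os
import stat
import tempfile


def can_enter(mode, owner_uid, owner_gid, uid, gids):
    """Model the classic Unix check for listing a directory (read + search bits).

    Hypothetical helper: ignores root, ACLs, and capabilities.
    """
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IXUSR
    elif owner_gid in gids:
        needed = stat.S_IRGRP | stat.S_IXGRP
    else:
        needed = stat.S_IROTH | stat.S_IXOTH
    return (mode & needed) == needed


# The prefix mirrors the directory name in the listing above; it is illustrative only.
tmp = tempfile.mkdtemp(prefix="ansible-navigator_")
try:
    st = os.stat(tmp)
    mode = stat.S_IMODE(st.st_mode)
    # The owner passes the check.
    owner_ok = can_enter(mode, st.st_uid, st.st_gid, st.st_uid, {st.st_gid})
    # A different UID fails it, even when it shares the owning group.
    other_ok = can_enter(mode, st.st_uid, st.st_gid, st.st_uid + 1, {st.st_gid})
    print(stat.filemode(st.st_mode), owner_ok, other_ok)
finally:
    os.rmdir(tmp)
```

Under these assumptions, any process that does not run as the directory's owning UID is refused, which matches the `ls` failure shown above.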

The full invocation of the ee is as follows:

podman run --rm --tty --interactive -v 
    /home/bthornto/github/builder_test/:/home/bthornto/github/builder_test/ 
    --workdir /home/bthornto/github/builder_test 
    -v /run/user/1000/keyring/:/run/user/1000/keyring/ 
    -e SSH_AUTH_SOCK=/run/user/1000/keyring/ssh 
    -v /home/bthornto/.ssh/:/home/runner/.ssh/ 
    -v /home/bthornto/.ssh/:/root/.ssh/ 
    --group-add=root 
    --ipc=host 
    -v /tmp/ansible-navigator_qg5o8i3t/artifacts/:/runner/artifacts/:Z 
    -v /tmp/ansible-navigator_qg5o8i3t/:/runner/:Z 
    --env-file /tmp/ansible-navigator_qg5o8i3t/artifacts/067e94c2-4f4f-4162-9a98-16ba258e3189/env.list 
    --quiet --name ansible_runner_067e94c2-4f4f-4162-9a98-16ba258e3189 
    test-ee:latest ansible-playbook /home/bthornto/github/builder_test/site.yml

cidrblock avatar May 19 '23 23:05 cidrblock

This one is the problem: -v /tmp/ansible-navigator_qg5o8i3t/:/runner/:Z

/runner is the fallback workdir and homedir for ephemeral users. The container build forces the one in the EE image to be writable by the container GID0, and the entrypoint script bends over backwards to ensure the user has a valid, writable homedir that's properly reflected in /etc/passwd. If you mount something over the top of it that isn't writable, a lot of stuff is going to be broken. Why are we trashing the container user's homedir?

nitzmahone avatar May 19 '23 23:05 nitzmahone

I need to dig deeper, but this is as far as I've gotten: we call run_command_async from ansible-runner, passing /tmp/ansible-navigatorxxxxx as private_data_dir.

{'container_image': 'ghcr.io/ansible/crea...ee:v0.17.0', 'process_isolation_executable': 'podman', 'process_isolation': True, 'container_volume_mounts': None, 'container_options': None, 'container_workdir': None, 'private_data_dir': '/tmp/ansible-navigat...r_9isqrv2v', 'json_mode': True, 'quiet': True, 'cancel_callback': <bound method Base.r...38fe2690>>, 'finished_callback': <bound method Base.r...38fe2690>>, 'timeout': None, 'envvars': {'ANSIBLE_NAVIGATOR_UP...T_FIXTURES': 'true'}, 'host_cwd': '/home/bthornto/githu...navigator/', ...}

I'll debug runner after a bit.

cidrblock avatar May 20 '23 00:05 cidrblock

Rootless podman assumes host UID == container UID0/GID0, but since we "can't" default the container to USER root, permissions on host-shared dirs are problematic. I see a few options:

  1. Just add -u root to the container invocation. That's the only way that file ownership of things created in the container will always be "correct" on the host with rootless podman (e.g. owned by the real host UID/GID on the host filesystem). Without that, or a manual uidmap that basically does the same thing, files created in the container will be owned by some random UID on the host.

  2. Ensure that host-mapped dirs that need to be writable inside the container have "current host user" group ownership, and are group-writable, and set the container umask to 0002. This will still have the unfortunate side-effect of container-created files and dirs being owned by a random UID on the host, but they should still be accessible since the group is the host user's GID and the umask creates them writable by that user by default.

  3. Launch the container with --userns=keep-id. This interposes the host UID at the namespace inside the container so it's actually the same inside and out, but it still keeps the primary group as container GID0 (so you'll see ephemeral group ownership on the host for files created in the container instead of your own GID). There might be a way around that one, but it's not immediately clear to me.

I think option #3 is the most compatible choice for Navigator's typical host-agility needs as a dev tool.
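
As a rough illustration of the three options, the Python sketch below maps each one to the extra container arguments it implies and shows the host-side half of option 2. The strategy names, `EXTRA_ARGS`, `extra_container_args`, and `make_group_writable` are all hypothetical; the flag spellings are taken from the list above, and the container umask that option 2 also requires is not handled here.

```python
import os
import stat
import tempfile

# Hypothetical labels for options 1-3; flags as written in the list above.
EXTRA_ARGS = {
    "root_user": ["-u", "root"],        # option 1
    "group_writable": [],               # option 2: host-side prep, no flag modeled
    "keep_id": ["--userns=keep-id"],    # option 3
}


def extra_container_args(strategy):
    """Return a copy of the extra engine arguments for a strategy."""
    return list(EXTRA_ARGS[strategy])


def make_group_writable(path):
    """Add group rwx to path (host side of option 2) and return the new mode."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IRWXG)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrate on a throwaway directory, which starts out as 0700.
demo_dir = tempfile.mkdtemp()
try:
    new_mode = make_group_writable(demo_dir)
    print(oct(new_mode), extra_container_args("keep_id"))
finally:
    os.rmdir(demo_dir)
```

Option 2 only helps if the directory's group is also the host user's GID, so this sketch covers the permission bits but not group ownership.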

nitzmahone avatar May 20 '23 01:05 nitzmahone

Any of that would need to be done by runner.... Navigator doesn't craft the command line.

Will look at the runner code later tonight.

cidrblock avatar May 20 '23 01:05 cidrblock

ansible-navigator run site.yml --eei test-ee --pp never --co="-uroot" --la false 

This works fine for podman but not for docker.

ansible-navigator run site.yml --eei test-ee --pp never --co="--userns=keep-id" --la false

This works fine for podman until docker is installed; then it fails with "OCI permission denied", which appears to be related to the docker group membership.

So the last question is: should this be fixed in runner or in navigator? It appears -u root is the better option, given the issues related to having docker and podman installed at the same time.
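
Wherever the fix ends up, an engine-specific choice could look roughly like the Python sketch below. It is based only on the observation in this thread that -uroot works with podman but not with docker; `container_options_for` and `build_runner_kwargs` are hypothetical helpers, the image name and directory in the usage lines are the ones from this report, and this is not necessarily what either project implements. The dictionary keys mirror the ones visible in the run_command_async arguments quoted earlier.

```python
def container_options_for(engine):
    """Hypothetical policy: only podman gets the root-user override."""
    if engine == "podman":
        return ["--user=root"]
    return []


def build_runner_kwargs(engine, image, private_data_dir):
    """Assemble keyword arguments shaped like the dump shown earlier in the thread."""
    options = container_options_for(engine)
    return {
        "process_isolation": True,
        "process_isolation_executable": engine,
        "container_image": image,
        # None when there is nothing engine-specific to add.
        "container_options": options or None,
        "private_data_dir": private_data_dir,
    }


podman_kwargs = build_runner_kwargs("podman", "test-ee:latest", "/tmp/ansible-navigator_qg5o8i3t")
docker_kwargs = build_runner_kwargs("docker", "test-ee:latest", "/tmp/ansible-navigator_qg5o8i3t")
print(podman_kwargs["container_options"], docker_kwargs["container_options"])
```

Keeping the policy in one small function makes it easy to move between navigator and runner, whichever ends up owning it.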

cidrblock avatar May 20 '23 04:05 cidrblock

It appears userns was once in runner: https://github.com/ansible/ansible-runner/pull/759

cidrblock avatar May 20 '23 04:05 cidrblock

Apparently Docker also has a rootless mode (https://docs.docker.com/engine/security/rootless/), which will make this even more ... interesting :)

felixfontein avatar May 20 '23 06:05 felixfontein

I went ahead and merged the navigator PR and released version 3.3.1. The tests were passing and I had good success with it locally.

I still find myself thinking that container-engine-specific CLI requirements for builder-built execution environments might be better handled in runner, but I also understand that touching runner can have a much bigger impact than touching navigator.

cidrblock avatar May 20 '23 12:05 cidrblock

Thanks @cidrblock, I can confirm that ansible-navigator 3.3.1 fixed the issue.

laurent-indermuehle avatar May 30 '23 11:05 laurent-indermuehle