Rocky Linux images resulting in PAM sudo error
When I run my geerlingguy/docker-rockylinux9-ansible containers in CI on GitHub Actions to test my Ansible projects, I see the following error whenever a task runs with sudo/`become`:
TASK [Gathering Facts] *********************************************************
fatal: [instance]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"ansible.legacy.setup": {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3.9"}, "failed": true, "module_stderr": "sudo: PAM account management error: Authentication service cannot retrieve authentication info\nsudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE: No start of json char found\nSee stdout/stderr for the exact error", "rc": 1}}, "msg": "The following modules failed to execute: ansible.legacy.setup\n"}
Other users have reported the same, for both Rocky Linux 8 and 9, for the past few weeks. For example: https://github.com/geerlingguy/docker-rockylinux9-ansible/issues/6
[root@25c3908841c3 /]# sudo "hello world"
sudo: PAM account management error: Authentication service cannot retrieve authentication info
sudo: a password is required
This error is not reproducible on a Mac running Docker Desktop, but it is in instances running docker-ce or on GitHub Actions. We use sudo in the container because it is testing/verifying playbooks that are run against instances where sudo may be required.
In the past this was never an issue. It could also be related to the yum install sudo command that I run in my Ansible/Docker project, which (perhaps?) updates PAM: https://github.com/geerlingguy/docker-rockylinux9-ansible/blob/master/Dockerfile#L22
Is there something that's changed in Rocky Linux lately that could be causing this?
Heya Jeff - Thanks for the detailed report.
I'll check this out -- nothing springs to mind but it's totally possible something has changed due to our use of kiwi to build the container root filesystems since 9.4.
Thanks for your patience. We had an outage with our powerpc cluster yesterday I had to work on.
The error here indicates that pam isn't able to resolve the user running the container to anything in its database(s). Would it be possible to get the contents of /etc/shadow and /etc/passwd on an affected instance, as well as the output of id -u?
This has been happening for a few weeks to a couple of months but...I can no longer reproduce this with any of the test cases I put into geerlingguy's repo (https://github.com/geerlingguy/docker-rockylinux9-ansible/issues/6#issuecomment-2564725736) besides a GitHub Actions runner. Going to go pull out that troubleshooting info you asked for, but wanted to add that new nuance.
Trying on an Ubuntu 22.04 cloud instance, leaving notes:
Don't see PAM errors in the rocky linux container now. I see docker-ce released a new minor version 2 days ago (27.5.0). But pinning back to the older docker-ce (5:27.4.1-1~ubuntu.24.04~noble and 5:27.4.0-1~ubuntu.24.04~noble) doesn't result in the error anymore either. The rockylinux image hasn't been updated since its last tag 2 months ago. I don't see new versions for pam, sudo, or the other dependencies (https://github.com/geerlingguy/docker-rockylinux9-ansible/issues/6#issuecomment-2561991276) previously listed, unless the version numbers were overwritten with new changes.
https://github.com/ansible/molecule/issues/4365 is reporting this same error running on GitHub Actions with registry.access.redhat.com/ubi9/ubi-init:latest
Looks like the Action runners use Docker-CE 26.x. My suspicion is this is from a kernel/syscall error on the Docker end.
(https://github.com/artis3n/ansible-role-tailscale/actions/runs/12797444601/job/35679327742?pr=532#step:7:15)
Client: Docker Engine - Community
 Version:           26.1.3
 API version:       1.45
 Go version:        go1.21.10
 Git commit:        b72abbb
 Built:             Thu May 16 08:33:35 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.1.3
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.10
  Git commit:       8e96db1
  Built:            Thu May 16 08:33:35 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.24
  GitCommit:        88bf19b2105c8b17560993bee28a01ddc2f97182
 runc:
  Version:          1.2.2
  GitCommit:        v1.2.2-0-g7cb3632
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Would it be possible to get the contents of /etc/shadow and /etc/passwd on an affected instance, as well as the output of id -u?
(https://github.com/artis3n/ansible-role-tailscale/actions/runs/12797497586/job/35679504452?pr=532#step:6:207)
id -u
cat /etc/shadow
cat /etc/passwd
0
root:!locked::0:99999:7:::
bin:*:19469:0:99999:7:::
daemon:*:19469:0:99999:7:::
adm:*:19469:0:99999:7:::
lp:*:19469:0:99999:7:::
sync:*:19469:0:99999:7:::
shutdown:*:19469:0:99999:7:::
halt:*:19469:0:99999:7:::
mail:*:19469:0:99999:7:::
operator:*:19469:0:99999:7:::
games:*:19469:0:99999:7:::
ftp:*:19469:0:99999:7:::
nobody:*:19469:0:99999:7:::
tss:!!:19680::::::
systemd-coredump:!!:20101::::::
dbus:!!:20101::::::
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
tss:x:59:59:Account used for TPM access:/:/usr/sbin/nologin
systemd-coredump:x:999:997:systemd Core Dumper:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
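The dump above answers whether the account databases contain the calling user: id -u prints 0, and root has an entry in both /etc/shadow and /etc/passwd. For anyone repeating that check on another image, here is a minimal sketch. The function names has_entry and check_user_db are made up for illustration, and the check only confirms that an entry exists, not that PAM is actually able to read the file.

```shell
#!/bin/sh
# has_entry FILE USER
# Succeeds if FILE contains a line whose first colon-separated field equals USER.
has_entry() {
    awk -F: -v user="$2" '$1 == user { found = 1 } END { exit !found }' "$1"
}

# check_user_db USER [PASSWD_FILE] [SHADOW_FILE]
# Reports whether USER has an entry in both account databases.
# The file arguments default to the system paths; they are parameters only
# so the check can be pointed at copies of the files.
check_user_db() {
    user="$1"
    passwd_file="${2:-/etc/passwd}"
    shadow_file="${3:-/etc/shadow}"

    if ! has_entry "$passwd_file" "$user"; then
        echo "$user: no entry in $passwd_file"
        return 1
    fi
    if ! has_entry "$shadow_file" "$user"; then
        echo "$user: no entry in $shadow_file"
        return 1
    fi
    echo "$user: present in both files"
}
```

Run it as root inside the container, for example check_user_db root, or pass paths to saved copies of the two files as the second and third arguments.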
I'm not sure why, but we are seeing the same issue even when running the GitHub Action with Podman instead of Docker.
Heya folks - Have not forgotten about this but looks like it's just moving into other areas and doesn't feel deterministic.
For example, in the OpenStack-Ansible project, we've been having failures like this due to AppArmor when running CentOS Stream 9 containers on Ubuntu hosts, but not Rocky.
Has anyone seen any root cause analysis on this yet? I'm struggling to see common threads to look down.
I have not seen anything more, unfortunately :(
I haven't had time to dig any deeper.
It looks like /etc/shadow is being created without root read permissions.
(ansible task)
- name: Debugging
  changed_when: false
  register: thing
  ansible.builtin.shell:
    cmd: |
      echo ""
      ls -l /etc/shadow
      ls -l /etc/passwd
      echo ""
- name: Print debug
  ansible.builtin.debug:
    var: thing.stdout
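The task above prints the ls -l listing for a person to read. If you would rather have CI flag the condition directly, a minimal shell sketch follows. owner_can_read is a hypothetical helper name, and it assumes GNU stat (the -c '%a' form from coreutils). It only inspects the mode bits; it says nothing about what capabilities or AppArmor allow at runtime.

```shell
#!/bin/sh
# owner_can_read FILE
# Prints FILE's octal mode and succeeds only if the owner read bit is set.
# Returns 2 if the file cannot be stat'ed at all.
owner_can_read() {
    file="$1"
    mode=$(stat -c '%a' "$file") || return 2

    # Prefix a 0 so the shell parses the mode as an octal constant,
    # then shift down to the owner bits (read = 4, write = 2, execute = 1).
    owner_bits=$(( (0$mode >> 6) & 7 ))

    if [ $(( owner_bits & 4 )) -ne 0 ]; then
        echo "$file: mode $mode, owner can read"
    else
        echo "$file: mode $mode, owner has no read bit"
        return 1
    fi
}
```

Calling owner_can_read /etc/shadow inside the container gives a non-zero exit status when the file has no owner read bit, which a CI step can act on.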
@andtra realized that in https://github.com/geerlingguy/docker-rockylinux9-ansible/issues/6#issuecomment-2671053863
I can reproduce that as the source issue as well
FWIW this also seems to affect RHEL and Oracle Linux, but not AlmaLinux.
I'm not a rocky-linux user, but I did run into this issue in a different context, hunt around for a while, and come across this blog post: https://www.tunbury.org/2025/05/13/ubuntu-apparmor/
Update: https://github.com/AOSC-Tracking/apparmor/commit/556396a172d09ea032404c7b346f4cf54a949a4e
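Since several comments above point at AppArmor on Ubuntu hosts, it may help to confirm whether a given runner has AppArmor enabled before comparing results. This is only a sketch: apparmor_enabled is a hypothetical helper name, and the default path is the sysfs file where the Linux kernel exposes the AppArmor module's enabled parameter. It tells you AppArmor is on, not which profile (if any) is involved.

```shell
#!/bin/sh
# apparmor_enabled [PARAM_FILE]
# Succeeds if the kernel reports AppArmor as enabled ("Y").
# PARAM_FILE is a parameter only so the check can be exercised against a test file.
apparmor_enabled() {
    param_file="${1:-/sys/module/apparmor/parameters/enabled}"

    # A missing or unreadable file is treated as "not enabled".
    [ -r "$param_file" ] || return 1
    [ "$(cat "$param_file")" = "Y" ]
}
```

Run it on the host or runner VM rather than inside the container, for example: if apparmor_enabled; then echo "AppArmor is enabled"; fi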