sig-cloud-instance-images

Rocky Linux images resulting in PAM sudo error

[Open] geerlingguy opened this issue 11 months ago • 10 comments

When running my geerlingguy/docker-rockylinux9-ansible containers in CI on GitHub Actions to test my Ansible projects, I have been seeing the following error whenever a task runs with sudo/`become`:

  TASK [Gathering Facts] *********************************************************
  fatal: [instance]: FAILED! => {"ansible_facts": {}, "changed": false, "failed_modules": {"ansible.legacy.setup": {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3.9"}, "failed": true, "module_stderr": "sudo: PAM account management error: Authentication service cannot retrieve authentication info\nsudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE: No start of json char found\nSee stdout/stderr for the exact error", "rc": 1}}, "msg": "The following modules failed to execute: ansible.legacy.setup\n"}
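A minimal play that should exercise the same sudo/`become` path is sketched below. It is an illustration only: the host name `instance` is taken from the log above, and the play and task names are made up. Fact gathering alone is enough to trigger the failure, because the setup module already runs through sudo when `become: true` is set.

```yaml
# Hypothetical minimal reproduction: every become task, including fact
# gathering, runs through sudo and therefore through PAM account checks.
- name: Reproduce the sudo/PAM failure
  hosts: instance          # host name as it appears in the log above
  become: true
  gather_facts: true       # the setup module already runs via sudo here
  tasks:
    - name: Run a trivial command through sudo
      ansible.builtin.command: whoami
      changed_when: false
```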

Other users have reported the same, for both Rocky Linux 8 and 9, for the past few weeks. For example: https://github.com/geerlingguy/docker-rockylinux9-ansible/issues/6

[root@25c3908841c3 /]# sudo "hello world"
sudo: PAM account management error: Authentication service cannot retrieve authentication info
sudo: a password is required

This error is not reproducible on a Mac running Docker Desktop, but it is reproducible on instances running docker-ce or on GitHub Actions. We use sudo in the container because it is testing/verifying playbooks that are run against instances where sudo may be required.

In the past this was never an issue. It could also be related to the `yum install sudo` command I run in my Ansible/Docker project, which (perhaps?) updates PAM: https://github.com/geerlingguy/docker-rockylinux9-ansible/blob/master/Dockerfile#L22

Is there something that's changed in Rocky Linux lately that could be causing this?

geerlingguy, Jan 13 '25 15:01

Heya Jeff - Thanks for the detailed report.

I'll check this out -- nothing springs to mind but it's totally possible something has changed due to our use of kiwi to build the container root filesystems since 9.4.

NeilHanlon, Jan 13 '25 16:01

Thanks for your patience. We had an outage with our powerpc cluster yesterday I had to work on.

The error here indicates that PAM isn't able to resolve the user running the container to anything in its database(s). Would it be possible to get the contents of /etc/shadow and /etc/passwd on an affected instance, as well as the output of `id -u`?
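One way to collect that information without going through sudo (so PAM's account checks are not involved) is sketched below as Ansible tasks. It assumes the container is reachable as host `instance`, that the connection user is the container's default user, and that `getent` is present in the image. A failed `shadow` lookup for root would be consistent with PAM being unable to retrieve account data.

```yaml
# Sketch: gather the requested diagnostics without become, so sudo/PAM is bypassed.
- name: Collect user and PAM diagnostics
  hosts: instance          # assumed host name
  become: false
  gather_facts: false      # fact gathering with become is what fails
  tasks:
    - name: Record the effective UID (same as id -u)
      ansible.builtin.command: id -u
      register: uid_out
      changed_when: false

    - name: Look up that UID in the passwd database
      ansible.builtin.getent:
        database: passwd
        key: "{{ uid_out.stdout }}"

    - name: Try to read root's shadow entry
      ansible.builtin.getent:
        database: shadow
        key: root
      register: shadow_lookup
      ignore_errors: true

    - name: Print what was found
      ansible.builtin.debug:
        msg:
          uid: "{{ uid_out.stdout }}"
          passwd: "{{ ansible_facts.getent_passwd | default({}) }}"
          shadow_readable: "{{ shadow_lookup is succeeded }}"
```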

NeilHanlon, Jan 15 '25 18:01

This has been happening for a few weeks to a couple of months, but... I can no longer reproduce this with any of the test cases I put into geerlingguy's repo (https://github.com/geerlingguy/docker-rockylinux9-ansible/issues/6#issuecomment-2564725736) except on a GitHub Actions runner. I'm going to pull out the troubleshooting info you asked for, but wanted to add that new nuance.

Trying on an Ubuntu 22.04 cloud instance, leaving notes:

I don't see PAM errors in the Rocky Linux container now. docker-ce released a new minor version (27.5.0) 2 days ago, but pinning back to the older docker-ce releases (5:27.4.1-1~ubuntu.24.04~noble and 5:27.4.0-1~ubuntu.24.04~noble) doesn't reproduce the error either. The rockylinux image hasn't been updated since its last tag 2 months ago, and I don't see new versions of pam, sudo, or the other dependencies listed previously (https://github.com/geerlingguy/docker-rockylinux9-ansible/issues/6#issuecomment-2561991276), unless the version numbers were overwritten with new changes.
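For anyone repeating the version-pinning test, a sketch of pinning the engine on an Ubuntu host with Ansible follows. The version string is one of those quoted above; the `docker_host` group name is hypothetical, and the sketch assumes Docker's apt repository is already configured and that docker-ce-cli is published under the same version string as docker-ce.

```yaml
# Sketch: pin docker-ce to an older release to test whether the engine version matters.
- name: Pin docker-ce to a specific version
  hosts: docker_host       # hypothetical inventory group for the Docker host
  become: true
  vars:
    docker_ce_version: "5:27.4.1-1~ubuntu.24.04~noble"
  tasks:
    - name: Install the pinned engine and CLI, allowing a downgrade
      ansible.builtin.apt:
        name:
          - "docker-ce={{ docker_ce_version }}"
          - "docker-ce-cli={{ docker_ce_version }}"
        allow_downgrade: true
        update_cache: true
        state: present

    - name: Hold the packages so later upgrades do not move them
      ansible.builtin.dpkg_selections:
        name: "{{ item }}"
        selection: hold
      loop:
        - docker-ce
        - docker-ce-cli
```

Holding the packages keeps a routine `apt upgrade` from silently undoing the pin in the middle of a comparison.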

artis3n, Jan 15 '25 21:01

https://github.com/ansible/molecule/issues/4365 reports the same error when running on GitHub Actions with registry.access.redhat.com/ubi9/ubi-init:latest

artis3n, Jan 15 '25 21:01

It looks like the Actions runners use Docker CE 26.x. I suspect this stems from a kernel/syscall error on the Docker end.

(https://github.com/artis3n/ansible-role-tailscale/actions/runs/12797444601/job/35679327742?pr=532#step:7:15)

Client: Docker Engine - Community
 Version:           26.1.3
 API version:       1.45
 Go version:        go1.21.10
 Git commit:        b72abbb
 Built:             Thu May 16 08:33:35 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.1.3
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.10
  Git commit:       8e96db1
  Built:            Thu May 16 08:33:35 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.24
  GitCommit:        88bf19b2105c8b17560993bee28a01ddc2f97182
 runc:
  Version:          1.2.2
  GitCommit:        v1.2.2-0-g7cb3632
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Would it be possible to get the contents of /etc/shadow and /etc/passwd on an affected instance, as well as the output of id -u?

(https://github.com/artis3n/ansible-role-tailscale/actions/runs/12797497586/job/35679504452?pr=532#step:6:207)

id -u
cat /etc/shadow
cat /etc/passwd
0


root:!locked::0:99999:7:::
bin:*:19469:0:99999:7:::
daemon:*:19469:0:99999:7:::
adm:*:19469:0:99999:7:::
lp:*:19469:0:99999:7:::
sync:*:19469:0:99999:7:::
shutdown:*:19469:0:99999:7:::
halt:*:19469:0:99999:7:::
mail:*:19469:0:99999:7:::
operator:*:19469:0:99999:7:::
games:*:19469:0:99999:7:::
ftp:*:19469:0:99999:7:::
nobody:*:19469:0:99999:7:::
tss:!!:19680::::::
systemd-coredump:!!:20101::::::
dbus:!!:20101::::::


root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
tss:x:59:59:Account used for TPM access:/:/usr/sbin/nologin
systemd-coredump:x:999:997:systemd Core Dumper:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin

artis3n, Jan 15 '25 21:01

I'm not sure why, but we are seeing the same issue even when running the GitHub Action with Podman instead of Docker.

RanabirChakraborty, Jan 16 '25 13:01

Heya folks - I haven't forgotten about this, but it looks like it's spreading into other areas and doesn't feel deterministic.

For example, in the OpenStack-Ansible project, we've been having failures like this due to AppArmor when running CentOS Stream 9 containers on Ubuntu hosts, but not Rocky.

Has anyone seen any root cause analysis on this yet? I'm struggling to see common threads to look down.
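Since AppArmor on Ubuntu hosts came up above, a quick check of whether it is enabled on a given container host could look like the sketch below. It reads the kernel parameter `/sys/module/apparmor/parameters/enabled`, which contains `Y` when AppArmor is enabled; the `docker_host` group name is hypothetical. A positive result only shows that AppArmor is active, not that it causes this failure.

```yaml
# Sketch: report whether AppArmor is enabled on the host running the containers.
- name: Inspect AppArmor on the container host
  hosts: docker_host       # hypothetical group: the host, not the container
  gather_facts: false
  tasks:
    - name: Read the AppArmor kernel module parameter
      ansible.builtin.slurp:
        src: /sys/module/apparmor/parameters/enabled
      register: apparmor_param
      failed_when: false   # a missing file just means AppArmor is not loaded

    - name: Report AppArmor status
      ansible.builtin.debug:
        msg: >-
          AppArmor enabled:
          {{ (apparmor_param.content | default('') | b64decode | trim) == 'Y' }}
```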

NeilHanlon, Feb 11 '25 17:02

I have not seen anything more, unfortunately :(

I haven't had time to dig any deeper.

geerlingguy, Feb 11 '25 19:02

It looks like /etc/shadow is being created without root read permissions.

(ansible task)

    - name: Debugging
      changed_when: false
      register: thing
      ansible.builtin.shell:
        cmd: |
          echo ""
          ls -l /etc/shadow
          ls -l /etc/passwd
          echo ""

    - name: Print debug
      ansible.builtin.debug:
        var: thing.stdout


@andtra spotted this in https://github.com/geerlingguy/docker-rockylinux9-ansible/issues/6#issuecomment-2671053863

I can reproduce this as the source of the issue as well.

FWIW this also seems to affect RHEL and Oracle Linux, but not AlmaLinux.
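The permission check above can be turned into tasks that fail fast instead of printing `ls` output. This is a sketch: it uses `ansible.builtin.stat`, whose `rusr` field reports the owner-read bit, and runs without `become` on the assumption that the connection user is already root, as the `id -u` output earlier in this thread showed. It only inspects mode bits; whether a missing owner-read bit actually breaks sudo appears to depend on the runtime environment, as this discussion shows.

```yaml
# Sketch: detect an /etc/shadow created without the owner-read bit.
- name: Stat /etc/shadow
  ansible.builtin.stat:
    path: /etc/shadow
  become: false            # avoid the sudo path that is failing
  register: shadow_stat

- name: Fail if the owner (root) has no read permission on /etc/shadow
  ansible.builtin.assert:
    that:
      - shadow_stat.stat.exists
      - shadow_stat.stat.rusr
    fail_msg: "/etc/shadow mode is {{ shadow_stat.stat.mode | default('unknown') }}; owner-read bit missing"
    success_msg: "/etc/shadow mode is {{ shadow_stat.stat.mode }}"
```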

artis3n, Feb 22 '25 15:02

I'm not a Rocky Linux user, but I did run into this issue in a different context, hunted around for a while, and came across this blog post: https://www.tunbury.org/2025/05/13/ubuntu-apparmor/

Update: https://github.com/AOSC-Tracking/apparmor/commit/556396a172d09ea032404c7b346f4cf54a949a4e

rocodes, Jul 16 '25 23:07