ERROR [-] awx.main.tasks.jobs project_update xxx (running) Post run hook errored. (OSError: [Errno 39]) | (FileNotFoundError: [Errno 2])

Open acsezen opened this issue 10 months ago • 2 comments

Please confirm the following

  • [x] I agree to follow this project's code of conduct.
  • [x] I have checked the current issues for duplicates.
  • [x] I understand that AWX is open source software provided for free and that I might not receive a timely response.
  • [x] I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)

Bug Summary

Hello team,

Occasionally, a project update fails with "Post run hook errored", even though most project updates succeed. Here are the tracebacks:

2025-02-19 03:01:00,357 ERROR    [-] awx.main.tasks.jobs project_update xxx (running) Post run hook errored.
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/jobs.py", line 637, in run
    self.post_run_hook(self.instance, status)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/jobs.py", line 1410, in post_run_hook
    shutil.rmtree(stage_path)  # cannot trust content update produced
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib64/python3.11/shutil.py", line 683, in _rmtree_safe_fd
    onerror(os.rmdir, fullname, sys.exc_info())
  File "/usr/lib64/python3.11/shutil.py", line 681, in _rmtree_safe_fd
    os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'ansible-windows-1.11.1-cgel_146

or

2025-02-19 09:00:41,404 ERROR    [ad623158921548589ea54af63b05b196] awx.main.tasks.jobs project_update xxx (running) Post run hook errored.
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/jobs.py", line 637, in run
    self.post_run_hook(self.instance, status)
  File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/jobs.py", line 1410, in post_run_hook
    shutil.rmtree(stage_path)  # cannot trust content update produced
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/shutil.py", line 752, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
    _rmtree_safe_fd(dirfd, fullname, onerror)
  File "/usr/lib64/python3.11/shutil.py", line 703, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/usr/lib64/python3.11/shutil.py", line 701, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
FileNotFoundError: [Errno 2] No such file or directory: 'tmpvjbzpmia'

AWX version

24.6.1

Select the relevant components

  • [ ] UI
  • [ ] UI (tech preview)
  • [x] API
  • [ ] Docs
  • [ ] Collection
  • [ ] CLI
  • [ ] Other

Installation method

openshift

Modifications

no

Ansible version

2.15.2

Operating system

Kustomize Version: v5.0.1 Openshift Version: 4.14.34 Kubernetes Version: v1.27.15+6147456

Web browser

No response

Steps to reproduce

It happens randomly. Sometimes updates are successful.

Expected results

Project updates always run successfully.

Actual results

Project updates fail randomly.

Additional information

% oc rsh -n awx-production deployments/awx-prd-task ls -al /var/lib/awx
total 44
drwxrwxr-x. 1 root root 43 Feb 19 13:00 .
drwxr-xr-x. 1 root root 33 Feb 19 00:49 ..
-rw-------. 1 awx root 177 Feb 19 14:10 .bash_history
drwxr-xr-x. 3 root root 19 Jul 2 2024 .local
drwxrwxrwx. 160 99 99 36864 Jan 31 12:58 projects
drwxr-xr-x. 3 root root 20 Jul 2 2024 public
drwxrwxr-x. 1 root root 40 Jul 2 2024 rsyslog
drwxr-xr-x. 3 root root 17 Jul 2 2024 venv
% oc rsh -n awx-production deployments/awx-prd-task ls -al /var/lib/awx/projects/ | head -n10
total 688
drwxrwxrwx. 160   99   99 36864 Jan 31 12:58 .
drwxrwxr-x.   1 root root    43 Feb 19 13:00 ..
drwxr-xr-x.   5 awx    99  4096 Nov 14 09:33 _517__xxx_xxx_xxx
-rwxr-xr-x.   1 awx    99     0 Nov 14 09:33 _517__xxx_xxx_xxx.lock
% oc rsh -n awx-production deployments/awx-prd-web ls -al /var/lib/awx
total 40
drwxrwxr-x. 1 root root 37 Feb 18 23:52 .
drwxr-xr-x. 1 root root 30 Jul 2 2024 ..
prw-------. 1 awx root 0 Feb 18 23:52 awxfifo
drwxr-xr-x. 3 root root 19 Jul 2 2024 .local
drwxrwxrwx. 160 99 99 36864 Jan 31 12:58 projects
drwxr-xr-x. 3 root root 20 Jul 2 2024 public
drwxrwxr-x. 1 root root 40 Jul 2 2024 rsyslog
drwxr-xr-x. 3 root root 17 Jul 2 2024 venv
% oc rsh -n awx-production deployments/awx-prd-task cat /etc/passwd
awx:x:1000940000:0:,,,:/var/lib/awx:/bin/bash
% oc rsh -n awx-production deployments/awx-prd-task cat /etc/group
root:x:0:
bin:x:1:
daemon:x:2:
sys:x:3:
adm:x:4:
tty:x:5:
disk:x:6:
lp:x:7:
mem:x:8:
kmem:x:9:
wheel:x:10:
cdrom:x:11:
mail:x:12:
man:x:15:
dialout:x:18:
floppy:x:19:
games:x:20:
tape:x:33:
video:x:39:
ftp:x:50:
lock:x:54:
audio:x:63:
users:x:100:
nobody:x:65534:
tss:x:59:
nginx:x:999:
utmp:x:22:
utempter:x:35:
ssh_keys:x:101:
input:x:998:
kvm:x:36:
render:x:997:
systemd-journal:x:190:
systemd-coredump:x:996:
dbus:x:81:

acsezen avatar Feb 19 '25 21:02 acsezen

Are you mounting in your own persistent projects directory, by chance?

| OSError: [Errno 39] Directory not empty: 'ansible-windows-1.11.1-cgel_146

When this error occurs, can you shell into the running task container, run a few ls -l commands on the directory it is trying to remove, and see which files it cannot delete?

| FileNotFoundError: [Errno 2] No such file or directory: 'tmpvjbzpmia'

This one is weird, as the line right before it is elif os.path.exists(stage_path):

so the path should exist on disk. Is it possible that some other process is removing these files at the same time?

fosterseth avatar Feb 26 '25 16:02 fosterseth
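
Both tracebacks are consistent with something else touching the staging directory while shutil.rmtree is walking it: an entry that vanishes mid-walk produces Errno 2, and an entry that appears after the directory was listed produces Errno 39. Below is a minimal sketch of a removal that tolerates this, assuming concurrent modification really is the cause (that is not confirmed in this thread). The names remove_tree_tolerant, attempts, and delay are hypothetical and chosen for illustration; this is not AWX code.

```python
import errno
import os
import shutil
import time


def remove_tree_tolerant(path, attempts=3, delay=0.5):
    """Remove `path`, retrying if entries vanish or appear mid-removal.

    Hypothetical helper for illustration only; not part of AWX.
    Returns True if `path` no longer exists afterwards, else False.
    """
    for _ in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Another process removed an entry (or the whole tree) first.
            if not os.path.exists(path):
                return True
        except OSError as exc:
            # ENOTEMPTY (Errno 39 on Linux): something was created in a
            # directory after its entries were listed. Anything else is
            # unexpected and is re-raised.
            if exc.errno != errno.ENOTEMPTY:
                raise
        time.sleep(delay)
    return not os.path.exists(path)
```

Retrying only hides the symptom. If the helper returns False after several attempts, the entries still present in the directory are the useful diagnostic.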

I was also seeing this in AWX version 24.6.1 running on k3s version 1.28.15, but only the Errno 39 error.

I had projects_persistence set to true and resolved it by reverting it to the default value of false. I believe this is only a workaround.

The files I was seeing were consistently named like .nfs000000000008a76c00005668, which is odd as I am not running any NFS.

I hope this helps.

cuyler-berg avatar Apr 09 '25 21:04 cuyler-berg
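
Names of the form .nfs followed by hex digits are what an NFS client creates when a file is deleted while some process still has it open; the placeholder goes away once the last handle is closed. That would explain a "Directory not empty" error during rmtree, although whether the projects volume in these setups is NFS-backed underneath is an inference and not something confirmed in this thread. A small sketch for checking a directory for such leftovers follows. The function name find_nfs_placeholders is hypothetical, and the path in the usage section is the projects directory from the listings in the original report.

```python
import os


def find_nfs_placeholders(root):
    """Return sorted paths under `root` whose file name starts with '.nfs'.

    Hypothetical diagnostic helper for illustration; not part of AWX.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(".nfs"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Projects directory as shown in the original report; adjust as needed.
    for path in find_nfs_placeholders("/var/lib/awx/projects"):
        print(path)
```

If this prints anything right after a failed project update, the next step would be finding which process still holds those files open.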

Closing due to inactivity. If folks continue to see this issue, please re-open with current context or updates.

thedoubl3j avatar Jul 17 '25 15:07 thedoubl3j