ERROR [-] awx.main.tasks.jobs project_update xxx (running) Post run hook errored. (OSError: [Errno 39]) | (FileNotFoundError: [Errno 2])
Please confirm the following
- [x] I agree to follow this project's code of conduct.
- [x] I have checked the current issues for duplicates.
- [x] I understand that AWX is open source software provided for free and that I might not receive a timely response.
- [x] I am NOT reporting a (potential) security vulnerability. (These should be emailed to [email protected] instead.)
Bug Summary
Hello team,
Occasionally, project updates fail with a "Post run hook errored" error, even though most projects update successfully. Here are the tracebacks:
2025-02-19 03:01:00,357 ERROR [-] awx.main.tasks.jobs project_update xxx (running) Post run hook errored.
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/jobs.py", line 637, in run
self.post_run_hook(self.instance, status)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/jobs.py", line 1410, in post_run_hook
shutil.rmtree(stage_path) # cannot trust content update produced
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/shutil.py", line 752, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/usr/lib64/python3.11/shutil.py", line 683, in _rmtree_safe_fd
onerror(os.rmdir, fullname, sys.exc_info())
File "/usr/lib64/python3.11/shutil.py", line 681, in _rmtree_safe_fd
os.rmdir(entry.name, dir_fd=topfd)
OSError: [Errno 39] Directory not empty: 'ansible-windows-1.11.1-cgel_146
or
2025-02-19 09:00:41,404 ERROR [ad623158921548589ea54af63b05b196] awx.main.tasks.jobs project_update xxx (running) Post run hook errored.
Traceback (most recent call last):
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/jobs.py", line 637, in run
self.post_run_hook(self.instance, status)
File "/var/lib/awx/venv/awx/lib64/python3.11/site-packages/awx/main/tasks/jobs.py", line 1410, in post_run_hook
shutil.rmtree(stage_path) # cannot trust content update produced
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/shutil.py", line 752, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/usr/lib64/python3.11/shutil.py", line 672, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/usr/lib64/python3.11/shutil.py", line 703, in _rmtree_safe_fd
onerror(os.unlink, fullname, sys.exc_info())
File "/usr/lib64/python3.11/shutil.py", line 701, in _rmtree_safe_fd
os.unlink(entry.name, dir_fd=topfd)
FileNotFoundError: [Errno 2] No such file or directory: 'tmpvjbzpmia'
AWX version
24.6.1
Select the relevant components
- [ ] UI
- [ ] UI (tech preview)
- [x] API
- [ ] Docs
- [ ] Collection
- [ ] CLI
- [ ] Other
Installation method
openshift
Modifications
no
Ansible version
2.15.2
Operating system
Kustomize Version: v5.0.1
Openshift Version: 4.14.34
Kubernetes Version: v1.27.15+6147456
Web browser
No response
Steps to reproduce
It happens randomly. Sometimes updates are successful.
Expected results
Project updates always run successfully.
Actual results
Project_update fails randomly.
Additional information
% oc rsh -n awx-production deployments/awx-prd-task ls -al /var/lib/awx
total 44
drwxrwxr-x. 1 root root 43 Feb 19 13:00 .
drwxr-xr-x. 1 root root 33 Feb 19 00:49 ..
-rw-------. 1 awx root 177 Feb 19 14:10 .bash_history
drwxr-xr-x. 3 root root 19 Jul 2 2024 .local
drwxrwxrwx. 160 99 99 36864 Jan 31 12:58 projects
drwxr-xr-x. 3 root root 20 Jul 2 2024 public
drwxrwxr-x. 1 root root 40 Jul 2 2024 rsyslog
drwxr-xr-x. 3 root root 17 Jul 2 2024 venv
% oc rsh -n awx-production deployments/awx-prd-task ls -al /var/lib/awx/projects/ | head -n10
total 688
drwxrwxrwx. 160 99 99 36864 Jan 31 12:58 .
drwxrwxr-x. 1 root root 43 Feb 19 13:00 ..
drwxr-xr-x. 5 awx 99 4096 Nov 14 09:33 _517__xxx_xxx_xxx
-rwxr-xr-x. 1 awx 99 0 Nov 14 09:33 _517__xxx_xxx_xxx.lock
% oc rsh -n awx-production deployments/awx-prd-web ls -al /var/lib/awx
total 40
drwxrwxr-x. 1 root root 37 Feb 18 23:52 .
drwxr-xr-x. 1 root root 30 Jul 2 2024 ..
prw-------. 1 awx root 0 Feb 18 23:52 awxfifo
drwxr-xr-x. 3 root root 19 Jul 2 2024 .local
drwxrwxrwx. 160 99 99 36864 Jan 31 12:58 projects
drwxr-xr-x. 3 root root 20 Jul 2 2024 public
drwxrwxr-x. 1 root root 40 Jul 2 2024 rsyslog
drwxr-xr-x. 3 root root 17 Jul 2 2024 venv
% oc rsh -n awx-production deployments/awx-prd-task cat /etc/passwd
awx:x:1000940000:0:,,,:/var/lib/awx:/bin/bash
% oc rsh -n awx-production deployments/awx-prd-task cat /etc/group
root:x:0:
bin:x:1:
daemon:x:2:
sys:x:3:
adm:x:4:
tty:x:5:
disk:x:6:
lp:x:7:
mem:x:8:
kmem:x:9:
wheel:x:10:
cdrom:x:11:
mail:x:12:
man:x:15:
dialout:x:18:
floppy:x:19:
games:x:20:
tape:x:33:
video:x:39:
ftp:x:50:
lock:x:54:
audio:x:63:
users:x:100:
nobody:x:65534:
tss:x:59:
nginx:x:999:
utmp:x:22:
utempter:x:35:
ssh_keys:x:101:
input:x:998:
kvm:x:36:
render:x:997:
systemd-journal:x:190:
systemd-coredump:x:996:
dbus:x:81:
are you mounting in your own persistent projects directory by chance?
| OSError: [Errno 39] Directory not empty: 'ansible-windows-1.11.1-cgel_146
when this error occurs, can you shell into the running task container and do a few ls -l commands on the directory it is trying to remove and see what files it cannot delete?
| FileNotFoundError: [Errno 2] No such file or directory: 'tmpvjbzpmia'
this one is weird, as the line right before is elif os.path.exists(stage_path):
so it should exist on disk. Is it possible there is some other process removing these files at the same time?
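If another process really is touching the staging directory while it is being removed, both tracebacks would fit a race between the directory scan and the delete calls. Below is a minimal sketch of a more tolerant cleanup, assuming that retrying is acceptable at this point. `rmtree_with_retry`, `attempts`, and `delay` are hypothetical names used for illustration and are not part of AWX:

```python
import errno
import os
import shutil
import time


def rmtree_with_retry(path, attempts=3, delay=0.5):
    """Remove `path`, tolerating entries that vanish or appear mid-walk.

    Returns True once the directory is gone, False if it still exists
    after all attempts. Hypothetical helper, not part of AWX.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
        except FileNotFoundError:
            # Something else removed an entry (or the whole tree) first.
            pass
        except OSError as exc:
            # ENOTEMPTY is errno 39 on Linux: an entry showed up after
            # the directory had already been scanned.
            if exc.errno != errno.ENOTEMPTY:
                raise
        if not os.path.exists(path):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

This would only hide the symptom. If the function returns False, the entries left under `path` are the ones worth inspecting.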
I was also seeing this in AWX version 24.6.1 running on k3s version 1.28.15, specifically only the Errno 39 issue.
I had projects_persistence set to true and resolved it by reverting it to the default value of false. I believe this is only a workaround.
The files I was seeing were consistently named like this: .nfs000000000008a76c00005668, which is odd as I am not running any NFS.
I hope this helps.
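Names of the form `.nfs<hex>` are typically what a Linux NFS client leaves behind when a file is deleted while some process still has it open, so it may be worth checking what storage backs the projects volume. Here is a small sketch for listing such leftovers when the error occurs. `find_nfs_placeholders` is a hypothetical helper for illustration, not an AWX function:

```python
import os


def find_nfs_placeholders(root):
    """Return sorted paths under `root` whose file name starts with '.nfs'."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(".nfs"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Running it against `/var/lib/awx/projects` inside the task container right after a failed cleanup would show whether the entries that block `os.rmdir` follow this naming pattern.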
closing due to inactivity, if folks continue to see this issue, please re-open with current context or updates.