yum gets deadlocked/hung up (indefinitely) waiting for urlgrabber-ext-down
While I appreciate that YUM is now deprecated, it is still the main package manager for EL7, which is where I am running into this issue: yum hangs indefinitely until it is killed.
The process tree looks like this:
8702 ? S 0:05 | \_ /usr/bin/python /usr/bin/yum -y --disablerepo=* --enablerepo=repo.dc.hpdd.intel.com_repository_*,build.hpdd.intel.com_job_daos-stack* install --exclude openmpi daos-1.1.2.1-1.5456.g02ce0510.el7.x86_64 daos-client-1.1.2.1-1.5456.g02ce0510.el7.x86_64 daos-tests-1.1.2.1-1.5456.g02ce0510.el7.x86_64 daos-server-1.1.2.1-1.5456.g02ce0510.el7.x86_64 openmpi3 hwloc ndctl fio patchutils ior-hpc-daos-0 romio-tests-cart-4-daos-0 testmpio-cart-4-daos-0 mpi4py-tests-cart-4-daos-0 hdf5-mpich2-tests-daos-0 hdf5-openmpi3-tests-daos-0 hdf5-vol-daos-mpich2-tests-daos-0 hdf5-vol-daos-openmpi3-tests-daos-0 MACSio-mpich2-daos-0 MACSio-openmpi3-daos-0 mpifileutils-mpich-daos-0
8705 ? S 0:00 | \_ /usr/bin/python /usr/libexec/urlgrabber-ext-down
8711 ? S 0:00 | \_ /usr/bin/python /usr/libexec/urlgrabber-ext-down
8712 ? S 0:00 | \_ /usr/bin/python /usr/libexec/urlgrabber-ext-down
The status of each process is:
# /tmp/strace -f -p 8702
/tmp/strace: Process 8702 attached
wait4(8711, ^C/tmp/strace: Process 8702 detached
<detached ...>
# /tmp/strace -f -p 8705
/tmp/strace: Process 8705 attached
read(0, ^C/tmp/strace: Process 8705 detached
<detached ...>
# /tmp/strace -f -p 8711
/tmp/strace: Process 8711 attached
futex(0x1444c90, FUTEX_WAIT_PRIVATE, 2, NULL^C/tmp/strace: Process 8711 detached
<detached ...>
# /tmp/strace -f -p 8712
/tmp/strace: Process 8712 attached
futex(0x2174c90, FUTEX_WAIT_PRIVATE, 2, NULL^C/tmp/strace: Process 8712 detached
<detached ...>
To me, this looks like 8702, 8711 and 8705 are deadlocked, all waiting/blocked on each other.
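For anyone comparing traces, here is a minimal sketch that pulls the blocking syscall name out of each strace line above. The helper name blocked_syscall is hypothetical and the input strings are copied from the traces in this report; it only summarizes what strace already printed and does not diagnose the cause.

```python
import re

def blocked_syscall(strace_line):
    """Return the syscall name at the start of an strace line, or None."""
    match = re.match(r"\s*(\w+)\(", strace_line)
    return match.group(1) if match else None

# Unfinished syscall lines copied from the strace output above, keyed by PID.
traces = {
    8702: "wait4(8711, ^C/tmp/strace: Process 8702 detached",
    8705: "read(0, ^C/tmp/strace: Process 8705 detached",
    8711: "futex(0x1444c90, FUTEX_WAIT_PRIVATE, 2, NULL^C/tmp/strace: Process 8711 detached",
    8712: "futex(0x2174c90, FUTEX_WAIT_PRIVATE, 2, NULL^C/tmp/strace: Process 8712 detached",
}

summary = {pid: blocked_syscall(line) for pid, line in traces.items()}
print(summary)
# {8702: 'wait4', 8705: 'read', 8711: 'futex', 8712: 'futex'}
```

So the yum parent is in wait4 on 8711, one helper is reading fd 0, and the other two helpers (8711 and 8712) are each waiting on a futex.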
Just as a heads-up, the read(0, in the trace indicates that process 8705 is blocked reading from standard input (file descriptor 0).
@lukash Yes, I do realize that, but why? stdin is likely a pipe to the parent process, which is simply waiting on children.
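To illustrate the pipe behaviour being discussed, here is a small sketch of generic pipe semantics. It assumes stdin is a pipe from the parent, as guessed above, and it is not urlgrabber's actual code; the function name is hypothetical. A child reading fd 0 stays blocked for as long as the parent holds the write end open without sending anything.

```python
import subprocess
import sys

def child_blocks_until_parent_writes(wait_seconds=0.5):
    """Start a child that reads one line from stdin and echoes it.

    Returns (blocked, output): blocked is True if the child was still
    running after wait_seconds, i.e. still sitting in read(0, ...).
    """
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; print(sys.stdin.readline().strip())"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        child.wait(timeout=wait_seconds)
        blocked = False
    except subprocess.TimeoutExpired:
        blocked = True  # nothing written yet, so the read has not returned
    # Writing a line (and closing the pipe) lets the child finish.
    output, _ = child.communicate("go\n")
    return blocked, output.strip()

if __name__ == "__main__":
    print(child_blocks_until_parent_writes())
```

If the real helper behaves like this, a read(0, on its own would just mean the helper is idle and waiting for its next request, which would make the futex waits in the other helpers the more interesting part.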
I don't know. You haven't really provided a reproducer, I thought you may want to investigate yourself. This seems like a rare corner case, since you're only hitting it yourself long after the development has stopped. For the same reason it is likely going to be low priority for us unless the impact turns out to be bigger (even with a reproducer).
We're hitting the same issue with one of our ansible playbooks. It definitely does seem to be an edge case because this will run 99 times without issues, but we are seeing this issue periodically.
I'm seeing the same futex waits and reads as reported by Brian.
root 3743 3726 3715 3715 0 15:57 ? 00:00:03 /usr/bin/python /bin/yum -d 2 -y install container-selinux docker-ce-18.09.7-3.el7
root 3744 3743 3715 3715 0 15:57 ? 00:00:00 /usr/bin/python /usr/libexec/urlgrabber-ext-down
root 3745 3743 3715 3715 0 15:57 ? 00:00:00 /usr/bin/python /usr/libexec/urlgrabber-ext-down
root 3746 3743 3715 3715 0 15:57 ? 00:00:00 /usr/bin/python /usr/libexec/urlgrabber-ext-down
root 3747 3743 3715 3715 0 15:57 ? 00:00:01 /usr/bin/python /usr/libexec/urlgrabber-ext-down
[root@<HOST> <USER>]# strace -p 3747
strace: Process 3747 attached
read(0,
^Cstrace: Process 3747 detached
<detached ...>
[root@<HOST> <USER>]# strace -p 3746
strace: Process 3746 attached
futex(0x26fbb90, FUTEX_WAIT_PRIVATE, 2, NULL
^Cstrace: Process 3746 detached
<detached ...>
[root@<HOST> <USER>]# strace -p 3745
strace: Process 3745 attached
futex(0x16acb70, FUTEX_WAIT_PRIVATE, 2, NULL
^Cstrace: Process 3745 detached
<detached ...>
[root@<HOST> <USER>]# strace -p 3744
strace: Process 3744 attached
read(0,
^Cstrace: Process 3744 detached
<detached ...>
Are you setting minrate/timeout?
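A quick way to answer that on an affected host is to look for the timeout and minrate options in the [main] section of yum.conf. A minimal sketch, assuming the config is plain INI; the helper name download_limits and the sample text are hypothetical, not taken from any reporter's system:

```python
import configparser

def download_limits(conf_text):
    """Return the timeout/minrate values set in [main], or None if unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("main"):
        return {"timeout": None, "minrate": None}
    return {
        key: parser.get("main", key, fallback=None)
        for key in ("timeout", "minrate")
    }

# Hypothetical sample; on a real host, read the text of /etc/yum.conf instead.
sample = """
[main]
gpgcheck=1
timeout=30
"""

print(download_limits(sample))
# {'timeout': '30', 'minrate': None}
```

A value of None only means the option is not set explicitly in that file, so yum's built-in default applies.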
We see the same problem when installing ROCm in a CentOS 7.9.2009 Docker container. It happens about once in every 20 attempts.
master@:~> ps -eaf | grep 26373
master 20906 18586 0 12:33 pts/2 00:00:00 grep --color=auto 26373
root 26373 518 0 03:40 ? 00:00:00 /usr/bin/python /usr/bin/yum -y install rocm-openmp-sdk5.3.2
root 26388 26373 0 03:40 ? 00:00:00 /usr/bin/python /usr/libexec/urlgrabber-ext-down
root 26389 26373 0 03:40 ? 00:00:00 /usr/bin/python /usr/libexec/urlgrabber-ext-down
master@:~> sudo strace -p 26388
strace: Process 26388 attached
futex(0x2233bb0, FUTEX_WAIT_PRIVATE, 2, NULL^Cstrace: Process 26388 detached
<detached ...>
master@:~> sudo strace -p 26389
strace: Process 26389 attached
read(0, ^Cstrace: Process 26389 detached
<detached ...>
master@:~>
master@:~> sudo strace -p 26373
strace: Process 26373 attached
wait4(18278, ^Cstrace: Process 26373 detached
<detached ...>
master@:~>
Do we have any solution or workaround for this problem?
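Nothing in this thread identifies a fix. As a stopgap for automation, one possible approach is to run yum under a wall-clock deadline, kill the whole process group (so the urlgrabber-ext-down helpers go too) and retry. This is only a sketch under that assumption: run_with_deadline and its parameters are hypothetical names, and killing yum in the middle of a transaction rather than during downloads may leave the system in a state that needs manual cleanup.

```python
import os
import signal
import subprocess

def run_with_deadline(cmd, deadline_s, attempts=3):
    """Run cmd, killing its process group if it exceeds deadline_s seconds.

    Returns (exit_code, attempts_used); exit_code is None if every
    attempt timed out.
    """
    for attempt in range(1, attempts + 1):
        # start_new_session puts the child in its own session and process
        # group, so its group id equals its pid and helpers can be killed too.
        proc = subprocess.Popen(cmd, start_new_session=True)
        try:
            return proc.wait(timeout=deadline_s), attempt
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()  # reap the killed child before retrying
    return None, attempts

# Hypothetical usage; package names and deadline are placeholders:
# code, tries = run_with_deadline(["yum", "-y", "install", "some-package"], 1800)
```

The deadline needs to be comfortably longer than a normal install, otherwise slow but healthy runs get killed as well.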