
munged loop causes system crash due to open files

Open NathanielMiddleton opened this issue 5 years ago • 3 comments

I have been trying to get slurm-web to work in a Docker container on CentOS 7.6. After about 16-18 hours of the container being up, the whole server comes crashing down because it cannot spawn processes after that point. The source appears to be munged respawning every second, constantly looping the following in munged.log:

2020-03-30 21:27:46 +0000 Notice: Running on "c5f156dc059c" (172.17.0.2)
2020-03-30 21:27:46 +0000 Info: PRNG seeded with 1024 bytes from "/dev/urandom"
2020-03-30 21:27:46 +0000 Info: Updating supplementary group mapping every 3600 seconds
2020-03-30 21:27:46 +0000 Info: Enabled supplementary group mtime check of "/etc/group"
2020-03-30 21:27:46 +0000 Info: Removed existing socket "/var/run/munge/munge.socket.2"
2020-03-30 21:27:46 +0000 Notice: Starting munge-0.5.11 daemon (pid 3107)
2020-03-30 21:27:46 +0000 Info: Created 2 work threads
2020-03-30 21:27:46 +0000 Info: Found 1 user with supplementary groups in 0.001 seconds
2020-03-30 21:27:47 +0000 Notice: Running on "c5f156dc059c" (172.17.0.2)
2020-03-30 21:27:47 +0000 Info: PRNG seeded with 1024 bytes from "/dev/urandom"
2020-03-30 21:27:47 +0000 Info: Updating supplementary group mapping every 3600 seconds
2020-03-30 21:27:47 +0000 Info: Enabled supplementary group mtime check of "/etc/group"
2020-03-30 21:27:47 +0000 Info: Removed existing socket "/var/run/munge/munge.socket.2"
2020-03-30 21:27:47 +0000 Notice: Starting munge-0.5.11 daemon (pid 3120)
2020-03-30 21:27:47 +0000 Info: Created 2 work threads
2020-03-30 21:27:47 +0000 Info: Found 1 user with supplementary groups in 0.001 seconds
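
One way to confirm the respawn loop is to count the daemon start-up lines in the log and list the PIDs they report. The following is only a minimal sketch: the function names are hypothetical, and the match pattern assumes the "Notice: Starting munge-... daemon (pid N)" line format shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a munged log file.

# Print how many "Starting munge-... daemon" lines the log contains.
# grep -c exits with status 1 when the count is zero, hence "|| true".
count_munged_starts() {
    grep -c 'Notice: Starting munge-' "$1" || true
}

# Print the PID reported by each daemon start line, one per line.
list_munged_start_pids() {
    sed -n 's/.*Notice: Starting munge-.* daemon (pid \([0-9][0-9]*\)).*/\1/p' "$1"
}
```

For example, `count_munged_starts /var/log/munge/munged.log` (the log path is an assumption about the container's layout). A count that keeps growing by roughly one per second, with a new PID each time, would match the behaviour described above.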

Any ideas on what is happening here?

NathanielMiddleton, Mar 30 '20 22:03

Same error here, and I don't understand why.

nothing-fr, Jan 14 '22 14:01

I made a quick-and-dirty fix as a fast workaround:

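cat /etc/service/munge/run
#!/bin/bash
set -e

# Make sure the runtime directory exists and munge owns its directories.
mkdir -p /var/run/munge
chown munge: /var/{log,lib,run}/munge
# Start munged only while the match count is at most 2. The count includes
# the grep process itself, which also matches the pattern.
if [[ $(ps aux | grep "/usr/sbin/munged -f" | wc -l) -le 2 ]]; then
    exec /sbin/setuser munge /usr/sbin/munged -f
fi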
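
A possible explanation for the loop, offered as an assumption rather than a confirmed diagnosis: according to the munged man page, `-f` means `--force` while `-F` means `--foreground`. With `-f` the daemon detaches into the background, so the run script returns immediately. If the container uses a runit-style supervisor (suggested by the `/etc/service/munge/run` path, but not confirmed in this thread), that supervisor would start the script again about a second later, and `--force` lets each new instance remove the existing socket. The sketch below keeps munged in the foreground instead. The `MUNGED_ARGS` variable and the `start_munged` function are hypothetical names, and the script has not been tested against Slurm-web v2.

```shell
#!/bin/sh
# Sketch of a run script that keeps munged attached to its supervisor.
# Paths mirror the script above; everything else is an assumption.

# -F is --foreground; -f is --force (see munged(8)).
MUNGED_ARGS="-F"

start_munged() {
    mkdir -p /var/run/munge
    chown munge: /var/log/munge /var/lib/munge /var/run/munge
    # exec replaces this shell, so the supervisor tracks munged directly.
    exec /sbin/setuser munge /usr/sbin/munged $MUNGED_ARGS
}

# Start the daemon only when invoked as a service "run" file, so the
# script can also be sourced for inspection without side effects.
case "$0" in
    */run)
        set -e
        start_munged
        ;;
esac
```

With the daemon in the foreground, the supervisor should only restart munged when it actually exits, which would make the `ps | grep | wc -l` guard unnecessary.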

BlackS52, Oct 26 '22 15:10

This issue concerns Slurm-web v2, which is no longer maintained. You are highly encouraged to test the new version, v3.0.0, for which the quick start guide is available online: https://docs.rackslab.io/slurm-web/install/quickstart.html

Note that Slurm-web v3.0.0 is officially supported on CentOS 8 with RPM packages. For older versions, we plan to distribute containers and this effort is tracked in https://github.com/rackslab/Slurm-web/issues/266.

Unless someone is motivated to maintain the old version of Slurm-web or you have a justified reason to keep this issue open, it will be closed in a few weeks.

rezib, May 15 '24 13:05

For the reasons explained in the previous comment, I am now closing this issue.

rezib, Jun 19 '24 09:06