Errors with munge and "bad user name www-data"

Open prehensilecode opened this issue 4 years ago • 7 comments

Environment:

  • RHEL8
  • Slurm 20.02.7 via Bright Cluster Manager
  • podman instead of docker

The container builds just fine (as root) with the provided Dockerfile:

podman build -f Dockerfile

Modified the run.sh script:

data=/data/slurm-web
podman run -d -v $data/conf:/etc/slurm-web \
              -v /etc/munge:/etc/munge \
              -v /cm/shared/apps/slurm/var/etc/mycluster:/etc/slurm-llnl \
              -v /etc/passwd:/etc/passwd \
              -v /etc/group:/etc/group \
              -p 8899:80 \
              slurm-web

Container runs:

$ podman ps
CONTAINER ID  IMAGE                       COMMAND        CREATED        STATUS            PORTS                 NAMES
db751a9e001b  localhost/slurm-web:latest  /sbin/my_init  8 minutes ago  Up 8 minutes ago  0.0.0.0:8899->80/tcp  sweet_haslett

but there are error messages:

$ podman logs sweet_haslett
*** Running /etc/my_init.d/00_regen_ssh_host_keys.sh...
*** Running /etc/rc.local...
*** Booting runit daemon...
*** Runit started as PID 9
chown: invalid spec: ‘munge:’
AH00543: apache2: bad user name www-data
Aug 12 17:32:39 db751a9e001b syslog-ng[20]: syslog-ng starting up; version='3.5.6'
chown: invalid spec: ‘munge:’
AH00543: apache2: bad user name www-data
chown: invalid spec: ‘munge:’
AH00543: apache2: bad user name www-data
chown: invalid spec: ‘munge:’
...

The lines about bad user name www-data and chown: invalid spec: 'munge:' appear repeatedly.

Doing netstat -an shows that it is listening on port 8899 as expected.

I attempted to connect to http://localhost:8899/slurm using a browser (elinks, Chrome, Firefox). The connection seems to be made, but no content is shown.

prehensilecode avatar Aug 12 '21 17:08 prehensilecode

Hi @prehensilecode,

Actually, www-data is the system user of the Apache daemon on Debian/Ubuntu. It is normally created by the post-installation scripts of the Apache packages. Did the installation of the Apache packages go well in the container at build time?

rezib avatar Aug 13 '21 15:08 rezib

I tried to rebuild the container from scratch to capture the logs, but I just got this error:

fatal: unable to access 'https://github.com/edf-hpc/slurm-web.git/': Failed to connect to github.com port 443: Connection timed out

prehensilecode avatar Aug 13 '21 18:08 prehensilecode

The connection error to github.com fixed itself after waiting a few minutes. The podman build log is attached: podman_build_log.txt

Also fixed the run script:

data=/ifs/sysadmin/Src/slurm-web

podman run -d -v $data/conf:/etc/slurm-web \
              -v /etc/munge:/etc/munge \
              -v /cm/shared/apps/slurm/var/etc/picotte:/etc/slurm-llnl \
              -v /etc/passwd:/etc/passwd \
              -v /etc/group:/etc/group \
              -p 8899:80 \
              --name=slurm-web \
              __container_id__

Logs still show the same messages:

*** Running /etc/my_init.d/00_regen_ssh_host_keys.sh...
*** Running /etc/rc.local...
*** Booting runit daemon...
*** Runit started as PID 9
chown: invalid spec: ‘munge:’
AH00543: apache2: bad user name www-data
Aug 13 18:30:14 793fe25c8f0f syslog-ng[19]: syslog-ng starting up; version='3.5.6'
chown: invalid spec: ‘munge:’
AH00543: apache2: bad user name www-data
chown: invalid spec: ‘munge:’
AH00543: apache2: bad user name www-data
...

prehensilecode avatar Aug 13 '21 18:08 prehensilecode

The container build log does not show anything weird during package installation, so the www-data user has certainly been created properly. The issue probably occurs during container initialization. I guess podman does some UID binding between the host and the container (by generating its own /etc/passwd or similar), and the exact behaviour might be tuned with podman settings.
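
Note that the run script above bind-mounts the host's /etc/passwd and /etc/group over the container's own copies, so accounts created at image build time are only visible at run time if the host files also define them. One way to narrow this down is to check whether those files contain the accounts named in the errors. The following is a minimal sketch, not part of Slurm-web: the function name missing_accounts is hypothetical, and it only reads local passwd-format files, so accounts provided by LDAP or another name service would not be seen.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm-web or podman).
# Print each required account name that has no entry in a
# passwd-format file such as /etc/passwd or /etc/group.
# Usage: missing_accounts FILE NAME...
missing_accounts() {
    file=$1
    shift
    for name in "$@"; do
        # The first colon-separated field of each line is the account name.
        if ! awk -F: -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file"; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, running `missing_accounts /etc/passwd www-data munge` and `missing_accounts /etc/group www-data munge` on the host lists the accounts that the mounted files lack. If any are printed, options to try include dropping the two -v mounts for /etc/passwd and /etc/group, or defining the accounts on the host.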

Please note the container is created with very old slurm-web and pyslurm releases designed to work with Slurm 15.08. You have to tune the Dockerfile to work with Slurm 20.02.

rezib avatar Aug 23 '21 09:08 rezib

Create the www-data user and group:

# Create the group first, then the user with it as primary group.
groupadd www-data
useradd -g www-data www-data

Then try again.

pokitpeng avatar Nov 09 '21 10:11 pokitpeng

I had the same issue on CentOS 8 Stream. I replaced podman with docker-ce, and after that, I was able to start the container.

But... the result is not what it should be. See #221

forgetfr avatar Feb 10 '22 19:02 forgetfr

This issue concerns Slurm-web v2, which is no longer maintained. You are highly encouraged to test the new version, v3.0.0, for which the quick start guide is available online: https://docs.rackslab.io/slurm-web/install/quickstart.html

Note that Slurm-web v3.0.0 is officially supported on CentOS 8 with RPM packages. If you prefer podman containers, we plan to work on this in https://github.com/rackslab/Slurm-web/issues/266.

Unless someone is motivated to maintain the old version of Slurm-web or you have a justified reason to keep this issue open, it will be closed in a few weeks.

rezib avatar May 15 '24 13:05 rezib

For the reasons explained in the previous comment, I am now closing this issue.

rezib avatar Jun 19 '24 09:06 rezib