Checkpoint feature tutorial does not work when run as a non-root user

Open luispcunha opened this issue 1 year ago • 3 comments

Version of Apptainer

$ apptainer --version
apptainer version 1.3.3

Expected behavior

I expected to be able to reproduce the checkpointing example in the documentation while running all Apptainer commands as a non-privileged user.

Actual behavior

After executing apptainer checkpoint instance server, the web server running in the instance crashes. Logs from the ~/.apptainer/instances/logs/{host_name}/{username}/server.err file:

127.0.0.1 - - [02/Sep/2024 10:28:27] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [02/Sep/2024 10:28:32] "GET / HTTP/1.1" 200 -

[2024-09-02T10:28:39.795, 41000, 41003, ERROR] at fileconnlist.cpp:428 in prepareShmList; REASON='JASSERT(fd != -1) failed'
     (strerror((*__errno_location ()))) = Read-only file system
     area.name = /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
    python3.10: Terminating...
    Backtrace:
        1 jassert_internal::JAssert::~JAssert() in /.singularity.d/libs/libdmtcp.so 0x7f2515e572f1
        2 dmtcp::FileConnList::prepareShmList() in /.singularity.d/libs/libdmtcp_ipc.so 0x7f25162c52de
        3 dmtcp_FileConnList_EventHook(eDmtcpEvent, _DmtcpEventData_t*) in /.singularity.d/libs/libdmtcp_ipc.so 0x7f25162c68f7
        4 dmtcp::PluginManager::eventHook(eDmtcpEvent, _DmtcpEventData_t*) in /.singularity.d/libs/libdmtcp.so 0x7f2515e26e57
        5 dmtcp::DmtcpWorker::preCheckpoint() in /.singularity.d/libs/libdmtcp.so 0x7f2515e1dff4
        6  in /.singularity.d/libs/libdmtcp.so 0x7f2515e2eab4
        7  in /.singularity.d/libs/libdmtcp.so 0x7f2515e30c66
        8  in /lib/x86_64-linux-gnu/libpthread.so.0 0x7f2515852fa3
        9 clone in /lib/x86_64-linux-gnu/libc.so.6 0x7f25155f506f
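For triage, the three fields that matter in this trace are the failed assertion, the errno text, and the file DMTCP was trying to open. The sketch below pulls them out of a log such as server.err. The function name summarize_jassert is hypothetical, and the patterns assume the log layout shown above; other DMTCP versions may format these lines differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise DMTCP JASSERT failures found in a log file.
# Assumes the line layout seen in the trace above.
# Usage: summarize_jassert /path/to/server.err
summarize_jassert() {
    local log="$1"
    # The failed assertion is quoted after REASON= on the ERROR line.
    sed -n "s/.*REASON='\(.*\)'.*/reason: \1/p" "$log"
    # The errno text follows the "(strerror(...)) = " expression.
    sed -n 's/^[[:space:]]*(strerror(.*)) = \(.*\)$/errno: \1/p' "$log"
    # area.name is the file named in the failing shared-memory area.
    sed -n 's/^[[:space:]]*area\.name = \(.*\)$/path: \1/p' "$log"
}
```

Run against the log in this report, it prints the assertion, "Read-only file system", and the gconv-modules.cache path on three separate lines.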

Subsequent calls to apptainer checkpoint instance server produce the following output:

INFO:    Using checkpoint "example-checkpoint"
Error, computation not in running state.  Either a checkpoint is
 currently happening or there are no connected processes.

When I run the example as the "root" user, this error doesn't occur and I'm able to reproduce the example, but the restart step doesn't work reliably (similar to the issue described here).

Steps to reproduce this behavior

Follow the instructions in the documentation as a non-root user. DMTCP was installed from source, from tag 3.0.0 of the GitHub repo.
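For readers without the documentation at hand, the sequence up to the failing command can be sketched roughly as follows. This is a reconstruction under assumptions, not a copy of the tutorial: the image name server.sif, the port 8888, and the curl request are placeholders, and the run helper with its DRY_RUN switch is hypothetical. The checkpoint name example-checkpoint and the instance name server come from the output above. The --dmtcp-launch flag should be checked against the user guide for your Apptainer version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Sketch of the steps leading to the crash. Run as a non-root user.
# Assumed defaults: image server.sif, port 8888.
reproduce_checkpoint_failure() {
    local image="${1:-server.sif}" port="${2:-8888}"
    # Create the checkpoint, start the instance under DMTCP,
    # send one POST request, then request a checkpoint.
    run apptainer checkpoint create example-checkpoint &&
    run apptainer instance start --dmtcp-launch example-checkpoint "$image" server "$port" &&
    run curl --data 1 "localhost:$port" &&
    run apptainer checkpoint instance server
}
```

Setting DRY_RUN=1 before calling reproduce_checkpoint_failure only prints the commands, which is a safe way to review them before running anything.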

What OS/distro are you running

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

How did you install Apptainer

wget https://github.com/apptainer/apptainer/releases/download/v1.3.3/apptainer_1.3.3_amd64.deb
sudo apt install -y ./apptainer_1.3.3_amd64.deb

luispcunha avatar Sep 02 '24 10:09 luispcunha

Hello, looking to reproduce this. Did you build dmtcp with the --enable-static-libstdcxx flag?

ikaneshiro avatar Sep 03 '24 14:09 ikaneshiro

> Hello, looking to reproduce this. Did you build dmtcp with the --enable-static-libstdcxx flag?

Yes. This is how I built it:

#!/bin/bash

VERSION=3.0.0

apt install git gcc g++ make -y
apt install python3 -y

git clone https://github.com/dmtcp/dmtcp
cd dmtcp
git checkout $VERSION

./configure --enable-static-libstdcxx

make
make check # Optional
make install

echo /usr/local/lib/dmtcp > /etc/ld.so.conf.d/dmtcp.conf
ldconfig
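After a build like the one above, a quick sanity check is to confirm that the DMTCP command-line tools ended up on PATH. The sketch below is a minimal version of such a check. The function name check_dmtcp_install is hypothetical, and it does not verify the --enable-static-libstdcxx setting or the library path written to ld.so.conf.d.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report where each DMTCP tool is installed.
# Returns non-zero if any of them is missing from PATH.
check_dmtcp_install() {
    local missing=0 tool
    for tool in dmtcp_launch dmtcp_restart dmtcp_command; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
        else
            echo "$tool: not found"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_dmtcp_install as the same non-root user that runs Apptainer shows whether that user sees the same installation as root.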

luispcunha avatar Sep 03 '24 15:09 luispcunha

Hmm, I cannot reproduce this issue. It looks like a permission issue, as shown in the trace:

[2024-09-02T10:28:39.795, 41000, 41003, ERROR] at fileconnlist.cpp:428 in prepareShmList; REASON='JASSERT(fd != -1) failed'
     (strerror((*__errno_location ()))) = Read-only file system

JasonYangShadow avatar Sep 04 '24 07:09 JasonYangShadow

The documentation was updated in apptainer/apptainer-userdocs#300.

DrDaveD avatar Nov 19 '24 16:11 DrDaveD