
Error when restoring a checkpoint when move it to another VM

Open FMyb opened this issue 2 years ago • 12 comments

Description

I am trying to create a checkpoint of a jupyter/scipy-notebook container on one VM, move it to another VM, and restore it there. The restore fails with the error Error response from daemon: OCI runtime restore failed: criu failed: type NOTIFY errno 0.

Steps to reproduce the issue:

  1. Start a JupyterLab container with docker run --name jupyter -p 8888:8888 jupyter/scipy-notebook, open JupyterLab, and execute some cells.
  2. Create a checkpoint and move it to another VM.
  3. Restore the checkpoint on the other VM.
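The report does not list the exact Docker commands. Assuming the Docker daemon runs with experimental features enabled (required for docker checkpoint), a plausible version of steps 2 and 3 might look like the sketch below. The checkpoint directory, the checkpoint name cp1 and the target host name vm2 are hypothetical. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of reproduction steps 2 and 3 with Docker (experimental mode).
# CHECKPOINT_DIR, the checkpoint name "cp1" and the target host are
# hypothetical; the report does not show the exact commands used.
CHECKPOINT_DIR=${CHECKPOINT_DIR:-/tmp/jupyter-checkpoints}

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Source VM: dump the running container with CRIU (this also stops it).
checkpoint_on_source() {
    run docker checkpoint create --checkpoint-dir="$CHECKPOINT_DIR" jupyter cp1
}

# Source VM: copy the checkpoint directory to the same path on host $1.
copy_to_target() {
    run scp -r "$CHECKPOINT_DIR" "$1:$(dirname "$CHECKPOINT_DIR")/"
}

# Target VM: create a container from the same image with the same
# options, then start it from the copied checkpoint.
restore_on_target() {
    run docker create --name jupyter -p 8888:8888 jupyter/scipy-notebook || return
    run docker start --checkpoint-dir="$CHECKPOINT_DIR" --checkpoint=cp1 jupyter
}

# Usage:
#   on VM1: checkpoint_on_source && copy_to_target vm2
#   on VM2: restore_on_target
```

Note that docker create on the target starts from a fresh copy of the image, so files written inside the source container (for example by executing notebook cells) are not carried over by this sequence.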

Describe the results you received: Error response from daemon: OCI runtime restore failed: criu failed: type NOTIFY errno 0 log file: restore.log: unknown

Describe the results you expected: The container is restored with its saved state.

Additional information you deem important (e.g. issue happens only occasionally): If no cells are executed in JupyterLab, the issue does not reproduce.

CRIU logs and information:

CRIU full dump/restore logs:

restore log

Output of `criu --version`:

Version: 3.17.1
Output of `criu check --all`:

Looks good.

Additional environment details: Linux fmyar-3 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

(FMyb, Jan 13 '23)

You have to ensure that the container's file system on the destination is the same as on the source. Here, the destination container is missing files.

You could try migrating containers with Podman, which handles file-system changes automatically.

(adrianreber, Jan 13 '23)
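One way to see which files the destination would be missing (not suggested in the thread, but a standard Docker command) is docker diff on the source container. It lists paths added (A), changed (C) or deleted (D) in the container's writable layer, which a Docker checkpoint does not include. A minimal filter for the added and changed paths, assuming the container is named jupyter and the helper name is hypothetical:

```shell
#!/bin/sh
# Reads `docker diff` output on stdin. Each line is "<kind> <path>",
# where kind is A (added), C (changed) or D (deleted).
# Prints only the added or changed paths, i.e. files that exist in the
# source container's writable layer but not in a fresh container.
changed_paths() {
    # substr($0, 3) drops the "A " / "C " prefix and keeps paths with spaces intact.
    awk '/^[AC] / { print substr($0, 3) }'
}

# Usage on the source VM:
#   docker diff jupyter | changed_paths
```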

@FMyb As Adrian mentioned above, Docker does not checkpoint and restore file-system changes, and this is the reason for the error you are seeing. The following example shows how you can use Podman instead.

# Start container
sudo podman run -d --name jupyter -p 8888:8888  jupyter/scipy-notebook
# Create a checkpoint
sudo podman container checkpoint jupyter --export=jupyter.tar.gz
# Remove old container
sudo podman rm jupyter
# Restore from checkpoint
sudo podman container restore --import=jupyter.tar.gz

(rst0git, Jan 13 '23)
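The example above restores on the same host. For the two-VM case in this issue, the extra copy step could be sketched as follows, assuming the second VM (called vm2 here, a hypothetical name) is reachable over SSH and the archive is copied into the login user's home directory there. DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# TARGET (an SSH-reachable host) and ARCHIVE are hypothetical defaults.
TARGET=${TARGET:-vm2}
ARCHIVE=${ARCHIVE:-jupyter.tar.gz}

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

migrate() {
    # Dump the container into a single archive; per the thread, Podman
    # also includes the container's file-system changes.
    run sudo podman container checkpoint jupyter --export="$ARCHIVE" || return
    # Copy the archive into the login user's home directory on TARGET.
    run scp "$ARCHIVE" "$TARGET:" || return
    # Restore on TARGET from the copied archive (-t lets sudo prompt).
    run ssh -t "$TARGET" sudo podman container restore --import="$(basename "$ARCHIVE")"
}

# Usage on the source VM:
#   migrate
```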

@rst0git Thank you very much, but when I run these commands, creating the checkpoint fails with an error:

 $ sudo podman run -d --name jupyter -p 8888:8888  jupyter/scipy-notebook
f7b9494384116c6bfd814f8b84d099de28f73cb27b0dd3d3f393013b8d50f3bc
 $ sudo podman container checkpoint jupyter --export=jupyter.tar.gz
Error: configured runtime does not support checkpoint/restore

(FMyb, Jan 19 '23)

Which runtime do you have? runc or crun?

(adrianreber, Jan 19 '23)

Which runtime do you have? runc or crun?

I think I am using the default runtime; /etc/containers/libpod.conf contains:

# Default OCI runtime
runtime = "crun"

(FMyb, Jan 19 '23)

It seems that crun on your platform is built without CRIU support. You need a crun build with CRIU enabled, or you can try switching to runc.

(adrianreber, Jan 19 '23)

I know that crun on Fedora has CRIU support. I am not sure why it is not enabled on Ubuntu.

(adrianreber, Jan 19 '23)
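To check which case applies and pick a runtime accordingly, something like the following sketch may help. It assumes that crun --version lists the features crun was built with, with +CRIU present only for CRIU-enabled builds, and that runc and criu are installed for the fallback. The function names are hypothetical.

```shell
#!/bin/sh
# Succeeds if the given `crun --version` text lists the +CRIU feature.
crun_has_criu() {
    printf '%s\n' "$1" | grep -qF -- '+CRIU'
}

# Print the runtime name to pass to `podman --runtime`.
pick_runtime() {
    if crun_has_criu "$1"; then
        echo crun
    else
        echo runc
    fi
}

# Usage:
#   sudo podman info --format '{{.Host.OCIRuntime.Name}}'   # runtime in use
#   rt=$(pick_runtime "$(crun --version)")
#   # Create the container with that runtime so that later
#   # checkpoint/restore calls go through it.
#   sudo podman --runtime "$rt" run -d --name jupyter -p 8888:8888 jupyter/scipy-notebook
```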

OK, thank you very much, creating the checkpoint works now. But restoring fails: I create a checkpoint on one VM and move it to another VM. There I run

sudo podman container restore -k --import /home/fmyar/save1.tar.gz --name jupyter --log-level=debug

and it fails with Error: container creation timeout: internal libpod error (podman log). restore.log contains:

(00.086582) pie: 58: Error (criu/pie/restorer.c:180): can't write lsm profile -2
(00.086585) pie: 58: Error (criu/pie/restorer.c:641): BUG at criu/pie/restorer.c:641

(FMyb, Jan 19 '23)

I know that Ubuntu usually uses AppArmor. Is the AppArmor configuration the same on both systems, i.e. enabled on both or disabled on both?

Personally, I have rarely used AppArmor; I mostly use SELinux. But the error message comes from the point where CRIU tries to write the previous AppArmor label.

I have also only used Podman on SELinux systems, so I am not sure how well tested the combination of Podman, AppArmor, and CRIU is.

Try to make the AppArmor configuration on both systems as similar as possible.

(adrianreber, Jan 19 '23)
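The -2 in the restore log is most likely -ENOENT, which would fit an AppArmor profile that is loaded on the source but not on the destination. A rough way to compare the loaded profiles on the two VMs is sketched below. It assumes root access and the format of /sys/kernel/security/apparmor/profiles (one profile per line, written as name (mode)); the helper and file names are hypothetical.

```shell
#!/bin/sh
# Print sorted profile names from a dump of
# /sys/kernel/security/apparmor/profiles, where lines look like "name (enforce)".
profile_names() {
    # Strip the trailing " (mode)" so enforce/complain do not matter.
    sed 's/ ([a-z]*)$//' "$1" | sort -u
}

# Print profiles present in dump $1 (source) but missing from dump $2 (destination).
missing_profiles() {
    src_list=$(mktemp)
    dst_list=$(mktemp)
    profile_names "$1" > "$src_list"
    profile_names "$2" > "$dst_list"
    comm -23 "$src_list" "$dst_list"
    rm -f "$src_list" "$dst_list"
}

# Usage:
#   on VM1: sudo cat /sys/kernel/security/apparmor/profiles > profiles-vm1.txt
#   on VM2: sudo cat /sys/kernel/security/apparmor/profiles > profiles-vm2.txt
#   after copying both files to one host:
#   missing_profiles profiles-vm1.txt profiles-vm2.txt
```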

I checked: AppArmor is enabled on both systems, but the profiles differ. The VM where I create the checkpoint has an additional profile, containers-default-0.44.4, which appears in the podman log (apparmor_status VM1); on the second VM, where I try to restore, this profile does not exist (apparmor_status VM2). The profile appears when a container with JupyterLab is started.

Update: I made the AppArmor profiles on both VMs match and tried to restore the checkpoint again. The restore succeeded, but when I open JupyterLab and try to open an .ipynb file, it fails with this error:

Uncaught exception GET /api/contents/work/Untitled.ipynb?type=notebook&content=1&1674407327885 (37.232.173.123)
    HTTPServerRequest(protocol='http', host='84.252.129.211:8888', method='GET', uri='/api/contents/work/Untitled.ipynb?type=notebook&content=1&1674407327885', version='HTTP/1.1', remote_ip='37.232.173.123')
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.10/site-packages/tornado/web.py", line 1713, in _execute
        result = await result
      File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/handlers.py", line 121, in get
        model = await ensure_async(
      File "/opt/conda/lib/python3.10/site-packages/jupyter_core/utils/__init__.py", line 184, in ensure_async
        result = await obj
      File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/filemanager.py", line 768, in get
        model = await self._notebook_model(path, content=content)
      File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/filemanager.py", line 724, in _notebook_model
        self.mark_trusted_cells(nb, path)
      File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/manager.py", line 721, in mark_trusted_cells
        trusted = self.notary.check_signature(nb)
      File "/opt/conda/lib/python3.10/site-packages/nbformat/sign.py", line 454, in check_signature
        return self.store.check_signature(signature, self.algorithm)
      File "/opt/conda/lib/python3.10/site-packages/nbformat/sign.py", line 237, in check_signature
        self.db.execute(
    sqlite3.OperationalError: attempt to write a readonly database

(FMyb, Jan 20 '23)

A friendly reminder that this issue had no activity for 30 days.

(github-actions[bot], Feb 22 '23)

A friendly reminder that this issue had no activity for 30 days.

(github-actions[bot], Mar 25 '23)