criu
Error when restoring a checkpoint after moving it to another VM
Description
I am trying to create a checkpoint of a jupyter/scipy-notebook container on one VM, move it to another VM, and restore it there. The restore fails with: Error response from daemon: OCI runtime restore failed: criu failed: type NOTIFY errno 0.
Steps to reproduce the issue:
- Start a JupyterLab Docker container:
docker run --name jupyter -p 8888:8888 jupyter/scipy-notebook
- Open JupyterLab and execute some cells there.
- Create a checkpoint and move it to another VM.
- Restore the checkpoint on the other VM.
Describe the results you received:
Error response from daemon: OCI runtime restore failed: criu failed: type NOTIFY errno 0 log file: restore.log: unknown
Describe the results you expected: A restored container with its saved state.
Additional information you deem important (e.g. issue happens only occasionally): If I don't execute any cells in JupyterLab, the issue is not reproducible.
CRIU logs and information:
Output of `criu --version`:
Version: 3.17.1
Output of `criu check --all`:
Looks good.
Additional environment details:
Linux fmyar-3 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
You have to ensure the container file system stays the same. The destination container is missing files.
You could try to migrate containers with Podman. Podman automatically handles file system changes.
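To see which file-system changes a running Docker container has accumulated (and would therefore be missing on the destination), `docker diff` can be used. The helper below is only a rough sketch; `summarize_diff` is a hypothetical name, and it assumes the usual `docker diff` output of one change per line prefixed with A (added), C (changed), or D (deleted).

```shell
# Summarize `docker diff` output read from stdin.
# Assumption: each line starts with A (added), C (changed) or D (deleted),
# followed by the affected path.
summarize_diff() {
    awk '{ n[$1]++ }
         END { printf "added=%d changed=%d deleted=%d\n", n["A"], n["C"], n["D"] }'
}

# Usage (on the source VM):
#   docker diff jupyter | summarize_diff
```

Any non-zero count means the container's file system differs from the image, which is the situation described above.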
@FMyb As Adrian mentioned above, docker does not support checkpoint/restore of file-system changes and this is the reason for the error you are seeing. The following example shows how you can use Podman instead.
# Start container
sudo podman run -d --name jupyter -p 8888:8888 jupyter/scipy-notebook
# Create a checkpoint
sudo podman container checkpoint jupyter --export=jupyter.tar.gz
# Remove old container
sudo podman rm jupyter
# Restore from checkpoint
sudo podman container restore --import=jupyter.tar.gz
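For the two-VM case from the original report, the same commands can be combined with a copy step. This is a minimal sketch, not a tested procedure: `migrate_container` and `run` are hypothetical helpers, the destination is assumed to be reachable over ssh, and Podman with CRIU support is assumed to be installed on both hosts. The archive is written by root, so its ownership or permissions may need adjusting before `scp` can read it.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# migrate_container NAME DEST
# Checkpoint container NAME, copy the archive to DEST (an ssh host), restore it there.
migrate_container() {
    name="$1"
    dest="$2"
    archive="${name}.tar.gz"

    # Create the checkpoint archive on the source VM.
    run sudo podman container checkpoint "$name" --export="$archive" || return 1
    # Copy the archive to the destination VM (assumes ssh access).
    run scp "$archive" "${dest}:${archive}" || return 1
    # Restore from the archive on the destination VM.
    run ssh "$dest" sudo podman container restore --import="$archive"
}

# Usage: preview the commands without running them
#   DRY_RUN=1; migrate_container jupyter vm2
```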
@rst0git Thank you very much, but I tried the Podman commands above and they do not work. I get an error when creating the checkpoint:
$ sudo podman run -d --name jupyter -p 8888:8888 jupyter/scipy-notebook
f7b9494384116c6bfd814f8b84d099de28f73cb27b0dd3d3f393013b8d50f3bc
$ sudo podman container checkpoint jupyter --export=jupyter.tar.gz
Error: configured runtime does not support checkpoint/restore
Which runtime do you have? runc or crun?
I think I am using the default runtime. In /etc/containers/libpod.conf:
# Default OCI runtime
runtime = "crun"
It seems like crun on your platform is built without criu support. You need to use crun with criu enabled or try switching to runc.
I know that crun on Fedora has criu support. Not sure why it is not enabled on Ubuntu.
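One way to check this is to look at the feature list that `crun --version` prints. The sketch below assumes crun lists compiled-in features as `+NAME` tokens (for example `+CRIU`); `crun_has_criu` is a hypothetical helper name, and the runc example assumes runc is installed.

```shell
# Succeeds if the `crun --version` text on stdin advertises CRIU support.
# Assumption: compiled-in features appear as "+NAME" tokens, e.g. "+CRIU".
crun_has_criu() {
    grep -qF -- '+CRIU'
}

# Usage:
#   crun --version | crun_has_criu && echo "crun has CRIU support"
#
# If CRIU support is missing, the runtime can be selected per invocation:
#   sudo podman --runtime runc run -d --name jupyter -p 8888:8888 jupyter/scipy-notebook
```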
OK, thank you very much, creating the checkpoint works now. But I have a problem with the restore: I create a checkpoint on one VM and move it to another VM. There I run sudo podman container restore -k --import /home/fmyar/save1.tar.gz --name jupyter --log-level=debug, and it fails with Error: container creation timeout: internal libpod error (podman log).
And restore.log contains:
(00.086582) pie: 58: Error (criu/pie/restorer.c:180): can't write lsm profile -2
(00.086585) pie: 58: Error (criu/pie/restorer.c:641): BUG at criu/pie/restorer.c:641.
I know that Ubuntu usually uses AppArmor. Is the AppArmor configuration the same on both systems? Is it enabled on both, or disabled on both?
Personally, I have never used AppArmor much, mostly SELinux, but the error message comes from the point where CRIU tries to write the previous AppArmor label.
I have also only used Podman on SELinux systems, so I am not sure how well tested the combination of Podman, AppArmor, and CRIU is.
Try to ensure that the AppArmor configuration on both systems is as similar as possible.
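A rough way to compare the two systems is to dump the loaded profiles on each VM and list those that exist only on the source. The sketch below assumes each input file holds one profile per line in the "name (mode)" form used by /sys/kernel/security/apparmor/profiles; `missing_profiles` is a hypothetical helper name.

```shell
# missing_profiles SRC_FILE DST_FILE
# Print AppArmor profile names present in SRC_FILE but absent from DST_FILE.
# Assumption: one profile per line, formatted as "name (mode)".
missing_profiles() {
    src="$1"
    dst="$2"
    tmp="$(mktemp)" || return 1
    # Strip the trailing " (mode)" so that only profile names are compared.
    sed 's/ ([a-z]*)$//' "$dst" | sort -u > "$tmp"
    # Keep source names that do not match any destination name exactly.
    sed 's/ ([a-z]*)$//' "$src" | sort -u | grep -vxF -f "$tmp"
    rm -f "$tmp"
}

# Usage: on each VM, save the loaded profiles, then compare the two files:
#   sudo cat /sys/kernel/security/apparmor/profiles > vm1.profiles   # source VM
#   sudo cat /sys/kernel/security/apparmor/profiles > vm2.profiles   # destination VM
#   missing_profiles vm1.profiles vm2.profiles
```

Any name printed is a profile the checkpointed processes may reference but the destination does not have loaded.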
I checked: AppArmor is enabled on both systems, but the loaded profiles differ. The VM where I create the checkpoint has an additional profile, containers-default-0.44.4, which also appears in the podman log (apparmor_status VM1). On the second VM, where I try to restore, this profile does not exist (apparmor_status VM2). The profile appears when a container with JupyterLab is started.
Update: I made the AppArmor profiles match and tried the restore again. The restore succeeded, but when I go to JupyterLab and try to open an .ipynb file, it fails with this error:
Uncaught exception GET /api/contents/work/Untitled.ipynb?type=notebook&content=1&1674407327885 (37.232.173.123)
HTTPServerRequest(protocol='http', host='84.252.129.211:8888', method='GET', uri='/api/contents/work/Untitled.ipynb?type=notebook&content=1&1674407327885', version='HTTP/1.1', remote_ip='37.232.173.123')
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/tornado/web.py", line 1713, in _execute
    result = await result
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/handlers.py", line 121, in get
    model = await ensure_async(
  File "/opt/conda/lib/python3.10/site-packages/jupyter_core/utils/__init__.py", line 184, in ensure_async
    result = await obj
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/filemanager.py", line 768, in get
    model = await self._notebook_model(path, content=content)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/filemanager.py", line 724, in _notebook_model
    self.mark_trusted_cells(nb, path)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/services/contents/manager.py", line 721, in mark_trusted_cells
    trusted = self.notary.check_signature(nb)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/sign.py", line 454, in check_signature
    return self.store.check_signature(signature, self.algorithm)
  File "/opt/conda/lib/python3.10/site-packages/nbformat/sign.py", line 237, in check_signature
    self.db.execute(
sqlite3.OperationalError: attempt to write a readonly database
A friendly reminder that this issue had no activity for 30 days.