
Investigate application checkpointing

Open redhog opened this issue 5 years ago • 23 comments

  • https://criu.org
  • http://dmtcp.sourceforge.net/
  • https://sourceforge.net/projects/cryopid2/

Main issue with this is X11 connections and resources. Is there a "rootless" version of Xnest that can be used for this?

  • https://www.xpra.org/
  • https://en.wikipedia.org/wiki/Xmove

redhog avatar Feb 08 '20 22:02 redhog

Bugfix for xpra on ubuntu: https://superuser.com/a/1517896

redhog avatar Feb 20 '20 10:02 redhog

My understanding from our multiple conversations is that saving and restoring are performed using SMLib, which is the proper approach in X but requires applications to support it.

Because not every application supports SMLib, it is difficult to save and restore state for arbitrary applications.

I have been programming Smalltalk for many years and it always had an image to save and restore. It contains every object, live and otherwise.

Looking for something like that for Linux, I stumbled upon CRIU and it seems like it could be a very good solution for InfiniteGlass.

https://www.youtube.com/playlist?list=PL86FC0XuGZPISge_th8F5Jjj-IbGXEfE6

redhog avatar Feb 23 '20 10:02 redhog

I had some issues with xpra crashing dmtcp, but maybe it works better under criu?

redhog avatar Feb 23 '20 10:02 redhog

https://shifter-users.devloop.org.narkive.com/kOLeprGr/winswitch-xpra-docker-assistance-request

redhog avatar Feb 23 '20 10:02 redhog

@BackOrder if you have time to experiment with criu+xpra it would be awesome! I'll try getting it to run too! Once we have a simple example, I'll write a wrapper script, just like glass-session-wrapper, to handle it...

redhog avatar Feb 23 '20 10:02 redhog

I can take a look. CRIU seems simple enough. What's the difference between Xpra and using ssh -X?

Python library for CRIU: https://criu.org/Py_API

By the way, don't be shy about explaining how you solve the various issues we have been talking about. I'm looking at the commits and studying them.

IanTrudel avatar Feb 23 '20 12:02 IanTrudel

So Xpra is like screen, but for X. A bit like Xvnc. You start an app under it, and then you can attach it to an X server, detach, attach again to some other X server... Possibly over ssh -X if need be.

redhog avatar Feb 23 '20 12:02 redhog

I was thinking that to stop and restart an X app, you need to checkpoint it, but you can't checkpoint the X connection in any meaningful way (or any other external socket connection). However, you can checkpoint a group of processes together, even when they have socket connections between each other. When xpra is in its detached mode, the app and xpra only have sockets connecting to each other, not an outside X server... So you start an app under xpra. To save it, you detach xpra from the X server and then checkpoint the app and xpra together. To restore the app, you restore the checkpoint and then attach xpra to an X server.
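The save and restore sequence described above can be sketched as a pair of command plans. This is only a rough sketch under assumptions: the display name `:100`, the image directory and the helper names `checkpoint_plan` and `restore_plan` are hypothetical, the xpra session is assumed to have been started on an explicit display (for example `xpra start :100 --start=xterm`), and nothing in this thread confirms that the sequence actually succeeds. The code only builds the commands and does not run them.

```python
import shlex
from typing import List


def checkpoint_plan(display: str, xpra_pid: int, images_dir: str,
                    ext_unix_sk: bool = False) -> List[List[str]]:
    """Commands to detach xpra from the X server, then dump the process tree."""
    dump = ["criu", "dump", "-t", str(xpra_pid), "-D", images_dir, "--shell-job"]
    if ext_unix_sk:
        # Option suggested by CRIU's own "External socket is used" error message.
        dump.append("--ext-unix-sk")
    return [
        ["xpra", "detach", display],  # disconnect any attached xpra client
        dump,                         # checkpoint xpra and its children (usually needs root)
    ]


def restore_plan(display: str, images_dir: str,
                 ext_unix_sk: bool = False) -> List[List[str]]:
    """Commands to restore the dumped tree, then attach xpra to the current X server."""
    # -d (--restore-detached) lets criu return once the tree is running again.
    restore = ["criu", "restore", "-D", images_dir, "--shell-job", "-d"]
    if ext_unix_sk:
        restore.append("--ext-unix-sk")
    return [
        restore,
        ["xpra", "attach", display],
    ]


if __name__ == "__main__":
    # Hypothetical values, printed for inspection only.
    for cmd in checkpoint_plan(":100", 12345, "/tmp/xpra-ckpt") + \
            restore_plan(":100", "/tmp/xpra-ckpt"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the plan as data makes it easy to print and review the commands before handing them to something like `subprocess.run`.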

redhog avatar Feb 23 '20 12:02 redhog

In theory, this can all be tested from the command line. However, it fails for me: criu says that xpra has external connections even in detached mode:

$ xpra start --start xterm
$ pstree $(pgrep xpra)
xpra─┬─Xvfb───4*[{Xvfb}]
     ├─sh───xterm───bash
     └─{xpra}
$ sudo criu dump --shell-job -t $(pgrep xpra)
[sudo] password for redhog: XXXXXXX
Warn  (compel/arch/x86/src/lib/infect.c:249): Will restore 30023 with interrupted system call
Warn  (compel/arch/x86/src/lib/infect.c:249): Will restore 30237 with interrupted system call
Error (criu/sk-unix.c:709): sk unix: External socket is used. Consider using --ext-unix-sk option.
Error (criu/cr-dump.c:1709): Dumping FAILED.

redhog avatar Feb 23 '20 12:02 redhog

dmtcp, on the other hand, segfaults on python processes. So that's kind of where I'm stuck with this.

redhog avatar Feb 23 '20 12:02 redhog

RE: criu

So, running criu check on my system returns "Looks good." but there are several errors in docker. Perhaps some kernel options are not defined in the docker Linux kernel.

https://criu.org/Linux_kernel

IanTrudel avatar Feb 23 '20 15:02 IanTrudel

criu inside docker is special according to some stuff I read somewhere... but let's ignore docker for the moment - if you can get criu to checkpoint xpra on your host system, that would be super cool as a start.

redhog avatar Feb 23 '20 16:02 redhog

@BackOrder I assigned you on this one, as I was thinking that right now I won't poke at it, but maybe you will?

redhog avatar Mar 11 '20 09:03 redhog

Early report from @alexkh is that CRIU will not be able to fulfill the task within the context of InfiniteGlass. It only supports checkpointing via VNC, as other similar applications do, according to https://www.criu.org/Comparison_to_other_CR_projects.

Saving and restoring console applications is possible but not without its shortcomings. For example, vim will be restored but its display is messed up. There is no indication that it would properly work for X applications besides via VNC. It also requires root.

Using the CRIU C API might help, but there is no way to be sure it will work as we would like. https://www.criu.org/C_API

Xpra is most likely a better candidate and browsing on the web seems to reveal that people use it locally (not only remotely).

LXC could be another candidate. Here is an article on running X11 applications in LXC: https://blog.simos.info/how-to-easily-run-graphics-accelerated-gui-apps-in-lxd-containers-on-your-ubuntu-desktop/

IanTrudel avatar Mar 17 '20 17:03 IanTrudel

Xpra replaces VNC, not CRIU etc. It just lets you sort of bundle an X app with a VNC-like server, and provides a client that handles rootless windows (i.e. the X app can have multiple windows that are not all stuck inside a single window like with VNC).

The only drawback with Xpra is that it does not support OpenGL for the apps...

redhog avatar Mar 17 '20 18:03 redhog

Actually, I found an interesting article where the author is using Xpra like screen/tmux. Take a look!

https://aweirdimagination.net/2015/03/30/detachable-x-sessions/

IanTrudel avatar Mar 17 '20 18:03 IanTrudel

In any case, CRIU doesn't seem to deliver on its promise. You might have more luck using the API considering the control InfiniteGlass has but I wouldn't hold my breath.

Alternatively, if we push far back to the land of unsupported software, there would be NeatX that would fit right in InfiniteGlass since it's a mix of Python and C. https://code.google.com/archive/p/neatx/

IanTrudel avatar Mar 17 '20 18:03 IanTrudel

A detached process is one that keeps running when you close your terminal, because it is "detached" from the parent process (the shell inside that terminal). When you disconnect the terminal (local or remote, it does not matter), the process continues running because the OS does not automatically kill it. The key point here is "keeps running": no state is actually saved, and when you turn off the computer it runs on, that process is gone.
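As a small illustration of what "detached" means here, the sketch below starts a child in its own session so that closing the launching terminal does not take it down. It assumes a POSIX system, and the helper name `spawn_detached` is hypothetical. Note that nothing is written to disk: the child simply keeps running.

```python
import subprocess
from typing import List


def spawn_detached(argv: List[str]) -> subprocess.Popen:
    """Start argv in a new session, with no ties to the calling terminal."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,   # no terminal input
        stdout=subprocess.DEVNULL,  # no terminal output
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # child calls setsid() before exec
    )
```

Because the child leads its own session, it does not receive the hangup sent to the terminal's session when the terminal closes, but it still disappears on reboot, which is the gap checkpointing is meant to fill.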

alexkh avatar Mar 17 '20 22:03 alexkh

Heyas @alexkh! So from what I can see, Xpra, NeatX and VNC all just basically implement detached processes for X. CRIU and DMTCP implement saving a process, or a group of processes and the TCP connections between them, to disk and restoring it later.

My theory was that we could use these two techniques together: Use Xpra to detach a process from the X server, and then save the Xpra and application processes together using CRIU or DMTCP. Then later, possibly after a reboot, restore the processes and reattach them to a new X server.

redhog avatar Mar 17 '20 22:03 redhog

@redhog this is a lot of layers, but it could possibly work. Otherwise, if we find some libraries that could do those things and integrate them into InfiniteGlass, that would be less hassle. Let's not kid ourselves here: we might end up having to program a solution from the ground up.

IanTrudel avatar Mar 17 '20 22:03 IanTrudel

Further investigation reveals that we might be able to use virtualization to provide checkpointing within InfiniteGlass. The idea would be to use KVM to create a container for each application and to use Xpra to ensure X11 can resume the connection with those applications.

The KVM containers are meant to be transparent to the users leaving regular access to the filesystem, network, etc. It should also use hardware passthrough to fully benefit from direct access, such as GPU acceleration.

This solution requires a computer that has virtualization capability within its CPU and that the feature is enabled in the BIOS and on Linux.

IanTrudel avatar Apr 09 '20 22:04 IanTrudel

https://github.com/Merrit/nyrna

IanTrudel avatar Jun 26 '20 19:06 IanTrudel

https://github.com/Merrit/nyrna

From the description, it just "pauses" a process, keeping it in RAM.
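For comparison, pausing a process while keeping it in RAM can be done with the standard POSIX job-control signals. The sketch below assumes that is the mechanism meant here, which is not confirmed in this thread, and the helper names `pause` and `resume` are hypothetical. Like a detached process, a paused one does not survive a reboot.

```python
import os
import signal


def pause(pid: int) -> None:
    """Stop the process; its memory stays resident and nothing is written to disk."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a previously stopped process continue where it left off."""
    os.kill(pid, signal.SIGCONT)
```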

alexkh avatar Jun 27 '20 22:06 alexkh