Optimizing the live migration process
Description

I am comparing process live migration with virtual machine live migration across nodes. I find that when a process is live-migrated, a large part of the downtime is caused by CRIU reading pages.img into memory during the restore phase, as mentioned in issue #1551. A possible solution is the --lazy-pages option, but that causes a performance loss after the program resumes running.
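For context, this is roughly what the --lazy-pages trade-off looks like with stock CRIU, following the lazy-migration flow from the CRIU wiki (paths, port, and `$SRC_IP` are made up for illustration). Pages are fetched on demand over the network via userfaultfd, which is exactly what causes the post-resume slowdown mentioned above:

```shell
# On the source node: dump, but keep memory pages behind; the dump
# process serves them lazily over the given port.
criu dump -t $PID -D /tmp/chkpt --lazy-pages --port 9876

# Copy the non-page image files to the target (pages stay on the source).
rsync -a /tmp/chkpt target:/tmp/

# On the target node: the lazy-pages daemon pulls pages from the source
# as the restored process faults on them.
criu lazy-pages -D /tmp/chkpt --page-server --address $SRC_IP --port 9876 &

# On the target node: restore resumes quickly, but each first touch of a
# not-yet-transferred page pays a network round trip.
criu restore -D /tmp/chkpt --lazy-pages
```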
In virtual machine live migration, dirty memory pages are iteratively loaded into the new virtual machine at the destination during the pre-copy stage. Could CRIU borrow this approach? The flow I have in mind is: during the pre-dump stage, pre-load the memory contents into the VMAs of the restoring CRIU process on the target node, so that the restore stage only needs to read the pages.img produced by the final dump. I don't know whether this is feasible.
Describe the results you expected:

Let me elaborate a bit more on my desired live migration process. In the current flow (https://criu.org/Live_migration), the target node sits idle while the source node performs pre-dumps. I would like to follow this flow instead:
- The source node sends the images to the target node immediately after each pre-dump
- The target node reads the data from the images into memory and waits for the next round of pre-dump images
- The source node performs the final dump and sends all remaining images to the target node
- The target node restores the process; this step only needs to read the dirty pages produced by the final dump into memory, greatly reducing the downtime
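The flow above can be sketched with today's CRIU CLI (hypothetical paths and host name; the "pre-load into the target's memory" step is precisely the piece that does not exist yet and that this issue asks about):

```shell
# Round 1 on the source: pre-dump leaves the process running.
criu pre-dump -t $PID -D /imgs/pre1 --track-mem
rsync -a /imgs/pre1 target:/imgs/          # ship images immediately

# Round 2: only pages dirtied since round 1 are written.
criu pre-dump -t $PID -D /imgs/pre2 --prev-images-dir ../pre1 --track-mem
rsync -a /imgs/pre2 target:/imgs/

# Final dump: the process is stopped; only the last delta is produced.
criu dump -t $PID -D /imgs/dump --prev-images-dir ../pre2 --track-mem
rsync -a /imgs/dump target:/imgs/

# On the target: restore. Today CRIU re-reads every pages.img (following
# the parent links) at this point; the proposal is that the pre-dump
# rounds would already be resident in the restorer's memory, so only
# /imgs/dump's dirty pages would need to be read.
criu restore -D /imgs/dump
```

Note that the relative --prev-images-dir links mean the directory layout under /imgs must be preserved on the target for restore to find the parent images.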
I know this would require substantial modifications to the CRIU source code, but I wonder whether it is possible without modifying the Linux kernel.
I am also very interested in such a pre-restore mechanism and curious about the difficulties of implementing it in CRIU.
A friendly reminder that this issue had no activity for 30 days.
Sounds doable. Instead of keeping the memory pages on disk, you would load them into memory ahead of time. Whether you also map them at the right locations in advance is another question, which would make things more complicated; it probably depends on what you want to achieve.

I don't know whether the page server would already give you almost everything you want. It is probably worth looking into.
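For reference, a minimal page-server setup looks roughly like this (port, paths, and `$DEST_IP` are placeholders): page contents travel over a socket to the destination as the pre-dump runs, rather than being copied as files afterwards.

```shell
# On the destination node: receive page data into /imgs/pre1.
criu page-server --images-dir /imgs/pre1 --port 9876 &

# On the source node: pre-dump, sending page data straight to the
# destination's page server instead of only to local disk.
criu pre-dump -t $PID -D /imgs/pre1 --track-mem \
     --page-server --address $DEST_IP --port 9876
```

Only the memory pages go through the page server; the other, smaller *.img files still have to be transferred separately before restore.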
Thank you very much for your reply. My goal is to shorten the downtime as much as possible when restoring processes with large memory images, which may become long-term work.
As you said, this could build on the page-server implementation. However, I am more concerned about whether it would be constrained by the underlying operating system kernel. Compared with virtual machines, processes are much more tightly coupled to the host operating system, and the amount of process state held in the kernel far exceeds the VM state held in a hypervisor. To achieve the expected restore flow, we would need to continuously modify the contents of the process's virtual address space and its in-kernel state. I don't know whether this is feasible under the current Linux kernel's security mechanisms, or whether, as a compromise, using a different kernel such as a LibOS would make this goal easier to reach. I don't know if anyone has explored this question before.