eve
[DRAFT] Add functionality for EVE to interact with applications via qemu-ga
This PR adds support for application-consistent ZFS snapshots. A proposal for adding this functionality can be found here.
Please name the PR as DRAFT. First we should have internal discussions on the design.
@vk-en Does the API to the guest get handled entirely in the guest kernel, or are there hooks in the guest operating systems to have it propagate to applications which are interested? Basically is this useful to get application consistent snapshots?
> @vk-en Does the API to the guest get handled entirely in the guest kernel, or are there hooks in the guest operating systems to have it propagate to applications which are interested?
The assumption is that an "interceptor" runs in the guest OS, in our case qemu-guest-agent, which listens on /dev/vsock and executes the commands we need in the guest OS. I have not yet added to this PR the commit that adds a new vhost-vsock-pci device to the qemu configuration, because we have not yet decided whether to use it rather than virtio-serial. Also, at the moment, the only way qemu-ga can "influence" interested applications is by running something (for example, the application we need) in the guest OS. (The ability to run applications also depends on the version of qemu-ga in the guest OS.)
There is no API as such. There is simply a set of commands that are sent over a socket and that the agent on the guest side supports. In general, we can send anything to the guest; what matters is that an application there can read and process our command.
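To make the "set of commands over a socket" concrete: qemu-guest-agent exchanges newline-delimited JSON objects. Below is a minimal sketch of encoding a request and decoding a reply. The helper names `encode_command` and `decode_reply` and the `GuestAgentError` class are hypothetical, and the transport (vsock or virtio-serial) is deliberately left out, since that choice is still open in this PR.

```python
import json


class GuestAgentError(Exception):
    """Raised when the agent replies with an error object (hypothetical helper)."""


def encode_command(name, arguments=None):
    """Serialize one guest-agent request as a newline-terminated JSON line."""
    request = {"execute": name}
    if arguments:
        request["arguments"] = arguments
    return (json.dumps(request) + "\n").encode("utf-8")


def decode_reply(line):
    """Parse one reply line; return its "return" value or raise on "error"."""
    reply = json.loads(line)
    if "error" in reply:
        err = reply["error"]
        raise GuestAgentError(f'{err.get("class")}: {err.get("desc")}')
    return reply["return"]
```

For example, `encode_command("guest-fsfreeze-freeze")` produces the request that asks the agent to freeze the guest filesystems, and `decode_reply` would hand back whatever value the agent returns for it.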
> Basically is this useful to get application consistent snapshots?
Yes, this allows you to influence the guest's file system (flush cache, freeze/unfreeze fs), and control its state on the EVE side.
> Yes, this allows you to influence the guest's file system (flush cache, freeze/unfreeze fs), and control its state on the EVE side.
But that doesn't get application consistent snapshots AFAICT. Looking at https://qemu.readthedocs.io/en/latest/interop/qemu-ga-ref.html#qapidoc-82 there is no indication that fsfreeze has a hook to tell applications (such as databases) to flush all of their buffers to the kernel (before it tells the kernel to flush all of its buffers to disk).
We need an approach which has the ability to notify applications and wait for them to flush their buffers. Can qemu-guest-agent do that?
@vk-en did you look at the qemu savevm and loadvm functionality? It looks like savevm can snapshot the RAM and CPU too, which probably helps with app consistency?
https://qemu-project.gitlab.io/qemu/system/images.html#vm-005fsnapshots
Well, ZFS uses the raw format and this feature looks like it requires qcow2; maybe we can use it for snapshot support on ext4.
> @vk-en did you look at the qemu savevm and loadvm functionality? It looks like savevm can snapshot the RAM and CPU too, which probably helps with app consistency?
@zedi-pramodh I'm confused. Are we talking about application-consistent snapshots of the volume(s), or are we talking about a snapshot of a running app instance's VM memory?
@eriknordmark I am thinking we can use a combination of qemu savevm and an EVE-level snapshot to get app consistency.
Say the user wants to take a snapshot:
- EVE executes qemu savevm, which snapshots the CPU and RAM of the VM at that point (we need to figure out how and where it is stored).
- Then EVE triggers a storage snapshot (ZFS or qemu, depending on the filesystem).
At this point we should have a snapshot of storage, CPU and RAM. A loadvm should restart the app with all state intact as of the time of the snapshot.
Maybe I am completely wrong in my understanding too. Maybe it's worth spending time learning more about qemu savevm/loadvm.
> At this point we should have a snapshot of storage, CPU and RAM. A loadvm should restart the app with all state intact as of the time of the snapshot.
That seems to be a very different proposal from what we have discussed in https://wiki.lfedge.org/display/EVE/Snapshots+in+EVE . The proposal there is about application-consistent snapshots of the applications' volumes.
To implement that we need a mechanism to 1) ask the application in the VM to flush any application buffers, 2) ask the kernel to flush any buffers, 3) take the snapshot in the host, 4) tell the application in the VM and 5) the guest kernel to unfreeze.
I'm asking how we plan to do #1. My understanding is that what is in this PR can address 2 and 5, but that isn't sufficient AFAIU.
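The five steps above can be sketched as a context manager that guarantees steps 4 and 5 run even when the host-side snapshot fails. This is only an illustration under stated assumptions: `quiesced` and its four callables are hypothetical names, not anything in this PR, and the order of steps 4 and 5 simply follows the list above.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(flush_app, freeze_fs, resume_app, thaw_fs):
    """Quiesce the guest around a host-side snapshot (hypothetical helper).

    Each argument is a zero-argument callable supplied by the caller.
    """
    flush_app()          # 1) ask the application to flush its buffers
    try:
        freeze_fs()      # 2) ask the guest kernel to flush and freeze
        yield            # 3) the caller takes the host-side snapshot here
    finally:
        try:
            resume_app()  # 4) tell the application it may continue
        finally:
            thaw_fs()     # 5) unfreeze the guest kernel, always attempted
```

A caller would wrap the snapshot in `with quiesced(...):`. Note that if step 1 itself fails, nothing has been frozen yet, so no cleanup is attempted.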
I just saw this in the proposal "It is possible to specify a hook to run each time fsfreeze/thaw happened. Such a script can be useful to flush the state of a user application (e.g. a database), to make sure not only the filesystem is consistent, but also it has the latest (and consistent as well) data from the application."
If such a hook exists then we have a solution based on what is in this PR, but we need to verify that the hook is there.
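As a rough idea of what such a hook could look like in the guest, here is a sketch that assumes the agent invokes the hook with a single argument, `freeze` before freezing and `thaw` after thawing (which is how upstream qemu-ga's fsfreeze hook option behaves, but that still needs to be verified for the agent versions we care about). The functions `flush_application` and `resume_application` are hypothetical placeholders for whatever the application actually provides.

```python
import sys


def flush_application():
    # Placeholder: a real hook would ask the application (e.g. a database)
    # to flush its buffers and wait until it has done so.
    print("flushing application buffers")


def resume_application():
    # Placeholder: a real hook would tell the application to resume writes.
    print("resuming application")


def run_hook(action, on_freeze, on_thaw):
    """Dispatch one hook invocation and return a process exit status."""
    handlers = {"freeze": on_freeze, "thaw": on_thaw}
    handler = handlers.get(action)
    if handler is None:
        return 1  # unknown or missing action: report failure
    handler()
    return 0


def main(argv):
    """Entry point: argv[1] is expected to be "freeze" or "thaw"."""
    action = argv[1] if len(argv) > 1 else ""
    return run_hook(action, flush_application, resume_application)
```

An installed script would end with `sys.exit(main(sys.argv))` so that a failed flush is reported back through the exit status.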
@eriknordmark
> But that doesn't get application consistent snapshots AFAICT.
Correct, I agree.
> We need an approach which has the ability to notify applications and wait for them to flush their buffers. Can qemu-guest-agent do that?
The only approach available to us here is to provide the ability to send a command to the guest OS which, when executed, flushes the buffers of application N. We can't tailor code to specific applications, and it would not make sense to (who would support it? how would we monitor it? what if the application is closed source?), but along with the "create/rollback snapshot" command we can let the user supply a command that flushes the application's buffers (if the application supports it, of course). This method is used by large cloud services, such as Google Cloud (snapshot scripts). Qemu-guest-agent can execute commands sent from the host, so this option is available to us.
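A sketch of how a user-supplied flush command could be passed to the guest with qemu-ga's `guest-exec` and checked with `guest-exec-status`. The helper names are hypothetical, the path in the usage note is a made-up example, and sending the requests over the socket is left out.

```python
import base64


def build_guest_exec(path, args=()):
    """Build a guest-exec request that runs `path` with `args` in the guest."""
    return {
        "execute": "guest-exec",
        "arguments": {"path": path, "arg": list(args), "capture-output": True},
    }


def build_guest_exec_status(pid):
    """Build the follow-up request that polls the process started above."""
    return {"execute": "guest-exec-status", "arguments": {"pid": pid}}


def parse_exec_status(reply):
    """Return (exited, exitcode, stdout) from a guest-exec-status reply."""
    result = reply["return"]
    # Captured output is base64-encoded; it is absent while still running.
    raw = base64.b64decode(result.get("out-data", ""))
    return result["exited"], result.get("exitcode"), raw.decode("utf-8", "replace")
```

For instance, `build_guest_exec("/usr/local/bin/flush-db", ["--wait"])` would describe running a user's (hypothetical) flush script; the host would then poll with the returned pid until `exited` is true and treat a non-zero exit code as "do not snapshot".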
@zedi-pramodh
> @vk-en did you look at the qemu savevm and loadvm functionality? It looks like savevm can snapshot the RAM and CPU too, which probably helps with app consistency?
Yes, I saw, but this is not what we need at the moment.
> We can't tailor code to specific applications, and it would not make sense to (who would support it? how would we monitor it? what if the application is closed source?), but along with the "create/rollback snapshot" command we can let the user supply a command that flushes the application's buffers (if the application supports it, of course). This method is used by large cloud services, such as Google Cloud (snapshot scripts).
Can we reuse the google-guest-agent or create an eve-guest-agent package based on the Google one? It seems to have the right functionality based on the above page.
> Can we reuse the google-guest-agent or create an eve-guest-agent package based on the Google one? It seems to have the right functionality based on the above page.
It would not be advisable to use google-ga, since it is tailored to Google Cloud. It will be much easier and more convenient to create our own eve-guest-agent package based on AF_VSOCK, with a solution similar to google-ga. That also makes more sense because the agent needs to interact with EVE, and it saves us from having to configure the agent when it starts. Our own guest agent would also let us implement the approach Roman suggests, where the VM can initiate the create/rollback snapshot.
No progress on that. @eriknordmark can we close this with the "stalled" label?
Closing as stalled.