[DRAFT] Add functionality for EVE to interact with applications via qemu-ga

Open vk-en opened this issue 2 years ago • 14 comments

This PR adds support for consistent ZFS snapshots of applications. The proposal for this functionality can be found here.

vk-en avatar Aug 25 '22 12:08 vk-en

Please name the PR as DRAFT. First we should have internal discussions on the design.

zedi-pramodh avatar Aug 25 '22 13:08 zedi-pramodh

@vk-en Does the API to the guest get handled entirely in the guest kernel, or are there hooks in the guest operating systems to have it propagate to applications which are interested? Basically is this useful to get application consistent snapshots?

eriknordmark avatar Sep 13 '22 23:09 eriknordmark

@vk-en Does the API to the guest get handled entirely in the guest kernel, or are there hooks in the guest operating systems to have it propagate to applications which are interested?

It is assumed that an "interceptor" runs in the guest OS. In our case this is qemu-guest-agent, which listens on /dev/vsock and executes the commands we need in the guest OS. I have not yet added the commit that adds a new vhost-vsock-pci device to the qemu configuration, because we have not yet decided whether to use it rather than virtio-serial. At the moment, as far as "influencing" interested applications goes, qemu-ga can only run something (for example, the application we need) in the guest OS. (The ability to run applications also depends on the version of qemu-ga in the guest OS.)

There is no API as such. There is simply a set of commands that are sent over a socket and that the agent on the guest side supports. In general, we can send anything to the guest; the main thing is that there is an application that can read and process our command.
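To make the "set of commands over a socket" concrete, here is a minimal sketch of how such requests could be encoded. The command names (guest-fsfreeze-freeze, guest-fsfreeze-status, guest-fsfreeze-thaw) are existing qemu-ga commands; the helper names qga_request and qga_parse_reply are hypothetical and not part of this PR, and the transport (vsock or virtio-serial) is deliberately left out.

```python
import json


def qga_request(command, **arguments):
    """Encode one qemu-ga request as a newline-terminated JSON message."""
    message = {"execute": command}
    if arguments:
        message["arguments"] = arguments
    return (json.dumps(message) + "\n").encode("utf-8")


def qga_parse_reply(line):
    """Return the 'return' payload of a reply, or raise if it is an error reply."""
    reply = json.loads(line)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("desc", "unknown qemu-ga error"))
    return reply.get("return")


# Requests the host side would write to the guest channel (hypothetical usage).
freeze = qga_request("guest-fsfreeze-freeze")
status = qga_request("guest-fsfreeze-status")
thaw = qga_request("guest-fsfreeze-thaw")
```

The bytes produced here would be written to whichever channel is chosen; nothing in the encoding depends on that choice.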

Basically is this useful to get application consistent snapshots?

Yes, this allows you to influence the guest's file system (flush cache, freeze/unfreeze fs), and control its state on the EVE side.

vk-en avatar Sep 14 '22 11:09 vk-en

Yes, this allows you to influence the guest's file system (flush cache, freeze/unfreeze fs), and control its state on the EVE side.

But that doesn't get application consistent snapshots AFAICT. Looking at https://qemu.readthedocs.io/en/latest/interop/qemu-ga-ref.html#qapidoc-82 there is no indication that fsfreeze has a hook to tell applications (such as databases) to flush all of their buffers to the kernel (before it tells the kernel to flush all of its buffers to disk).

We need an approach which has the ability to notify and wait for applications to flush their buffers. Can qemu-guest-agent do that?

eriknordmark avatar Sep 14 '22 18:09 eriknordmark

@vk-en did you look at qemu savevm and loadvm functionality? Looks like savevm can snapshot the RAM and CPU too, and that probably helps with app consistency?

zedi-pramodh avatar Sep 15 '22 22:09 zedi-pramodh

https://qemu-project.gitlab.io/qemu/system/images.html#vm-005fsnapshots

zedi-pramodh avatar Sep 15 '22 23:09 zedi-pramodh

Well, zfs uses the raw format, and it looks like this feature requires qcow2. Maybe we can use this feature for support on ext4.

zedi-pramodh avatar Sep 15 '22 23:09 zedi-pramodh

@vk-en did you look at qemu savevm and loadvm functionality? Looks like savevm can snapshot the RAM and CPU too, and that probably helps with app consistency?

@zedi-pramodh I'm confused. Are we talking about application consistent snapshots of the volume(s), or are we talking about a snapshot of a running app instance VM memory?

eriknordmark avatar Sep 16 '22 01:09 eriknordmark

@eriknordmark I am thinking we can use qemu savevm + eve level snapshot combination to get the app consistency.

Say the user wants to take a snapshot:

  1. eve executes qemu savevm, which snapshots the CPU and RAM of the VM at that point. (We need to figure out how and where it stores them.)
  2. Then eve triggers a storage snapshot (zfs or qemu snapshot, depending on the filesystem).

At this point we should have a snapshot of storage, CPU and RAM. A loadvm should restart app with all state intact at the time of snapshot.

Maybe I am completely wrong in my understanding too. Maybe it's worth spending time learning more about qemu savevm/loadvm.

zedi-pramodh avatar Sep 16 '22 16:09 zedi-pramodh

At this point we should have a snapshot of storage, CPU and RAM. A loadvm should restart app with all state intact at the time of snapshot.

That seems to be a very different proposal than what we have discussed in https://wiki.lfedge.org/display/EVE/Snapshots+in+EVE The proposal is about application-consistent snapshots of the applications' volumes.

To implement that we need to have a mechanism to 1) ask the application in the VM to flush any application buffers, 2) ask the kernel to flush any buffers, 3) take the snapshot in the host, 4) tell the application in the VM and 5) the guest kernel to unfreeze.

I'm asking how we plan to do #1. My understanding is that what is in this PR can address 2 and 5, but that isn't sufficient AFAIU.
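The five steps can be sketched as a host-side sequence. This is only an illustration under stated assumptions: the guest object and its four methods are hypothetical, RecordingGuest is a stand-in that just records calls, and the unwind order (kernel thaw before application resume) is an assumption that mirrors the freeze order in reverse.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(guest):
    """Keep the guest quiesced while the with-block runs, and always undo it."""
    guest.flush_applications()          # step 1: application buffers
    try:
        guest.freeze_filesystems()      # step 2: kernel buffers, fs frozen
        try:
            yield                       # step 3 happens here, on the host
        finally:
            guest.thaw_filesystems()    # step 5: unfreeze the guest kernel
    finally:
        guest.resume_applications()     # step 4: let the application continue


class RecordingGuest:
    """Hypothetical stand-in for a guest channel; it only records calls."""

    def __init__(self):
        self.calls = []

    def flush_applications(self):
        self.calls.append("flush_applications")

    def freeze_filesystems(self):
        self.calls.append("freeze_filesystems")

    def thaw_filesystems(self):
        self.calls.append("thaw_filesystems")

    def resume_applications(self):
        self.calls.append("resume_applications")


def snapshot_volume(guest, take_snapshot):
    """Run the host-side snapshot callable while the guest is quiesced."""
    with quiesced(guest):
        return take_snapshot()
```

The nested try/finally means the guest is thawed and the application resumed even if the host-side snapshot fails, which matters because a guest left frozen blocks all writes.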

eriknordmark avatar Sep 16 '22 21:09 eriknordmark

I just saw this in the proposal "It is possible to specify a hook to run each time fsfreeze/thaw happened. Such a script can be useful to flush the state of a user application (e.g. a database), to make sure not only the filesystem is consistent, but also it has the latest (and consistent as well) data from the application."

If such a hook exists then we have a solution based on what is in this PR, but need to verify that hook is there.
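For reference, qemu-ga does have an fsfreeze hook option (--fsfreeze-hook), and the hook script is invoked with freeze or thaw as its argument. A minimal sketch of what such a hook could look like follows. The myapp-ctl path and its subcommands are hypothetical placeholders for whatever flush command a given application offers.

```python
import subprocess

# Hypothetical per-application commands; a real deployment would supply its own.
APP_COMMANDS = {
    "freeze": [["/usr/local/bin/myapp-ctl", "flush-and-pause"]],
    "thaw": [["/usr/local/bin/myapp-ctl", "resume"]],
}


def run_hook(action, commands=APP_COMMANDS, runner=subprocess.run):
    """Run every command registered for the action and return an exit code."""
    if action not in commands:
        return 2  # neither "freeze" nor "thaw"
    for argv in commands[action]:
        result = runner(argv, check=False)
        if result.returncode != 0:
            return 1  # report the failure to the caller via the exit code
    return 0


def main(argv):
    """Entry point: argv[1] is expected to be 'freeze' or 'thaw'."""
    action = argv[1] if len(argv) > 1 else ""
    return run_hook(action)
```

Installed as the hook script, the file would end with a call such as sys.exit(main(sys.argv)). Whether this is enough for application consistency still depends on each application providing a usable flush command.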

eriknordmark avatar Sep 16 '22 21:09 eriknordmark

@eriknordmark

But that doesn't get application consistent snapshots AFAICT.

Correct. I agree.

We need an approach which has the ability to notify and wait for applications to flush their buffers. Can qemu-guest-agent do that?

The only approach available to us here is to provide the ability to send a command to the guest OS which, when executed, flushes the buffers of application N. We can't tailor code to specific applications, and it wouldn't make sense to (who would support it? how would we monitor it? what if the application is closed source?). What we can do is let the user send, along with the "create/rollback snapshot" command, a user command that flushes the application's buffers (if the application supports that, of course). This method is used by large cloud services such as Google Cloud (snapshot scripts). Qemu-guest-agent allows you to execute commands sent from the host, so this option is available to us.
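As a rough sketch of how a user-supplied flush command could be sent, qemu-ga's guest-exec and guest-exec-status commands can be used. The helper names below and the /usr/local/bin/myapp-ctl path are hypothetical; the argument and field names (path, arg, capture-output, pid, exited, exitcode, out-data) are the ones qemu-ga defines, and availability of guest-exec depends on the qemu-ga version and configuration in the guest.

```python
import base64


def guest_exec_request(path, args=()):
    """Build a guest-exec request that runs a user command in the guest."""
    return {
        "execute": "guest-exec",
        "arguments": {"path": path, "arg": list(args), "capture-output": True},
    }


def guest_exec_status_request(pid):
    """Build the follow-up request that polls for the command's result."""
    return {"execute": "guest-exec-status", "arguments": {"pid": pid}}


def decode_exec_status(payload):
    """Return (exited, exitcode, stdout) from a guest-exec-status payload."""
    if not payload.get("exited"):
        return (False, None, "")
    raw = base64.b64decode(payload.get("out-data", ""))
    return (True, payload.get("exitcode"), raw.decode("utf-8", "replace"))


# Hypothetical user command attached to a "create snapshot" request.
flush_request = guest_exec_request("/usr/local/bin/myapp-ctl", ["flush"])
```

The host would poll guest-exec-status until exited is true, and only then freeze the filesystem and take the snapshot.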

@zedi-pramodh

@vk-en did you look at qemu savevm and loadvm functionality? Looks like savevm can snapshot the RAM and CPU too, and that probably helps with app consistency?

Yes, I saw, but this is not what we need at the moment.

vk-en avatar Sep 19 '22 12:09 vk-en

We can't tailor code to specific applications, and it wouldn't make sense to (who would support it? how would we monitor it? what if the application is closed source?). What we can do is let the user send, along with the "create/rollback snapshot" command, a user command that flushes the application's buffers (if the application supports that, of course). This method is used by large cloud services such as Google Cloud (snapshot scripts).

Can we reuse the google-guest-agent or create an eve-guest-agent package based on the google one? It seems to have the right functionality based on the above page.

eriknordmark avatar Sep 20 '22 23:09 eriknordmark

Can we reuse the google-guest-agent or create an eve-guest-agent package based on the google one? It seems to have the right functionality based on the above page.

It would not be advisable to use google-ga, since it is tailored to work with Google Cloud. It will be much easier and more convenient to create our own eve-guest-agent package based on AF_VSOCK, with a solution similar to google-ga. That also makes more sense because the agent needs to interact with EVE, and it saves us from having to configure the agent when it starts. Our own guest agent would also let us implement the approach Roman suggests, in which the VM can initiate the create/rollback snapshot.
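A very rough sketch of what the guest side of such an agent might look like is below. Everything about the protocol here is an assumption made for illustration: the port number, the one-line JSON request with an "action" field, and the reply format are all hypothetical. Only the socket calls (AF_VSOCK, VMADDR_CID_ANY) are existing Linux APIs.

```python
import json
import socket

AGENT_PORT = 9999  # hypothetical vsock port, not defined anywhere in EVE


def handle_request(line, handlers):
    """Dispatch one JSON request line to a handler and return a JSON reply line."""
    try:
        request = json.loads(line)
        handler = handlers[request["action"]]
    except (ValueError, KeyError, TypeError):
        return json.dumps({"ok": False, "error": "bad request"}) + "\n"
    try:
        handler()
    except Exception as exc:  # report handler failures back to the host
        return json.dumps({"ok": False, "error": str(exc)}) + "\n"
    return json.dumps({"ok": True}) + "\n"


def serve(handlers, port=AGENT_PORT):
    """Accept host connections over AF_VSOCK (Linux only), one request each."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as server:
        server.bind((socket.VMADDR_CID_ANY, port))
        server.listen(1)
        while True:
            conn, _peer = server.accept()
            with conn, conn.makefile("rw", encoding="utf-8") as stream:
                line = stream.readline()
                if line:
                    stream.write(handle_request(line, handlers))
                    stream.flush()
```

A guest would call serve with a dictionary such as {"pre-snapshot": flush_app, "post-snapshot": resume_app}, where the action names and callables are again hypothetical. Supporting guest-initiated snapshots would need the reverse direction as well, which this sketch does not cover.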

vk-en avatar Sep 26 '22 12:09 vk-en

No progress on that. @eriknordmark can we close this with the "stalled" label?

rouming avatar Jan 19 '23 11:01 rouming

Closing as stalled.

eriknordmark avatar Jan 24 '23 00:01 eriknordmark