
File-based disk-only VM snapshot with KVM as hypervisor

Open JoaoJandre opened this issue 6 months ago • 38 comments

Description

This PR implements the spec in #9524. For more details, please read the spec.

In addition, this PR includes the following changes, which are not covered by the spec:

  1. The snapshot.merge.timeout agent property was added. It is only considered if libvirt.events.enabled is true;
  2. A new snapshot merge process was created, affecting both regular volume snapshots and this feature. When libvirt.events.enabled is true, ACS registers for Libvirt events, tracks the merge, and reports its progress in the logs. When it is false, the previous process is used;
  3. Volumes attached to VMs that have file-based disk-only VM snapshots on KVM can now be resized.
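How the two agent properties in items 1 and 2 interact can be sketched as follows. This is a minimal illustration, not the ACS agent code: `MergePlan` and `plan_merge` are hypothetical names, a plain dict stands in for agent.properties, and treating a missing libvirt.events.enabled as false is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MergePlan:
    """Hypothetical description of how the agent would run a snapshot merge."""
    event_driven: bool      # True: follow the merge through Libvirt events
    timeout: Optional[int]  # only meaningful for the event-driven process


def plan_merge(props: Mapping[str, str]) -> MergePlan:
    """Pick the merge process from agent properties (illustrative only).

    libvirt.events.enabled selects the new event-driven process; otherwise the
    previous process is used and snapshot.merge.timeout is ignored entirely.
    """
    events = props.get("libvirt.events.enabled", "false").strip().lower() == "true"
    if not events:
        # Previous process: the timeout property is not considered.
        return MergePlan(event_driven=False, timeout=None)
    raw = props.get("snapshot.merge.timeout")
    return MergePlan(event_driven=True, timeout=int(raw) if raw is not None else None)


print(plan_merge({"libvirt.events.enabled": "true", "snapshot.merge.timeout": "1"}))
print(plan_merge({"libvirt.events.enabled": "false", "snapshot.merge.timeout": "1"}))
```

The second call shows the point of item 1: with events disabled, a configured timeout has no effect.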

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [X] New feature (non-breaking change which adds functionality)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI
  • [ ] test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [X] Major
  • [ ] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [ ] Major
  • [ ] Minor
  • [ ] Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Basic Tests

I created a test VM for the tests below. After each relevant operation, I checked the VM's XML and the storage to confirm whether the snapshots existed.
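One way to script the XML check described above is to read the backing chain that libvirt reports for each disk. The sketch below is an assumption about how such a check could be done, not part of the PR. It relies on the `<source file=...>` element and the nested `<backingStore>` elements found in libvirt domain XML (for example, `virsh dumpxml` output). The file paths in the sample are made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def disk_backing_chains(domain_xml: str) -> Dict[str, List[str]]:
    """Map each file-backed disk's target dev to its chain of image files.

    The first entry is the active image (<source file=...>); the following
    entries come from the nested <backingStore> elements, top to bottom.
    """
    root = ET.fromstring(domain_xml)
    chains: Dict[str, List[str]] = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None:
            continue
        chain: List[str] = []
        node = disk
        while node is not None:
            source = node.find("source")
            # An empty <backingStore/> (or a non-file source) ends the chain.
            if source is None or source.get("file") is None:
                break
            chain.append(source.get("file"))
            node = node.find("backingStore")
        chains[target.get("dev")] = chain
    return chains


# Sample domain XML with two snapshot deltas on top of the original volume.
SAMPLE_XML = """<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/mnt/pool/vol-delta2'/>
      <backingStore type='file'>
        <format type='qcow2'/>
        <source file='/mnt/pool/vol-delta1'/>
        <backingStore type='file'>
          <format type='qcow2'/>
          <source file='/mnt/pool/vol-base'/>
          <backingStore/>
        </backingStore>
      </backingStore>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>"""

print(disk_backing_chains(SAMPLE_XML))
```

Comparing the chain length before and after an operation shows whether a snapshot delta was added or merged away. The storage side can then be checked separately.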

Snapshot Creation

The tests below were also repeated with the VM stopped.

N | Test | Result
1 | Take snapshot 1 of the VM without specifying quiesceVM | Snapshot created
2 | Take snapshot 2 of the VM, specifying quiesceVM | Snapshot created

Snapshot Reversion

N | Test | Result
1 | Revert the VM in Running state to any snapshot | Error thrown
2 | Revert the VM in Stopped state to snapshot 1 and start it | VM reverted and started successfully
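The reversion results reduce to a precondition on the VM state. Below is a minimal sketch with hypothetical names. Rejecting every state other than Stopped is an assumption, because the tests only cover Running and Stopped.

```python
from enum import Enum


class VmState(Enum):
    RUNNING = "Running"
    STOPPED = "Stopped"


class RevertNotAllowed(Exception):
    """Raised when a revert is requested for a VM that is not stopped."""


def check_revert_allowed(state: VmState) -> None:
    """Allow reverting a file-based disk-only VM snapshot only on a stopped VM."""
    if state is not VmState.STOPPED:
        raise RevertNotAllowed(f"VM must be Stopped to revert, current state: {state.value}")


check_revert_allowed(VmState.STOPPED)  # passes silently
```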

Snapshot Removal

N | Test | Result
1 | Create a new snapshot 3 after the second reversion test, then delete snapshot 1 | The snapshot was no longer listed and its database metadata was correct; the file was kept because more than one delta depended on it
2 | Delete snapshot 2 | Snapshot deleted; snapshot 1 was merged with snapshot 3, since snapshot 3 was now its only dependent
3 | Delete snapshot 3 (current) | Snapshot removed and merged with the VM's volume
4 | Create 3 snapshots and remove the first one | Snapshot removed and merged with the second snapshot
5 | Create two snapshots, revert to the first, and delete the second | Snapshot deleted
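The outcomes in this table follow one rule. What happens to a removed snapshot depends on how many deltas use its file as their backing file. The function below restates that rule as a sketch. The names are hypothetical, and the no-dependents case is inferred from tests 2 and 5 rather than stated in the PR. The later merge of an already hidden snapshot (snapshot 1 in test 2) is not modeled here.

```python
from enum import Enum
from typing import List


class RemovalAction(Enum):
    DELETE_FILE = "delete the file; nothing depends on it"
    MERGE_WITH_DEPENDENT = "merge with its only dependent"
    HIDE = "mark as hidden and keep the file"


def removal_action(dependents: List[str]) -> RemovalAction:
    """Decide what removing a snapshot does, given the files backed by it.

    For the current snapshot, the only dependent is the VM volume's delta.
    """
    if len(dependents) > 1:
        return RemovalAction.HIDE
    if len(dependents) == 1:
        return RemovalAction.MERGE_WITH_DEPENDENT
    return RemovalAction.DELETE_FILE


print(removal_action(["snapshot 2", "snapshot 3"]))  # test 1
print(removal_action(["VM volume delta"]))           # test 3
```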

Advanced Tests

Deletion Test

All tests were carried out with the VM stopped.

  1. I created 3 snapshots: s1, s2, and s3.
  2. I reverted the VM to snapshot s2.
  3. I created snapshot s4.
  4. I removed snapshot s2.

The snapshot was marked as hidden and was not removed from storage.

  5. I removed snapshot s3.

Snapshot s3 was removed normally. Snapshot s2 was merged with snapshot s4.

  6. I created snapshot s5.
  7. I reverted to snapshot s4.
  8. I removed snapshot s4.

Snapshot s4 was marked as hidden and was not removed from storage.

  9. I removed snapshot s5. It was removed normally, and snapshot s4 was merged with the delta of the VM's volume.
  10. I removed the last remaining snapshot (s1). It was removed normally.
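The deletion test above can be replayed on a toy model of one volume's snapshot chain. This sketch assumes a file layout that the PR text does not spell out. Taking a snapshot freezes the active delta as the snapshot file and starts a new delta on top. Reverting discards the current delta and starts a new one backed by the chosen snapshot. A hidden snapshot is dropped as soon as at most one file still depends on it. All names are hypothetical, and the model records which files are merged, not the direction in which data is copied. The final step appears as a merge with the active delta, which this model treats as a normal removal.

```python
from typing import Dict, List, Optional, Set

ACTIVE = "active"  # the VM volume's current, writable delta


class SnapshotTree:
    """Toy model of one volume's file-based snapshot chain (illustrative only)."""

    def __init__(self) -> None:
        # Maps each file to its backing file; the original volume has none.
        self.parent: Dict[str, Optional[str]] = {ACTIVE: None}
        self.hidden: Set[str] = set()
        self.log: List[str] = []

    def children(self, name: str) -> List[str]:
        """Files that use `name` as their backing file."""
        return [n for n, p in self.parent.items() if p == name]

    def take_snapshot(self, name: str) -> None:
        # The active delta is frozen as the snapshot; a new delta goes on top.
        self.parent[name] = self.parent[ACTIVE]
        self.parent[ACTIVE] = name

    def revert(self, name: str) -> None:
        # Discard the current delta and start a new one backed by `name`.
        old_base = self.parent[ACTIVE]
        self.parent[ACTIVE] = name
        self._collapse_if_hidden(old_base)

    def remove(self, name: str) -> None:
        if len(self.children(name)) > 1:
            # Other deltas still depend on this file: hide it, keep the file.
            self.hidden.add(name)
            self.log.append(f"{name} hidden")
        else:
            self._drop(name)

    def _drop(self, name: str) -> None:
        # Remove a snapshot with at most one dependent, merging when needed.
        base = self.parent.pop(name)
        kids = self.children(name)
        if kids:
            self.parent[kids[0]] = base
            self.log.append(f"{name} merged with {kids[0]}")
        else:
            self.log.append(f"{name} removed")
        self.hidden.discard(name)
        self._collapse_if_hidden(base)

    def _collapse_if_hidden(self, name: Optional[str]) -> None:
        # A hidden snapshot left with at most one dependent can now be dropped.
        if name is not None and name in self.hidden and len(self.children(name)) <= 1:
            self._drop(name)


if __name__ == "__main__":
    # Replays steps 1 to 10 of the deletion test.
    tree = SnapshotTree()
    for s in ("s1", "s2", "s3"):
        tree.take_snapshot(s)
    tree.revert("s2")
    tree.take_snapshot("s4")
    tree.remove("s2")
    tree.remove("s3")
    tree.take_snapshot("s5")
    tree.revert("s4")
    tree.remove("s4")
    tree.remove("s5")
    tree.remove("s1")
    print("\n".join(tree.log))
```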

Reversion Test

  1. I created two snapshots: s1 and s2.
  2. I reverted to snapshot s1.
  3. I removed snapshot s1.

Snapshot s1 was marked as hidden and was not removed from storage.

  4. I reverted to snapshot s2. Snapshot s1 was merged with the base volume.

Concurrent Test

I created 4 VMs and took a VM snapshot of each. Then I requested the removal of all of them at the same time. All snapshots were removed concurrently and successfully.

Test with Multiple Volumes

I created a VM with one data disk, attached 8 more data disks (10 volumes in total, counting the root volume), took two VM snapshots, and then removed them one at a time. The snapshots were removed successfully.

Tests Changing the snapshot.merge.timeout Config

  1. I changed the config to 1 and restarted the host;
  2. I created a VM, took a VM snapshot, accessed the VM, and wrote 4GB of data to it;
  3. I tried to remove the snapshot; the operation failed, and the logs showed that the merge had timed out;
  4. I manually aborted the blockcommit process;
  5. I changed the config to 0 and restarted the host;
  6. I tried to remove the snapshot again, and it was removed successfully.
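The timeout behavior exercised above can be sketched as a bounded wait with progress logging. This is not the PR's implementation. The PR collects progress from Libvirt events, while this sketch polls a callback to stay self-contained. Two further assumptions: the timeout is in seconds, and 0 means no limit. The second is only suggested by step 6 and is not stated in the PR. `wait_for_merge` and its parameters are hypothetical. The clock and sleep functions are injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable, Optional


def wait_for_merge(
    poll_progress: Callable[[], Optional[float]],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> None:
    """Wait until a block merge reports completion, honoring a timeout.

    poll_progress returns the completed fraction (0.0 to 1.0), or None if no
    progress is known yet. A timeout of 0 is treated as "wait without limit".
    """
    start = clock()
    while True:
        progress = poll_progress()
        if progress is not None:
            log(f"merge progress: {progress:.0%}")
            if progress >= 1.0:
                return
        if timeout > 0 and clock() - start >= timeout:
            raise TimeoutError(f"merge did not finish within {timeout} seconds")
        sleep(interval)
```

On a timeout, this sketch only stops waiting. Step 4 above suggests that the underlying blockcommit job may keep running and need to be aborted separately.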

Tests Related to Volume Resize with Disk-Only VM Snapshots on KVM

Test | Result | Expected?
Create a VM, take a snapshot, resize the volume | Resize performed successfully, both in metadata and when checked with qemu-img info | Y
Stop the VM and revert the snapshot | Revert performed successfully; volume size returned to the original, both in metadata and in qemu-img info | Y
Remove the snapshot with the VM stopped | The delta of the volume was correctly merged with the snapshot's, and the final size was that of the volume | Y
Start the VM, take a new snapshot, resize the volume, and remove the snapshot | Deltas were correctly merged, and the final size was that of the volume | Y

The last two tests were repeated on a VM with several snapshots, so that a merge between snapshots was performed. The result was the same.
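The size checks in the table compare CloudStack's metadata with what qemu-img reports. A helper such as the one below (hypothetical, not part of the PR) reads the virtual size and the backing file from `qemu-img info --output=json`. The `-U` (`--force-share`) flag lets qemu-img read an image that a running VM holds open. Parsing is kept separate from the subprocess call so it can be checked without qemu-img installed.

```python
import json
import subprocess
from typing import Dict


def parse_qemu_img_info(output: str) -> Dict[str, object]:
    """Extract the fields relevant to the resize checks from qemu-img JSON output."""
    info = json.loads(output)
    return {
        "virtual_size": info["virtual-size"],          # guest-visible size in bytes
        "backing_file": info.get("backing-filename"),  # None for a standalone image
    }


def qemu_img_info(path: str) -> Dict[str, object]:
    """Run `qemu-img info --output=json` on an image file and parse the result."""
    result = subprocess.run(
        ["qemu-img", "info", "--output=json", "-U", path],
        check=True, capture_output=True, text=True,
    )
    return parse_qemu_img_info(result.stdout)
```

After a revert, the volume's `virtual_size` should again match the size recorded in the snapshot metadata.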

Tests Related to Events:

  1. Create a VM, take a disk-only VM snapshot, grow the root volume by 1GB, stop the VM, and revert to the snapshot. The cloud.usage_event table showed that the resize event was correctly generated, and the GUI showed that the account's resource limit was updated.
  2. Repeat the test above on a VM with two volumes, resizing only one of them. The result was the same, and only one resize event was generated, for the volume that had been resized.
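The accounting checked in these two tests can be modeled as follows. The names (`Volume`, `revert_sizes`) are hypothetical, and so is returning a net byte change for the account's primary storage count. ACS's real usage-event and resource-count code is not shown. The sketch only captures the observed behavior: a revert emits a resize event for each volume whose size differs from the snapshot's, and no event for the others.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Volume:
    uuid: str
    size: int  # bytes, as currently recorded in the metadata


def revert_sizes(
    volumes: List[Volume], sizes_at_snapshot: Dict[str, int]
) -> Tuple[List[Tuple[str, int, int]], int]:
    """Restore each volume's size from the snapshot and report what changed.

    Returns the resize events as (uuid, old_size, new_size) and the net change
    in bytes to apply to the account's resource count.
    """
    events: List[Tuple[str, int, int]] = []
    delta = 0
    for vol in volumes:
        new_size = sizes_at_snapshot[vol.uuid]
        if new_size != vol.size:
            events.append((vol.uuid, vol.size, new_size))
            delta += new_size - vol.size
            vol.size = new_size
    return events, delta


GiB = 1024 ** 3
root, data = Volume("root", 9 * GiB), Volume("data", 5 * GiB)
print(revert_sizes([root, data], {"root": 8 * GiB, "data": 5 * GiB}))
```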

JoaoJandre, Mar 28 '25 13:03