bosh-azure-cpi-release
Snapshot-based disk migration
Is your feature request related to a problem? Please describe.
Currently, Azure does not support directly converting Premium SSD v2 or Ultra Disks to other disk types, as detailed here. Because of this limitation, users who want to switch away from Premium SSD v2 or Ultra Disks cannot use the native disk update feature.
Describe the solution you'd like
Although direct conversion is not supported yet, the disk type can still be changed by going through a snapshot. When migrating away from Premium SSD v2 or Ultra Disks, the `update_disk` method should perform the following steps:
- unmounting the disk
- creating a snapshot and ensuring its `completionPercent` reaches 100
- generating a new disk from this snapshot
- mounting the new disk
- removing the snapshot again
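The steps above could be sketched with the Azure CLI roughly as follows. This is a minimal, hypothetical illustration against an ordinary VM, not the CPI's `update_disk` implementation: the function name, the snapshot naming scheme, the one-hour polling limit and the optional zone argument are all assumptions. The zone argument is only there in case the VM is zonal, where the new disk would have to be created in the same availability zone.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the snapshot-based migration steps, using the Azure CLI
# against an ordinary VM. This is NOT the CPI's update_disk implementation.
# Usage: migrate_disk_via_snapshot <rg> <vm> <old-disk> <new-disk> <new-sku> [zone]
migrate_disk_via_snapshot() {
  local rg="$1" vm="$2" old_disk="$3" new_disk="$4" new_sku="$5" zone="${6:-}"
  local snap="${old_disk}-migration-snap"   # hypothetical naming scheme
  local disk_id snap_id pct copied=false attempt
  local zone_args=()
  if [[ -n "$zone" ]]; then zone_args=(--zone "$zone"); fi

  # 1. Detach ("unmount") the disk from the VM.
  az vm disk detach --resource-group "$rg" --vm-name "$vm" --name "$old_disk" || return 1

  # 2. Take an incremental snapshot (the only kind Ultra/Premium SSD v2 support).
  disk_id=$(az disk show --resource-group "$rg" --name "$old_disk" --query id -o tsv) || return 1
  az snapshot create --resource-group "$rg" --name "$snap" \
    --source "$disk_id" --incremental true || return 1

  # 3. Poll until the background copy has finished (completionPercent >= 100),
  #    giving up after 360 polls (about one hour at 10-second intervals).
  for (( attempt = 1; attempt <= 360; attempt++ )); do
    pct=$(az snapshot show --resource-group "$rg" --name "$snap" \
      --query completionPercent -o tsv) || return 1
    # completionPercent is a float such as 42.5 or 100.0; compare numerically with awk.
    if awk -v p="$pct" 'BEGIN { exit !(p + 0 >= 100) }'; then
      copied=true
      break
    fi
    sleep 10
  done
  [[ "$copied" == true ]] || return 1

  # 4. Create the new disk with the target SKU from the snapshot.
  snap_id=$(az snapshot show --resource-group "$rg" --name "$snap" --query id -o tsv) || return 1
  az disk create --resource-group "$rg" --name "$new_disk" --sku "$new_sku" \
    --source "$snap_id" "${zone_args[@]}" || return 1

  # 5. Attach ("mount") the new disk, then 6. delete the snapshot again.
  az vm disk attach --resource-group "$rg" --vm-name "$vm" --name "$new_disk" || return 1
  az snapshot delete --resource-group "$rg" --name "$snap"
}
```

A hypothetical invocation would be `migrate_disk_via_snapshot rg-disk-test my-vm myPremiumV2Disk myNewPremiumDisk Premium_LRS 1`, where the VM and disk names are placeholders.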
Pending task: evaluate and compare the efficiency of BOSH's regular copy mechanism versus snapshots.
Additional context
This request is a continuation of issue #697 and was suggested by @MSSedusch.
The situation turns out to be more complex than initially thought. There are several caveats regarding snapshots of the Premium SSD v2 disk type:
- Ultra and Premium SSD v2 disks only support incremental snapshots.
- When you create an incremental snapshot of a Premium SSD v2 or Ultra Disk, the first snapshot acts as a full copy of the disk. However, this initial snapshot cannot be used immediately: a background copy process must complete before a new disk can be created from it. See reference.
  Attempting to create a new disk from the snapshot before the background copy completes results in an error from the Azure API:
  ```
  $ az disk create --name myNewPremiumDisk --resource-group rg-disk-test --size-gb 1024 --sku Premium_LRS --source "/subscriptions/<subscription>/resourceGroups/rg-disk-test/providers/Microsoft.Compute/snapshots/mySnapshot"
  (Conflict) Source incremental snapshot sebastian-snap-other copy is still in progress. Please retry after source snapshot's copy has completed.
  Code: Conflict
  Message: Source incremental snapshot mySnapshot copy is still in progress. Please retry after source snapshot's copy has completed.
  ```
- Full snapshots take a significant amount of time to complete, especially for large disks, because they copy the disk's entire data set. Incremental snapshots are much faster to create, because they only capture the changes made since the last snapshot.
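Since the API itself says "Please retry after source snapshot's copy has completed", one option is to react to that Conflict instead of polling. The following is a minimal sketch under that assumption: the helper name and its arguments are hypothetical, and it decides whether to retry by matching the text "copy is still in progress" from the error shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying while Azure reports that the
# source incremental snapshot's copy is still in progress (the Conflict above).
# Usage: retry_while_copy_in_progress <max-attempts> <delay-seconds> <command> [args...]
retry_while_copy_in_progress() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt errfile
  errfile=$(mktemp) || return 1
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # Leave stdout untouched; capture stderr so the error message can be inspected.
    if "$@" 2>"$errfile"; then
      rm -f "$errfile"
      return 0
    fi
    # Only the "copy is still in progress" Conflict is retried; other errors fail fast.
    if ! grep -q "copy is still in progress" "$errfile"; then
      break
    fi
    if (( attempt < max_attempts )); then
      echo "attempt ${attempt}/${max_attempts}: snapshot copy still in progress, retrying in ${delay}s" >&2
      sleep "$delay"
    fi
  done
  cat "$errfile" >&2   # surface the last error to the caller
  rm -f "$errfile"
  return 1
}
```

For example, `retry_while_copy_in_progress 60 30 az disk create --name myNewPremiumDisk --resource-group rg-disk-test --sku Premium_LRS --source "$snapshot_id"` would keep retrying for roughly half an hour (`snapshot_id` is a placeholder). This avoids a separate polling step, but it depends on the exact wording of Azure's error message, which could change.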
To minimize downtime when converting Premium SSD v2 disks via incremental snapshots, here are a few strategies that come to mind:
- We could shorten the downtime by creating an initial incremental snapshot (the full copy) before the `update_disk` process even starts, so that `update_disk` only has to apply a further incremental snapshot, which copies just the delta. The initial snapshot could be taken when the disk is created, e.g. when `enable_cpi_update_disk` is active: BOSH could pass a parameter via `cloud_properties` that instructs the Azure CPI's `create_disk` method to create the snapshot. This could significantly reduce the downtime, but it would require modifications to BOSH and could become a larger change, since we would also have to manage the snapshots' lifecycle.
- Accept a longer downtime during disk updates, as BOSH's default copy mechanism may not be quicker (this needs to be evaluated). Monitor the snapshot's status with a simple polling loop and proceed with disk creation once `completionPercent` reaches `100.0`:
  ```
  $ az snapshot show -n sebastian-snap -g rg-disk-test --query '[completionPercent]' -o tsv
  100.0
  ```
  If creating the snapshot and waiting for it is not faster than BOSH's default copy mechanism, it is questionable whether this extra implementation is worth the effort.
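The polling step in the second strategy could be wrapped in a small bash function such as the minimal sketch below. It assumes a logged-in Azure CLI and uses the same `az snapshot show` query as above (without the list brackets, so a single value is printed). The function name, the default timeout of one hour and the 15-second interval are arbitrary choices, not values taken from Azure or BOSH.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll an incremental snapshot until its background copy
# finishes (completionPercent reaches 100), or give up after a timeout.
# Usage: wait_for_snapshot_copy <resource-group> <snapshot-name> [timeout-s] [interval-s]
wait_for_snapshot_copy() {
  local rg="$1" snap="$2" timeout="${3:-3600}" interval="${4:-15}"
  local waited=0 pct
  while true; do
    pct=$(az snapshot show --resource-group "$rg" --name "$snap" \
      --query completionPercent -o tsv) || return 1
    # completionPercent is a float such as 42.5 or 100.0; compare numerically with awk.
    if awk -v p="$pct" 'BEGIN { exit !(p + 0 >= 100) }'; then
      return 0
    fi
    if (( waited >= timeout )); then
      echo "timed out after ${waited}s; ${snap} copy at ${pct:-unknown}%" >&2
      return 1
    fi
    echo "${snap}: ${pct:-0}% copied, waiting ${interval}s" >&2
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}
```

With this helper, `wait_for_snapshot_copy rg-disk-test sebastian-snap && az disk create ...` would only start the disk creation once the copy has completed. Timing such a run against BOSH's default copy mechanism would also give the data needed for the evaluation mentioned above.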