bosh-azure-cpi-release

Snapshot based disk migration

Open • s4heid opened this issue 1 year ago • 1 comment

Is your feature request related to a problem? Please describe.

Currently, Azure does not support directly converting Premium SSD v2 or Ultra Disks to other disk types, as detailed here. Because of this limitation, anyone who wants to move off Premium SSD v2 or Ultra Disks cannot use the native disk update feature.

Describe the solution you'd like

Although direct conversion is not supported yet, the disk type can still be changed by going through a snapshot. When migrating from Premium SSD v2 or Ultra Disks, the update_disk method should perform the following steps:

  1. unmount the disk
  2. create a snapshot and wait until its completionPercent reaches 100
  3. create a new disk from this snapshot
  4. mount the new disk
  5. delete the snapshot

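A rough shell sketch of these five steps with the Azure CLI could look like the following. It is only an illustration: the CPI itself is Ruby code talking to the Azure API, not a shell script, and the function name `migrate_disk_via_snapshot`, the `-snap`/`-new` name suffixes and the `POLL_INTERVAL` variable are hypothetical. Unmounting and mounting are modeled as detaching and attaching the disk, i.e. the CPI-level part of those steps; the unmount inside the VM is assumed to have happened already.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper): move a managed disk to another SKU
# through an incremental snapshot. Arguments: resource group, VM name,
# source disk name, target SKU (e.g. Premium_LRS).
migrate_disk_via_snapshot() {
  local rg="$1" vm="$2" disk="$3" sku="$4"
  local snap="${disk}-snap" new_disk="${disk}-new"   # hypothetical naming scheme
  local disk_id snap_id pct

  # 1. Detach the disk from the VM.
  az vm disk detach --resource-group "$rg" --vm-name "$vm" --name "$disk" || return 1

  # 2. Take an incremental snapshot (the only kind Premium SSD v2 / Ultra support)
  #    and poll until its background copy reports completionPercent 100.
  disk_id=$(az disk show --resource-group "$rg" --name "$disk" --query id -o tsv) || return 1
  az snapshot create --resource-group "$rg" --name "$snap" \
    --source "$disk_id" --incremental true >/dev/null || return 1
  while :; do
    pct=$(az snapshot show --resource-group "$rg" --name "$snap" \
      --query completionPercent -o tsv) || return 1
    [ "${pct%%.*}" = "100" ] && break          # "100.0" -> "100"
    sleep "${POLL_INTERVAL:-10}"
  done

  # 3. Create the new disk with the target SKU from the snapshot.
  snap_id=$(az snapshot show --resource-group "$rg" --name "$snap" --query id -o tsv) || return 1
  az disk create --resource-group "$rg" --name "$new_disk" --sku "$sku" \
    --source "$snap_id" >/dev/null || return 1

  # 4. Attach the new disk to the same VM.
  az vm disk attach --resource-group "$rg" --vm-name "$vm" --name "$new_disk" || return 1

  # 5. Delete the snapshot.
  az snapshot delete --resource-group "$rg" --name "$snap"
}
```

For example, `migrate_disk_via_snapshot rg-disk-test my-vm myPremiumV2Disk Premium_LRS`. The sketch does not delete the old disk and ignores location and zone handling; a disk attached to a zonal VM may need `--zone` on `az disk create`.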
Pending task: evaluate and compare the efficiency of the regular copy methods versus snapshots.

Additional context

This request is a continuation of issue #697 and was suggested by @MSSedusch.

s4heid • Sep 09 '24 11:09

It appears that the situation is somewhat more complex than initially thought. There are several caveats to note about snapshots and the Premium SSD v2 disk type:

  • Ultra and Premium SSD v2 disks only support incremental snapshots.

  • When you create an incremental snapshot of a Premium SSD v2 or an Ultra Disk, the first snapshot is a full copy of the disk. However, this initial snapshot cannot be used immediately: a background copy process must complete before a new disk can be created from it. See reference.

    Attempting to create a new disk from the snapshot before the background process completes results in an error from the Azure API:

    $ az disk create --name myNewPremiumDisk --resource-group rg-disk-test --size-gb 1024 --sku Premium_LRS --source "/subscriptions/<subscription>/resourceGroups/rg-disk-test/providers/Microsoft.Compute/snapshots/mySnapshot"
    (Conflict) Source incremental snapshot mySnapshot copy is still in progress. Please retry after source snapshot's copy has completed.
    Code: Conflict
    Message: Source incremental snapshot mySnapshot copy is still in progress. Please retry after source snapshot's copy has completed.
    
  • Full snapshots take significant time to complete, especially for large disks, because they copy all of the disk's data. Incremental snapshots are much faster to create because they only capture the changes made since the previous snapshot.
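One way to cope with the Conflict error shown above, an alternative to polling and not something proposed in this issue, is to retry the `az disk create` call for as long as Azure reports that the copy is still in progress. A minimal shell sketch; the function name `retry_while_copy_in_progress` and the `MAX_ATTEMPTS`/`RETRY_DELAY` variables are hypothetical, and the matched text is taken from the error message quoted above.

```shell
#!/usr/bin/env bash
# Sketch: re-run a command while Azure reports that the source snapshot's
# background copy is still in progress; fail fast on any other error.
# MAX_ATTEMPTS (default 30) and RETRY_DELAY in seconds (default 20) are arbitrary.
retry_while_copy_in_progress() {
  local attempt=1 err
  err=$(mktemp) || return 1
  while :; do
    # Run the command; keep its stdout, capture stderr for inspection.
    if "$@" 2>"$err"; then
      rm -f "$err"
      return 0
    fi
    # Stop on errors other than the in-progress conflict, or when out of attempts.
    if ! grep -q "copy is still in progress" "$err" ||
       [ "$attempt" -ge "${MAX_ATTEMPTS:-30}" ]; then
      cat "$err" >&2
      rm -f "$err"
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-20}"
  done
}
```

Usage: `retry_while_copy_in_progress az disk create --name myNewPremiumDisk --resource-group rg-disk-test --sku Premium_LRS --source "/subscriptions/<subscription>/resourceGroups/rg-disk-test/providers/Microsoft.Compute/snapshots/mySnapshot"`. Matching on the message text is brittle because Azure may reword it; polling completionPercent, as in strategy 2 below, does not depend on error strings.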

Here are a few strategies that come to mind for minimizing downtime when converting Premium SSD v2 disks via incremental snapshots:

  1. We could shorten the downtime by creating the initial incremental snapshot (the full copy) before the update_disk process even starts, so that update_disk only has to take a further incremental snapshot that copies the delta. The initial snapshot could be taken upon disk creation, e.g. when enable_cpi_update_disk is active. Bosh could instruct the Azure CPI's create_disk method to create the snapshot via a parameter passed in cloud_properties.

    Implementing this could significantly reduce the downtime, but it would require modifications to bosh and could turn into a larger change, since we would also have to manage the snapshots' lifecycle.

  2. Accept a longer downtime during disk updates, since bosh's default copy mechanism may not be any quicker (this needs to be evaluated). Monitor the snapshot's status with a simple polling loop and proceed with disk creation once its completionPercent reaches 100.0:

    $ az snapshot show -n sebastian-snap -g rg-disk-test --query '[completionPercent]' -o tsv
    100.0
    

    If creating the snapshot and waiting for it is not faster than bosh's default copy mechanism, the question is whether this extra implementation is worth the effort.
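The polling in strategy 2 could be a small loop with a deadline, so that a copy which never completes does not block update_disk indefinitely. A minimal shell sketch; `wait_for_snapshot_copy`, `POLL_INTERVAL` and `POLL_TIMEOUT` are hypothetical names, and the query is the `az snapshot show` call shown above without the list brackets, so `-o tsv` prints a single value.

```shell
#!/usr/bin/env bash
# Sketch: poll a snapshot until completionPercent reaches 100, with a deadline.
# POLL_INTERVAL (seconds, default 15) and POLL_TIMEOUT (seconds, default 3600)
# are arbitrary defaults.
wait_for_snapshot_copy() {
  local rg="$1" snap="$2" pct start now
  start=$(date +%s)
  while :; do
    pct=$(az snapshot show --resource-group "$rg" --name "$snap" \
      --query completionPercent -o tsv) || return 1
    # completionPercent is a float such as "37.5" or "100.0"; compare its integer part.
    if [ -n "$pct" ] && [ "${pct%%.*}" -ge 100 ]; then
      return 0
    fi
    now=$(date +%s)
    if [ $((now - start)) -ge "${POLL_TIMEOUT:-3600}" ]; then
      echo "snapshot $snap still at ${pct:-unknown}% after ${POLL_TIMEOUT:-3600}s" >&2
      return 1
    fi
    sleep "${POLL_INTERVAL:-15}"
  done
}
```

`wait_for_snapshot_copy rg-disk-test sebastian-snap` returns 0 once the snapshot reports 100.0, and 1 on an `az` error or after the timeout. Comparing only the integer part sidesteps floating-point arithmetic, which plain shell does not provide.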

s4heid • Nov 18 '24 17:11