cockpit-machines icon indicating copy to clipboard operation
cockpit-machines copied to clipboard

snapshots: Optionally inform about creating internal snapshots

Open mvollmer opened this issue 1 year ago • 34 comments

This is based on and meant to go together with the RHEL product documentation.

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/configuring_and_managing_virtualization/creating-virtual-machine-snapshots_configuring-and-managing-virtualization

That documentation explicitly calls out internal snapshots as deprecated and strongly advises against using them in production environments.

However, it also says that "Internal snapshots might work for your use case, but Red Hat does not provide full testing and support for them. ", external snapshots don't work in all cases, and upstream libvirt does not consider them deprecated so Cockpit should probably not fully prevent their use on RHEL. (And there are no requests to us for preventing them.)

Thus, let's have a little notice in the creation dialog with a link to the product documentation.

image

mvollmer avatar Apr 12 '24 11:04 mvollmer

I have to better understand the motivation for this. Maybe we should actively warn people when they are about to create a internal snapshot. Maybe we should put a short and neutral explainer text in the dialog instead of just one word in the title.

mvollmer avatar Apr 12 '24 11:04 mvollmer

I'm afraid I can't help you with the motivation either -- this feels like an internal implementation detail which users really shouldn't need to care about. But putting that into the title WFM -- at least this is nice as a discussion ground with the libvirt folks. Thanks!

martinpitt avatar Apr 12 '24 11:04 martinpitt

I'm afraid I can't help you with the motivation either -- this feels like an internal implementation detail which users really shouldn't need to care about.

Internal details matter when debugging issues, and this is an easy way to tell your support contact on IRC what is going on, even if you yourself have no idea what it means.

mvollmer avatar Apr 12 '24 11:04 mvollmer

I have to better understand the motivation for this. Maybe we should actively warn people when they are about to create a internal snapshot. Maybe we should put a short and neutral explainer text in the dialog instead of just one word in the title.

The motivation seems to be to just provide information which kind of snapshot Cockpit is making, to help with support cases.

I think putting it in the title might actually be the least distracting and most clearest way to do this. Putting a paragraph of text in the dialog body calls too much attention to this detail. Adding a warning against internal snapshots is not warranted; they are just fine in the cases Cockpit uses them.

An alternative would be to not say anything during creation of snapshots, but to list the type of snapshots in the snapshot card.

mvollmer avatar Apr 16 '24 08:04 mvollmer

this feels like an internal implementation detail which users really shouldn't need to care about

Yep.

garrett avatar Apr 17 '24 07:04 garrett

this feels like an internal implementation detail which users really shouldn't need to care about

Yep.

Agreed. I don't like changing the dialog either. The user might not understand why that word is in the title, and if they do and they want to have the other type of snapshots, Cockpit wont give it to them anyway. The title of the dialog needs to repeat the title of the button.

Instead, let's just put a new entry into "Hypervisor details":

image

@garrett, wdyt?

mvollmer avatar Apr 17 '24 08:04 mvollmer

@mvollmer That's too coarse-grained IMHO. You can very well have internal and external snapshots on the same VM. Not necessarily created by cockpit, but the CLI does allow it. I liked your idea about showing an internal/external label in the snapshots table -- that feels like the right choice wrt. how the API works, that you don't have a choice about it in cockpit, and conveying the information.

martinpitt avatar Apr 17 '24 08:04 martinpitt

Snapshot type internal doesn't tell me anything useful though. What does that mean? What is the alternative? How do you change it? Why would you change it? Where is "internal" stored? Etc.

(I'd assume it stores the snapshot next to the disk image, wherever that is. Which is probably what people would want a vast majority of the time?)

This really seems like it's an unnecessary implementation detail. Please feel free to tell me otherwise, though.

garrett avatar Apr 17 '24 08:04 garrett

I liked your idea about showing an internal/external label in the snapshots table -- that feels like the right choice wrt. how the API works, that you don't have a choice about it in cockpit, and conveying the information.

However, that only tells people what they have done after the fact.

mvollmer avatar Apr 17 '24 08:04 mvollmer

However, that only tells people what they have done after the fact.

Yes, IMHO that's fine. As far as I remember, the libvirt team doesn't even want us to give a choice about internal/external in the creation dialog, so there's nothing to decide for the user anyway? If we offer that choice, then it needs to go into the dialog, of course.

But in no case is this a hypervisor or VM property.

martinpitt avatar Apr 17 '24 08:04 martinpitt

So, going back to the original feature request, I think the best would be to precisely detect when people are about to do something unsupported, and warn them about it.

In this case this would be creating an internal snapshot of a running machine on RHEL. So maybe we do just that.

(I didn't even know until a minute ago that internal snapshots also store RAM... )

mvollmer avatar Apr 17 '24 08:04 mvollmer

But in no case is this a hypervisor or VM property.

Well, I could argue that it's a derived property and it means "the type of snapshots that cockpit will create", and that is solely determined by the VM (but might change over time).

But I wont. :-)

mvollmer avatar Apr 17 '24 08:04 mvollmer

and warn them about it. In this case this would be creating an internal snapshot of a running machine on RHEL.

IMHO this would be a terrible UX. Cockpit should not create snapshots which the API doesn't offer (and our current feature detection should be quite adequate -- may have bugs, of course). I.e. on current RHEL it should always create an external snapshot, with RAM (if the VM is running) or without RAM (if it's off).

It could of course be that the API offers functionality which the libvirt team doesn't want to support. IMHO that's just lying to ourselves, and then I'd rather not offer the functionality at all than hiding behind some "but I told you it may break" sign. Users will quickly learn to ignore these anyway, and they are not at all helpful. It's our responsibility as OS builders to define functionality and make sure it effing works.

martinpitt avatar Apr 17 '24 08:04 martinpitt

I.e. on current RHEL it should always create an external snapshot, with RAM (if the VM is running) or without RAM (if it's off).

There are cases in vmSupportsExternalSnapshots that look like they might happen on RHEL:

    // If at leat one disk has internal snapshot preference specified, use internal snapshot for all disk,
    // as mixing internal and external is not allowed
    const disks = Object.values(vm.disks);
    if (disks.some(disk => disk.snapshot === "internal")) {
        logDebug(`vmSupportsExternalSnapshots: vm ${vm.name} has internal snapshot preference specified`);
        return false;
    }

    // Currently external snapshots works only for disks where we can specify a local path of a newly created disk snapshot file
    // This rules out all disks which are not file-based, or pool-based where pool is of type "dir"
    // or media-less disks
    const supportedPools = storagePools.filter(pool => pool.type === "dir" && pool.connectionName === vm.connectionName);
    if (!disks.every(disk =>
        (disk.type === "file" && disk.source?.file) ||
        (disk.type === "volume" && supportedPools.some(pool => pool.name === disk.source.pool))
    )) {
        logDebug(`vmSupportsExternalSnapshots: vm ${vm.name} has unsupported disk type`);
        return false;
    }

mvollmer avatar Apr 17 '24 09:04 mvollmer

@mvollmer Ah right, I forgot about this hilarious hackery. As soon as you attach e.g. an empty CD drive or other devices to the VM, external snapshots don't work any more. I don't know whether this is by design, or bug workarounds -- these are definitively good matter to discuss with libvirt folks.

martinpitt avatar Apr 17 '24 09:04 martinpitt

IMHO this would be a terrible UX.

Yeah, and also bound to be incorrect as the supported/not-supported status of things change. I have underestimated the complexity enterprises can create re the "is it supported?" questions...

mvollmer avatar Apr 17 '24 09:04 mvollmer

This is cursed...

mvollmer avatar Apr 17 '24 09:04 mvollmer

Ok, my understanding has evolved a bit... :-)

The upcoming documentation for RHEL 9.4 will talk extensively about the preconditions for creating supported snapshots: The VM must only have disks of type "file". The documentation also uses the terms "external" and "internal" to distinguish between supported snapshots and "deprecated snapshots that might work for you but are not fully tested or supported". It does not explain the technical differences or go into any further detail.

(I don't know I can quote draft docs here. Ask me for a link if you want to read the details yourself.)

So let's just expose this in Cockpit: If we make a internal snapshot because there is a disk that is not of type "file", we put a paragraph into the dialog that says this and has a link to the "Support limitations for virtual machine snapshots" chapter of the product documentation. We only do this on RHEL.

The text of the paragraph could be

" The snapshot will be created in the "internal" format because this virtual machine uses disks with a source type other than "File". This format is not fully tested or supported. See [the product documentation] for further details. "

I wouldn't even render it as a warning, just as info.

The product documentation also has awkward instructions for checking whether Cockpit has created a external or internal snapshot, using virsh and grep on the command line. To help with this, we could add a little (i) icon somewhere in our table (on RHEL only) that explains the support status of a snapshot.

mvollmer avatar Apr 22 '24 06:04 mvollmer

The other scenario where Cockpit creates internal snapshots is when libvirt doesn't implement them at all. We don't need to tell people about that.

mvollmer avatar Apr 22 '24 06:04 mvollmer

The VM must only have disks of type "file".

Interesting -- that seems much more restrictive than even what we have now -- e.g. drives without media (empty CD-ROM) would then also not be supported?

The snapshot will be created in the "internal" format because this virtual machine uses disks with a source type other than "File". This format is not fully tested or supported.

If there's lots of pressure, I can live with showing that in the dialog, but it's still nonsense IMHO -- what should the user do? My strong preference for this situation is then to just not offer snapshotting at all. Again we as OS builders have to define which functionality we support and which not -- we can't wash our hands and leave the user with a broken situation.

The other scenario where Cockpit creates internal snapshots is when libvirt doesn't implement them at all

"Them" == "external snapshots", right?

martinpitt avatar Apr 22 '24 06:04 martinpitt

The other scenario where Cockpit creates internal snapshots is when libvirt doesn't implement them at all

"Them" == "external snapshots", right?

Ah, yes, sorry. " when libvirt doesn't implement external snapshots at all".

mvollmer avatar Apr 22 '24 07:04 mvollmer

The VM must only have disks of type "file".

Interesting -- that seems much more restrictive than even what we have now -- e.g. drives without media (empty CD-ROM) would then also not be supported?

Those are technically of type "file", so they are supported.

mvollmer avatar Apr 22 '24 07:04 mvollmer

If there's lots of pressure, I can live with showing that in the dialog, but it's still nonsense IMHO -- what should the user do?

Yeah, hmm, well. Creating internal snapshots is just fine, actually, as far as I know. It's just that the documentation has important words about them, and in my opinion, Cockpit should try hard to make consulting the documentation unnecessary, so we should show those important words in the dialog. Also we have been asked to do this.

The documentation says "they might work for you", so totally preventing people from making internal snapshots also seems wrong.

And note that this is only for RHEL.

mvollmer avatar Apr 22 '24 07:04 mvollmer

(/me mumbles ... they should've just fixed whatever is wrong with internal snapshots and not tell anyone about it.)

mvollmer avatar Apr 22 '24 07:04 mvollmer

The product documentation also has awkward instructions for checking whether Cockpit has created a external or internal snapshot, using virsh and grep on the command line. To help with this, we could add a little (i) icon somewhere in our table (on RHEL only) that explains the support status of a snapshot.

I think this is not necessary. Existing internal snapshots are just fine, we only need to worry about informing people when they create more.

mvollmer avatar Apr 23 '24 10:04 mvollmer

@garrett, the dialog now looks like this:

image

mvollmer avatar May 07 '24 07:05 mvollmer

Personally, I think this is a reasonable thing for Cockpit to do, given the current limitations of external snpahots and RHEL's unwillingness to fully support internal ones.

Alternatives:

  • Get RHEL to fully support internal snapshots. I still don't fully understand why they are deprecated. It might be that someone simply decided to stop investing in them to "simplify" but I don't think upstream libvirt will ever deprecate or remove them. In fact, upstream libvirt still defaults to internal snapshot for compatibility reasons.

  • Improve the implementation of external snapshots to also work in file-based pools. I'll see if I can find the paper work that tracks this.

  • Don't say anything in the Cockpit UI and let people find the product documentation themselves and run the command line snippets in it to figure out whether they are about to do something that is not fully supported.

  • Refuse to make internal snapshots on RHEL in Cockpit. But the error message would need to contain pretty much the same text as the proposed dialog here in order to be helpful. We can of course also refuse to be helpful and just disable the button without explanation.

The snapshot dialog will get more complicated in the future (with "disk-only" option etc.), but whatever we figure out here will be applicable then. Thus, I would like to make progress here.

mvollmer avatar May 07 '24 07:05 mvollmer

I started this thread – after trying the snapshots out and experiencing some confusion even though I was following this thread all along.

My suggestion is to add as much documentation in the interface as possible. It's a new feature and an important one.

90% of the time less is more, but not this time IMO.

mac2net avatar May 07 '24 07:05 mac2net

My suggestion is to add as much documentation in the interface as possible. It's a new feature and an important one.

Hmm, there isn't much new here, is there? Cockpit has been able to make snapshots of virtual machines since version 225, released in 2020.

What has happened is that RHEL wants to move away from one snapshot format to another. Ideally, this is a implementation detail and Cockpit will just make snapshots in the newer format, with no need to tell the user about it. And we don't want to explain in the UI the difference between "internal" and "external" snapshots, if at all possible.

But RHEL really really wants people to stop using the old format, before the new one can handle all cases that the old one could. So there are cases where we must decide whether to make a internal snapshot, or refuse to make any. (And I am still of the opinion that there is nothing bleeping wrong with the old snapshot format, so it would be a bit of a shame to prevent people from getting their job done with Cockpit.)

Fedora does not want people to stop using the old format as badly, afaik, so Cockpit will not show this message there.

mvollmer avatar May 07 '24 08:05 mvollmer

The RHEL product documentation has been published. I have updated the description of this PR with my best shot at selling it:

This is based on and meant to go together with the RHEL product documentation.

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/configuring_and_managing_virtualization/creating-virtual-machine-snapshots_configuring-and-managing-virtualization

That documentation explicitly calls out internal snapshots as deprecated and strongly advises against using them in production environments.

However, it also says that "Internal snapshots might work for your use case, but Red Hat does not provide full testing and support for them. ", external snapshots don't work in all cases, and upstream libvirt does not consider them deprecated so Cockpit should probably not fully prevent their use on RHEL.

Thus, let's have a little notice in the creation dialog with a link to the product documentation.

mvollmer avatar May 13 '24 08:05 mvollmer