ovirt-web-ui icon indicating copy to clipboard operation
ovirt-web-ui copied to clipboard

WIndows 11/2022 VMs for cluster level >= 4.6 - failed to create/edit such VMs due to the TPM device

Open sgratch opened this issue 2 years ago • 21 comments

Since TPM is not set on backend automatically based on OS type and Cluster level version, the following 2 scenarios occur:

  1. for cluster level >= 4.6: when trying to create a new VM with OS set to Windows 11 or 2022 or when editing an existing VM and set OS to Windows 11 or 2022, the creation/edit failed in rest backend with the following error: TPM device is required by the guest OS

    E.g. image image

  1. When trying to edit a VM with OS Windows 11 or 2022 and cluster level >= 4.6 by changing the OS to non supported TPM one, e.g. Linux, the following error appears:

    image

Details:

On webadmin the TPM is enabled/disabled on frontend based on OS and cluster level: https://github.com/oVirt/ovirt-engine/blob/91bc0b8f9e4d0cbfa8799f860a008948f3e6ed0a/frontend/webadmin/modules/uicommonweb/src/main/java/org/ovirt/engine/ui/uicommonweb/models/vms/UnitVmModel.java#L4047

So for fixing we should either enable/disbale the TPM on backend based on OS or the same logic as on webadmin frontend should be used by web-ui as well.

sgratch avatar May 30 '22 16:05 sgratch

Once again we see a case in which we need to compensate the lack of functionality (in this case of defining TPM) in the vm portal but this time I don't think the right place is the backend side because, while it's ok to add TPM when changing the OS to one that requires TPM, it's not ok to drop the TPM data without some kind of warning when changing the OS to one that doesn't support TPM - so for "ordinary" clients I think we should still reject such input and require them to explicitly specify whether TPM should be enabled or disabled. When it comes to the vm portal, I wouldn't necessarily do it exactly as the webadmin does, but set TPM=disabled when the OS doesn't support TPM and TPM=enabled when it requires it (and unlike the webadmin, always set TPM=disabled when the OS supports TPM but doesn't require it) - if you want to handle both cases above (in my opinion the second flow is more theoretical than a practical one but I don't want to debate over that again :) )

@mz-pdm what do you think?

ahadas avatar May 31 '22 06:05 ahadas

It sounds reasonable to enable TPM here only when required by the guest OS, since TPM presence may imply further limitations (e.g. no snapshots with memory, if applicable here) or expectations (e.g. security). But what to do about TPM data deletion in the more theoretical scenario of the guest OS change that disables TPM?

mz-pdm avatar May 31 '22 10:05 mz-pdm

@ahadas @mz-pdm

Sounds ok to silently enable TPM only when required by the guest OS, i.,e. currently for Win 2022 & Win 11. Since this info is not reachable via rest and since web-ui is not reading osinfo-defaults.properties then it should be hardcoded. Ugly but possible.

Our basic guidelines are 1) if a VM created via webadmin with defaults settings is identical to the one created via the VM portal for TPM - and the answer is yes. 2) do we want the TPM settings to be exposable for non admin users? None of the resource allocation settings are exposable via the vm portal since it's more an admin decision, so the same goes here.

But what to do about TPM data deletion in the more theoretical scenario of the guest OS change that disables TPM?

We can warn the user and block it in that case but I prefer allowing it by silently disabling TPM for non required OS. We are already doing that for other use cases as well, i.e. changing cluster for an existing VM might reset assigned hosts and disable cpu pass-through without warning the user. If a user changes his VM's OS or cluster, he should be aware that it might have consequences... we can add a tololtip with a general warning.

sgratch avatar May 31 '22 14:05 sgratch

If a user changes his VM's OS or cluster, he should be aware that it might have consequences...

It's important to understand that if the user changes the TPM-enabled OS by mistake to another OS and then returns it back then the consequence may be that the guest OS won't boot anymore anywhere and must be reinstalled or some recovery procedure applied. Which is probably more serious than the other use cases with changed configuration. I'm not sure whether Windows stores any boot-critical data into TPM by default -- there is some info at https://docs.microsoft.com/en-us/windows/security/information-protection/tpm/how-windows-uses-the-tpm and around; probably not but users may use any of the functionalities mentioned there, e.g. BitLocker, and then it would be a real problem.

I'd say that displaying a confirmation dialog with a warning before committing such a guest OS change, similarly to what we do in webadmin when TPM is being disabled, would be a good and good enough protection.

mz-pdm avatar May 31 '22 15:05 mz-pdm

@mz-pdm

I'd say that displaying a confirmation dialog with a warning before committing such a guest OS change, similarly to what we do in webadmin when TPM is being disabled, would be a good and good enough protection.

which warning do you refer to on webadmin? If I edit an existing VM with TPM-required=true OS (e.g. Windows 11) and I change the OS to a TPM-required=false (e.g. Linux) then no warning appears on webadmin while the TPM is disabled.

So the logic on VM portal should include the following: For new vms: Silently enable TPM for TPM-required OS only.

For editing of existing vms:

  1. if the vm with TPM-enabled/required OS is changed to a TPM-enabled=false OS then a confirmation warning should appear and the TPM should be disabled
  2. if the vm with TPM-required=false OS is changed to TPM-required=true OS then silently enable the TPM
  3. leave TMP settings as is for all other cases

Did I miss a use case?

sgratch avatar Jun 14 '22 12:06 sgratch

which warning do you refer to on webadmin? If I edit an existing VM with TPM-required=true OS (e.g. Windows 11) and I change the OS to a TPM-required=false (e.g. Linux) then no warning appears on webadmin while the TPM is disabled.

The warning is not displayed if there is no stored TPM data yet. After enabling TPM, run the VM for a couple of minutes (no guest OS needed) then stop it and try to change the OS. You should get a dialog saying "TPM was disabled and the current TPM data will be irrecoverably deleted. If you want to keep the data, cancel this dialog and enable TPM again before confirming your changes.".

So the logic on VM portal should include the following: For new vms: Silently enable TPM for TPM-required OS only.

Yes.

For editing of existing vms:

  1. if the vm with TPM-enabled/required OS is changed to a TPM-enabled=false OS then a confirmation warning should appear and the TPM should be disabled

Yes. Technically, it's not needed if there is no TPM data stored yet, but it's OK to always show the dialog in such a case.

  1. if the vm with TPM-required=false OS is changed to TPM-required=true OS then silently enable the TPM

Yes.

  1. leave TMP settings as is for all other cases

Yes.

Did I miss a use case?

I can't think of any.

mz-pdm avatar Jun 14 '22 13:06 mz-pdm

@ahadas @mz-pdm After the missing tmp_enabled flag is sent to the server we hit a different problem. The error message is:

[Cannot edit VM. TPM device is not supported on 'x86_64' architecture, or for the guest OS, or UEFI firmware is not enabled, or insufficient compatibility level (4.6 required by default).]"

The cause in the case above seems to be UEFI. Web Admin forces UEFI even if it doesn't match the cluster defaults or if user explicitly has picked different option (both during create and edit). Interestingly, the chipset is changed silently also if user changes OS to one that requires TPM. This seems risky to me and not something that a regular user should be able to do in VM Portal. Can we treat such advanced TPM-related errors as non-recoverable? i.e. in the UEFI case the admin would need to change the default chipset in the cluster to unblock regular user.

rszwajko avatar Jul 06 '22 14:07 rszwajko

@rszwajko 'uefi' is the default firmware in cluster level 4.7 but that default should only apply to new VMs that are created from scratch (from the blank template), it shouldn't affect VMs that are created from other templates or VMs that are being edited. can you please provide more details on the changes you did to the VM that resulted in that error?

ahadas avatar Jul 06 '22 19:07 ahadas

@ahadas The problem occurs both when trying to create a new VM and when editing and existing one. In order to re-produce the problem with VM Portal you need the patches to restapi but you can recreate it also using REST (see below). Problem solved after changing (via Web Admin) the chipset to Q35 Chipset with UEFI

New VM scenario in VM Portal:

  1. in VM Portal enter Create Virtual Machine wizard
  2. select cluster with following settings:
    1. Cluster compatibility level: 4.6
    2. Chipset type: Q35 Chipset with BIOS
  3. select template: Blank (empty template)
  4. select OS that requires TPM i.e. Windows 2022
  5. go to Summary step and create the VM - the request will fail with the response below.

REST:

curl -v \                                                              
 --request POST \                                                       
 --header 'Version: 4' \                                                
 --header 'Accept: application/json' --header 'Content-Type: application/json' \
 --user "admin@internal:password" --data '{"cluster":{"id":"eeba21cc-1178-11ec-b175-52540012eda2"},"cpu":{"topology":{"cores":1,"sockets":1,"threads":1}},"memory_policy":{"max":4294967296,"guaranteed":1073741824},"memory":1073741824,"name":"tpm","quota":{},"os":{"type":"windows_2022"},"type":"server","time_zone":{"name":"GMT Standard Time"},"tpm_enabled":true,"initialization":{},"template":{"id":"00000000-0000-0000-0000-000000000000"}}' \
  http://server/ovirt-engine/api/vms?clone=true&clone_permissions=true

Response (400 - Bad Request):

{
  "detail" : "[Cannot add VM. TPM device is not supported on 'x86_64' architecture, or for the guest OS, or UEFI firmware is not enabled, or insufficient compatibility level (4.6 required by default).]",
  "reason" : "Operation Failed"
}

rszwajko avatar Jul 07 '22 09:07 rszwajko

@ahadas The problem occurs both when trying to create a new VM and when editing and existing one. In order to re-produce the problem with VM Portal you need the patches to restapi but you can recreate it also using REST (see below). Problem solved after changing (via Web Admin) the chipset to Q35 Chipset with UEFI

New VM scenario in VM Portal:

  1. in VM Portal enter Create Virtual Machine wizard

  2. select cluster with following settings:

    1. Cluster compatibility level: 4.6
    2. Chipset type: Q35 Chipset with BIOS

Ah ok, yeah, the default changed to UEFI in cluster level 4.7 and you do it with 4.6 cluster level In that case, or when the cluster is changed to BIOS, the caller needs to override the firmware when attempting to provision the VM with win 11/2022 to prevent that error

ahadas avatar Jul 07 '22 10:07 ahadas

@ahadas

In that case, or when the cluster is changed to BIOS, the caller needs to override the firmware when attempting to provision the VM with win 11/2022 to prevent that error

That's fine for admin user. Can we treat such advanced TPM-related errors as non-recoverable for a regular user?

rszwajko avatar Jul 07 '22 11:07 rszwajko

I wouldn't limit it only to admin users - I think the VM portal should specify UEFI for VMs that are set with Win 11/2022 also when they are created by non-admin users

ahadas avatar Jul 07 '22 11:07 ahadas

@ahadas PR https://github.com/oVirt/ovirt-web-ui/pull/1610 takes care of happy path - if cluster is configured with UEFI as the default.

The corner cases (i.e. discussed above) are not easy to handle automatically. UI would need to follow backend logic defined here. In order to repeat the same tests more information need to be exposed via REST i.e. isOvmf flag.
Alternatively we could expose the required properties: tmp_enabled and bios_type as advanced parameters. If the user will encounter problems then error message and documentation should be enough to suggest the right path. Both cases will require another PR.

rszwajko avatar Jul 07 '22 15:07 rszwajko

@rszwajko you're right but we don't really need the solution to be that generic. The VM portal can always set UEFI for Windows 11/2022 - otherwise it wouldn't work. That logic on the backend is there to covers all flows, like adding a RHEL VM with TPM on different cluster levels (which is not possible using the VM portal). I'd rather check with OsRepository if the OS requires TPM and ask for UEFI in that case, regardless of the cluster level and other considerations

ahadas avatar Jul 07 '22 15:07 ahadas

I'd rather check with OsRepository if the OS requires TPM and ask for TPM in that case, regardless of the cluster level and other considerations

yep. This is PR #1610 :)

rszwajko avatar Jul 07 '22 15:07 rszwajko

sorry, not TPM.. let me rephrase the last part: I'd rather check with OsRepository if the OS requires TPM and ask for UEFI in that case, regardless of the cluster level and other considerations

ahadas avatar Jul 07 '22 15:07 ahadas

@rszwajko @ahadas @mz-pdm

The problem occurs both when trying to create a new VM and when editing and existing one. In order to re-produce the problem with VM Portal you need the patches to restapi but you can recreate it also using REST (see below). Problem solved after changing (via Web Admin) the chipset to Q35 Chipset with UEFI

New VM scenario in VM Portal:

  1. in VM Portal enter Create Virtual Machine wizard

  2. select cluster with following settings:

    1. Cluster compatibility level: 4.6
    2. Chipset type: Q35 Chipset with BIOS

Ah ok, yeah, the default changed to UEFI in cluster level 4.7 and you do it with 4.6 cluster level

For new VMs and for edit VMs while vm.tpm_enabled = false is changed to TPM-required=true OS: AFAIK, this mentioned behavior above is reproduced with cluster level 4.6 as well. So current behavior on webadmin is that when vm.cluster level >= 4.6 and vm.TPM-required OS (win 11/2022) and vm.Chipset type <> UEFI, the Chipset type is changed automatically without notifying the user to UEFI, regardless to template/cluster/user settings.

So I agree with Arik that this should be the same behavior on VM portal.

@ahadas @mz-pdm But instead of setting it via rest/vm portal as well, why can't it be the default behavior on backend? The user can't anyway change it or override it successfully so what's the point of setting it via rest? I.e. why not moving this logic https://github.com/oVirt/ovirt-engine/blob/af4ac851f145175eacb8b6d12fe9231381bcb810/frontend/webadmin/modules/uicommonweb/src/main/java/org/ovirt/engine/ui/uicommonweb/models/vms/UnitVmModel.java#L4039-L4042 to backend?

For edit VMs:

  1. If the vm with tpm_enabled = true is changed to a TPM-unsupported-OS=true then a confirmation warning (something like: "The TPM device is not supported by the chosen operating system. Disabling the device will erase all stored data. If you want to keep the TPM data, don't change the OS before confirming your changes") should appear and the TPM should be disabled.
  2. leave TMP settings as is for all other cases

Supporting Chipset/Firmware Type as part of the VM portal This can be considered as a new feature for the VM portal- new/edit VMs, since I agree that it might be exposed to non admins as well, but regardless to this TPM device issue.

sgratch avatar Jul 10 '22 18:07 sgratch

@ahadas @mz-pdm But instead of setting it via rest/vm portal as well, why can't it be the default behavior on backend? The user can't anyway change it or override it successfully so what's the point of setting it via rest? I.e. why not moving this logic https://github.com/oVirt/ovirt-engine/blob/af4ac851f145175eacb8b6d12fe9231381bcb810/frontend/webadmin/modules/uicommonweb/src/main/java/org/ovirt/engine/ui/uicommonweb/models/vms/UnitVmModel.java#L4039-L4042 to backend?

in general when creating a vm using rest-api and specifying some settings, the current expectation is that the vm will inherit the other settings from the template (and in the past from instance types). it was easier to cope with that standard and instead of changing the chipset and firmware according to the specified operating system, do the same and validate that we end up with a valid combination

specifically in this case, it also didn't make sense to change the firmware to UEFI "automatically" behind the scenes because UEFI was tech preview before 4.7 and in 4.7 it is the default firmware

For edit VMs:

  1. If the vm with tpm_enabled = true is changed to a TPM-unsupported-OS=true then a confirmation warning (something like: "The TPM device is not supported by the chosen operating system. Disabling the device will erase all stored data. If you want to keep the TPM data, don't change the OS before confirming your changes") should appear and the TPM should be disabled.
  2. leave TMP settings as is for all other cases

Supporting Chipset/Firmware Type as part of the VM portal This can be considered as a new feature for the VM portal- new/edit VMs, since I agree that it might be exposed to non admins as well, but regardless to this TPM device issue.

I wouldn't do that at this point - the VM portal lacks so many features for the sake of simplicity (like high availability) that it wouldn't make much sense to add chipset and firmware in my opinion. users should really use the default one (from the template or from the cluster when creating vms from scrach) unless they must switch, and the only case would be those new windows versions

ahadas avatar Jul 13 '22 15:07 ahadas

in general when creating a vm using rest-api and specifying some settings, the current expectation is that the vm will inherit the other settings from the template (and in the past from instance types). it was easier to cope with that standard and instead of changing the chipset and firmware according to the specified operating system, do the same and validate that we end up with a valid combination

As long as the user can create a VM based on a Blank template or based on a template while overriding/updating few settings like OS, then we need to cope with those use cases on VM portal as well.

On webadmin/frontend you do enforce setting firmware to UEFI for cluster level > 4.5 here https://github.com/oVirt/ovirt-engine/blob/af4ac851f145175eacb8b6d12fe9231381bcb810/frontend/webadmin/modules/uicommonweb/src/main/java/org/ovirt/engine/ui/uicommonweb/models/vms/UnitVmModel.java#L4039-L4042 for required OS, so we'll need to do the same on VM portal for those specific cases when UEFI should be set and it's not (CL 4.6 or editing an existing VM without UEFI set). This can be done on a follow up PR.

I wouldn't do that at this point - the VM portal lacks so many features for the sake of simplicity (like high availability) that it wouldn't make much sense to add chipset and firmware in my opinion. users should really use the default one (from the template or from the cluster when creating vms from scrach) unless they must switch, and the only case would be those new windows versions

+1, agree

sgratch avatar Jul 22 '22 17:07 sgratch

in general when creating a vm using rest-api and specifying some settings, the current expectation is that the vm will inherit the other settings from the template (and in the past from instance types). it was easier to cope with that standard and instead of changing the chipset and firmware according to the specified operating system, do the same and validate that we end up with a valid combination

As long as the user can create a VM based on a Blank template or based on a template while overriding/updating few settings like OS, then we need to cope with those use cases on VM portal as well.

right, the text above was in response to But instead of setting it via rest/vm portal as well, why can't it be the default behavior on backend?. the vm portal definitely needs to have a similar logic to that in the webadmin when it comes to those windows guests

ahadas avatar Jul 22 '22 19:07 ahadas

The fix for basic flow (CL is 4.7 and firmware is set to UEFI) is merged: https://github.com/oVirt/ovirt-web-ui/pull/1610

Now need to handle left cases when vm.cluster level >= 4.6 and vm.TPM-required OS (win 11/2022) and vm.Chipset type <> UEFI. In those cases the Chipset type should be changed automatically without notifying the user to UEFI. same as supported on webadmin.

sgratch avatar Sep 30 '22 11:09 sgratch