
[Placeholder] Intel TDX enablement

Open LeiZhou-97 opened this issue 2 years ago • 46 comments

This is a placeholder issue for tracking the development progress of Intel TDX enabling in KubeVirt.

Because the Intel TDX kernel, QEMU, and libvirt upstream tasks are still work in progress, we have created the intel/kubevirt-tdx repo for development, which gives Intel TDX target customers a way to build working Intel TDX KubeVirt images. Building these images depends on the Linux Software Stack for Intel TDX, which you can get from tdx-tools.

Our intel/kubevirt-tdx is based on the CentOS Stream 8 code base. The Linux Software Stack for Intel TDX does not support CentOS Stream 9 (KubeVirt's base image) yet; we will provide support later.

Implementation

  • [x] Basic Intel TDX functional support
    • needs a rebase onto the main branch once CentOS Stream 9 is supported
  • [x] Support Intel TDX attestation
  • [ ] Support live migration
  • [x] Expose the Intel TDX key number for TD guest management
  • [ ] Support the E2E attestation use case: FDE (Full Disk Encryption)

Testing

  • [ ] Add upstream CI for running functional tests with attestation
  • [ ] Create the test image

Documentation

  • [ ] Initial design proposal
  • [ ] Initial usage guide

LeiZhou-97 avatar Apr 28 '23 03:04 LeiZhou-97

/cc

alicefr avatar Apr 28 '23 07:04 alicefr

@LeiZhou-97 do you plan, in the long term, to integrate the changes into the KubeVirt main repo? If so, please include the upstream CI in your plan. We added an AMD SEV machine, and there are still PRs ongoing for running functional tests with attestation. A similar approach would be possible for TDX.
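For context, KubeVirt already exposes AMD SEV through the launchSecurity API, and a TDX knob could plausibly take the same shape. In this sketch the sev field is the real, existing API; the commented-out tdx stanza is purely an assumption for illustration, not an existing field:

```yaml
# KubeVirt's existing launchSecurity API for AMD SEV.
# The commented-out "tdx" stanza is hypothetical -- no such field exists
# upstream yet; it only illustrates how an analogous TDX knob could look.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: sev-vmi
spec:
  domain:
    firmware:
      bootloader:
        efi:
          secureBoot: false   # SEV guests boot via OVMF/EFI
    launchSecurity:
      sev: {}                 # existing API: enables SEV for this guest
      # tdx: {}               # hypothetical TDX equivalent
    devices: {}
    resources:
      requests:
        memory: 1Gi
```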

alicefr avatar Apr 28 '23 08:04 alicefr

@alicefr Thanks for your suggestion. I'll add it to my task list.

LeiZhou-97 avatar Apr 28 '23 08:04 LeiZhou-97

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot avatar Sep 04 '23 07:09 kubevirt-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubevirt-bot avatar Oct 04 '23 07:10 kubevirt-bot

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kubevirt-bot avatar Nov 03 '23 08:11 kubevirt-bot

@kubevirt-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kubevirt-bot avatar Nov 03 '23 08:11 kubevirt-bot

/reopen

LeiZhou-97 avatar Dec 04 '23 07:12 LeiZhou-97

@LeiZhou-97: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kubevirt-bot avatar Dec 04 '23 07:12 kubevirt-bot

Hi @alicefr, sorry it has taken so long, but I finally have some new updates about Intel TDX!

Since Intel TDX is a very complex technology, the upstream enabling is still WIP, but in order to make TDX available to end users earlier, there is already a TDX 1.0 repo in the CentOS Stream repositories: https://linux-mirrors.fnal.gov/linux/centos-stream/SIGs/9-stream/virt/x86_64/tdx-devel/

So I would like to ask: can the TDX VM enabling code be upstreamed based on these mid-stream TDX packages? This may require users to build the image themselves, or we could provide pre-built TDX-enabled images with each release directly (because the TDX stack does not use the standard qemu/libvirt). Or could I keep a TDX branch in the kubevirt repo for end users to use?

Now, we’ve finished the basic TDX enabling based on the v1.0.0 release in the intel org repo. https://github.com/intel/kubevirt-tdx/tree/tdx-1.5 https://github.com/intel/kubevirt-tdx/wiki

I'd appreciate your comments!

LeiZhou-97 avatar Dec 04 '23 07:12 LeiZhou-97


@LeiZhou-97 please send an email to the kubevirt mailing list; this is something that needs to be discussed and agreed upon by the KubeVirt community and approvers.

However, I find it problematic to maintain two libvirt and QEMU versions: we cannot test whether something breaks on TDX if the images aren't the same as upstream. From my side, getting the changes upstreamed in libvirt and QEMU is a must before starting the work in KubeVirt.

alicefr avatar Dec 04 '23 12:12 alicefr

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kubevirt-bot avatar Jan 03 '24 12:01 kubevirt-bot

@kubevirt-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kubevirt-bot avatar Jan 03 '24 12:01 kubevirt-bot

/reopen /cc @fidencio /cc @mythi

victortoso avatar Oct 02 '24 10:10 victortoso

@victortoso: Reopened this issue.

In response to this:

/reopen /cc @fidencio /cc @mythi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubevirt-bot avatar Oct 02 '24 10:10 kubevirt-bot

/cc

iholder101 avatar Oct 02 '24 10:10 iholder101

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

kubevirt-bot avatar Nov 01 '24 11:11 kubevirt-bot

@kubevirt-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kubevirt-bot avatar Nov 01 '24 11:11 kubevirt-bot

/reopen

vladikr avatar Apr 14 '25 17:04 vladikr

@MatiasVara would you like to be assigned to this? @alicefr fyi.

vladikr avatar Apr 14 '25 17:04 vladikr

@MatiasVara would you like to be assigned to this? @alicefr fyi.

yes, you can.

MatiasVara avatar Apr 15 '25 07:04 MatiasVara

@MatiasVara would you like to be assigned to this? @alicefr fyi.

yes, you can.

@MatiasVara are you planning to work on a PR for this?

mythi avatar Apr 21 '25 10:04 mythi

@MatiasVara would you like to be assigned to this? @alicefr fyi.

yes, you can.

@MatiasVara are you planning to work on a PR for this?

Hello, yes, I am. I am currently working on a PoC but I do not have it working yet.

MatiasVara avatar Apr 21 '25 17:04 MatiasVara

@MatiasVara great! I can help with any TDX specific questions you may have. Also feel free to tag me in your PR.

mythi avatar Apr 22 '25 10:04 mythi

@MatiasVara great! I can help with any TDX specific questions you may have. Also feel free to tag me in your PR.

Cool! I am currently struggling with attestation. I set up virt-launcher with the right QEMU and included the QGS service, and I set up a device plugin to expose /dev/sgx_*. However, I am getting the following issue:

[QCNL] Encountered CURL error: (77) Problem with the SSL CA cert (path? access rights?) 

[QPL] Failed to get quote config. Error code is 0xb003

[get_platform_quote_cert_data ../td_ql_logic.cpp:302] Error returned from the p_sgx_get_quote_config API. 0xe019
tee_att_init_quote return 0x11001
tee_att_get_quote_size return 0x1100f
resp_size is 0
About to shutdown and close socket
erased a connection, now [0]

I think something else is missing in the virt-launcher pod. That made me think that QGS should run in a different container and only share the Unix socket with virt-launcher, although I do not know how to do that yet.

MatiasVara avatar Apr 22 '25 10:04 MatiasVara

That made me think that QGS should run in a different container and only share the Unix socket with virt-launcher, although I do not know how to do that yet.

Do you have QGS running in a container? You will also need to configure it to point to a proper PCCS service in your cluster/infra.

mythi avatar Apr 22 '25 10:04 mythi

That made me think that QGS should run in a different container and only share the Unix socket with virt-launcher, although I do not know how to do that yet.

Do you have QGS running in a container? You will also need to configure it to point to a proper PCCS service in your cluster/infra.

I think it is failing before contacting any PCCS service. I think some credential is missing, since virt-launcher is very restricted and is not meant to run a binary like QGS there.

MatiasVara avatar Apr 23 '25 11:04 MatiasVara

I think some credential is missing, since virt-launcher is very restricted and is not meant to run a binary like QGS there.

Running QGS in a dedicated DaemonSet, with the UDS (port=0 config/param) made available via a hostPath mount, should be enough, and it does not need any extra privileges (other than getting the /dev/sgx_* resources from the device plugin). You might also have to toggle "use_secure_cert" to false in your sgx_default_qcnl.conf.
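A minimal sketch of such a DaemonSet, assuming a placeholder image name and socket directory (the image, path, and SGX resource name are illustrative; adapt them to your cluster):

```yaml
# Sketch of a per-node QGS DaemonSet: QGS listens on a Unix domain
# socket (port=0 => UDS mode) exposed through a hostPath volume so that
# virt-launcher pods on the same node can reach it. Image name and
# socket path below are placeholders, not the real artifact names.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: qgs
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: qgs
  template:
    metadata:
      labels:
        app: qgs
    spec:
      containers:
      - name: qgs
        image: example.com/tdx-qgs:latest   # placeholder image
        args: ["-p", "0"]                    # port=0 -> Unix socket mode
        resources:
          limits:
            sgx.intel.com/provision: 1       # from the SGX device plugin
        volumeMounts:
        - name: qgs-socket
          mountPath: /var/run/tdx-qgs        # placeholder socket directory
      volumes:
      - name: qgs-socket
        hostPath:
          path: /var/run/tdx-qgs
          type: DirectoryOrCreate
```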

mythi avatar Apr 23 '25 12:04 mythi

@mythi do you know why the certificate is needed? Do you suggest disabling it for development now, and setting it up properly later on?

victortoso avatar Apr 23 '25 12:04 victortoso

@mythi do you know why the certificate is needed? Do you suggest disabling it for development now, and setting it up properly later on?

It's like any web service with properly signed TLS certificates. QGS talks to the PCCS (collateral caching service) over REST to get the quote signing collateral, but the PCCS is typically an infra-specific thing that the KubeVirt implementation itself does not need to worry about (KubeVirt can assume a properly configured QGS is running on each node). For testing and development (and since you're likely running a PCCS of your own), disabling it should be fine.
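For development, the QCNL configuration on the QGS side can point at your own PCCS and skip TLS verification. A minimal sgx_default_qcnl.conf along these lines (the PCCS URL is a placeholder, and use_secure_cert=false is appropriate for testing only):

```json
{
  "pccs_url": "https://pccs.example.internal:8081/sgx/certification/v4/",
  "use_secure_cert": false
}
```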

mythi avatar Apr 23 '25 13:04 mythi