Add support in the L1 kernel to handle interrupts posted from the IOMMU
The current TDX architecture supports posted interrupts (from the host VMM and IOMMU devices) for the TDX L1 VMM (VTL2). For passthrough devices owned by the L2 guest (VTL0), however, Hyper-V does not use posted interrupts. Each hardware interrupt from a VTL0 device results in a TDEXIT to Hyper-V today. This comes at a considerable performance cost.
The VTL2 kernel dispatches interrupts directed to the VTL0 guest via the single interrupt vector of the VMBus synthetic interrupt controller. It relays the interrupt to the VTL0 guest using a bitfield in which each bit represents a VTL0 interrupt vector.
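To make the relay bitfield concrete, here is a minimal user-space sketch in C. It is not the driver code: the 256-vector size and the names `proxy_pending`, `proxy_mark_pending`, and `proxy_take_pending` are assumptions made for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_VECTORS    256  /* assumed size of the VTL0 vector space */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS      ((NR_VECTORS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical pending set: bit N set means VTL0 vector N awaits delivery. */
struct proxy_pending {
    unsigned long bits[NR_WORDS];
};

/* Record that a VTL0 vector must be relayed to the guest. */
static void proxy_mark_pending(struct proxy_pending *p, unsigned int vector)
{
    assert(vector < NR_VECTORS);
    p->bits[vector / BITS_PER_WORD] |= 1UL << (vector % BITS_PER_WORD);
}

/* Return and clear the lowest pending vector, or -1 if none is pending. */
static int proxy_take_pending(struct proxy_pending *p)
{
    for (size_t w = 0; w < NR_WORDS; w++) {
        if (!p->bits[w])
            continue;
        for (unsigned int b = 0; b < BITS_PER_WORD; b++) {
            if (p->bits[w] & (1UL << b)) {
                p->bits[w] &= ~(1UL << b);
                return (int)(w * BITS_PER_WORD + b);
            }
        }
    }
    return -1;
}
```

A real kernel implementation would use atomic bit operations, since the bitfield is shared between the interrupt path and the consumer. The sketch leaves that out.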
To recoup performance, Hyper-V at the host can configure the IOMMU to deliver VTL0 interrupts directly to the VTL2 guest using posted interrupts. In such a setup, the VTL2 kernel will still relay interrupts to the VTL0 guest, but it will use a dedicated VTL2 interrupt vector. This dedicated vector will be mapped to one interrupt vector in the VTL0 guest.
As stated in [1], the synic is not modeled in Linux as an irq_chip or irq_domain, and the demultiplexed logical interrupts are not Linux IRQs. This also implies that interrupt handlers cannot be installed using request_irq(). Instead, reserve a block of interrupt vectors in the VTL2 kernel and manage mappings using an array.
VTL2 user space is in charge of requesting VTL0 <=> VTL2 interrupt mappings. Introduce a new IOCTL to request a mapping from the VTL2 kernel. On success, the kernel will return the VTL2 vector used for the requested mapping in the payload of the IOCTL. User space will use this information to undo the mapping when needed.
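As an illustration of the user-space side, the sketch below shows one way to remember the VTL2 vector returned by the kernel so that the mapping can be undone later. The payload layout, the field names, and the `tracker_*` helpers are hypothetical and do not describe the actual IOCTL ABI. The IOCTL call itself is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical IOCTL payload: user space fills vtl0_vector and the
 * kernel fills vtl2_vector on success. */
struct proxy_irq_map {
    uint32_t vtl0_vector; /* in:  vector to raise in the VTL0 guest */
    uint32_t vtl2_vector; /* out: VTL2 vector chosen by the kernel  */
};

#define MAX_TRACKED 16u /* arbitrary limit for this sketch */

struct map_tracker {
    struct proxy_irq_map entries[MAX_TRACKED];
    unsigned int count;
};

/* Remember a mapping after the (omitted) IOCTL succeeded. */
static int tracker_record(struct map_tracker *t, struct proxy_irq_map done)
{
    assert(t);
    if (t->count == MAX_TRACKED)
        return -ENOSPC;
    t->entries[t->count++] = done;
    return 0;
}

/* Look up the VTL2 vector needed to undo a mapping and forget the entry. */
static int tracker_take(struct map_tracker *t, uint32_t vtl0_vector,
                        uint32_t *vtl2_vector)
{
    assert(t && vtl2_vector);
    for (unsigned int i = 0; i < t->count; i++) {
        if (t->entries[i].vtl0_vector == vtl0_vector) {
            *vtl2_vector = t->entries[i].vtl2_vector;
            /* Fill the hole with the last entry to keep the array dense. */
            t->entries[i] = t->entries[--t->count];
            return 0;
        }
    }
    return -ENOENT;
}
```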
The Hyper-V VTL driver owns all the data structures of proxy interrupts (i.e., VTL0 interrupts that the VTL2 kernel receives and relays to the VTL0 guest). The Hyper-V driver can be compiled as a module and needs to be hardware-agnostic. There needs to be separation between the architecture-specific handling of interrupts and the hardware-agnostic handling of proxy interrupts. Implement all the necessary changes in the Hyper-V driver and provide interfaces that architectures can implement.
@ricardon is this issue still open?
Yes @cperezvargas, we are still working on it. We currently have a proof-of-concept with changes in Hyper-V and the OHCL-Linux-Kernel. The proof-of-concept is functional and we are verifying the performance improvements. After that, we need to clean up the code to have it ready for integration.
Hi @chris-oo, balajimc55 and I were discussing what to do in case the L2 guest requests too many vectors from L1.
One option is for the L1 kernel to return an error if it runs out of vectors to remap. The error would cause the L2 guest to fail to launch, as errors from IOCTLs are fatal. This would be the easiest implementation.
A more complicated solution would be for user space to handle the error and fall back to the existing proxy interrupt mechanism. As per input from balajimc55, Hyper-V would need changes to propagate and handle the fallback to regular proxy interrupts.
How many vectors are too many, in this case? I don't quite understand why we would need host changes to fall back to the userspace path. Isn't this the path we use today for all devices (proxy interrupts)?
I'd prefer if we have the fallback path. Don't we need this in case the host doesn't support the remapping path, or is the remapping path entirely within the guest?
The experiments we conducted used 6 vectors. I was thinking of reserving 20 to err on the safe side. Also, now that I think about it, I fail to see the need to propagate errors to the host. Perhaps L1 user space can see the posted interrupt IOCTL fail and fall back to the existing proxy interrupts. balajimc55, am I missing something?
I agree that we need the fallback path in case posted interrupts are not supported at all. My question was whether there is a use case for a mixture of proxy interrupts and posted interrupts. Or should we only use posted interrupts if supported?
In my latest code, I reserve 32 CPU interrupt vectors. All CPUs have the same interrupt descriptor table. User space uses an IOCTL to request an interrupt to be remapped. The kernel will remember the mapping and raise the corresponding proxy interrupt when a CPU gets an interrupt on any of the reserved vectors. It will issue a warning if a CPU interrupt that has not been mapped is raised.
If the kernel runs out of interrupts to map, it will return an -ENODEV error. At that point, user space needs to decide how to proceed.
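For readers following along, the behavior described above can be modeled in plain C roughly as follows. This is a sketch under assumptions, not the code in the pull request: the base vector `0x60`, the use of 0 as the "unmapped" marker, and the function names are made up for illustration, and locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define PROXY_VECTOR_BASE  0x60u /* assumed first reserved VTL2 vector */
#define PROXY_VECTOR_COUNT 32u   /* reserved vectors, as described above */
#define VTL0_VECTOR_NONE   0u    /* sketch convention: 0 marks a free slot */

/* Slot i holds the VTL0 vector relayed by VTL2 vector BASE + i. */
static unsigned int proxy_map[PROXY_VECTOR_COUNT];

/* Map a VTL0 vector; return the VTL2 vector used, or -ENODEV if none is free. */
static int proxy_map_vector(unsigned int vtl0_vector)
{
    assert(vtl0_vector != VTL0_VECTOR_NONE);
    for (unsigned int i = 0; i < PROXY_VECTOR_COUNT; i++) {
        if (proxy_map[i] == VTL0_VECTOR_NONE) {
            proxy_map[i] = vtl0_vector;
            return (int)(PROXY_VECTOR_BASE + i);
        }
    }
    return -ENODEV;
}

/* Undo a mapping given the VTL2 vector returned by proxy_map_vector(). */
static int proxy_unmap_vector(unsigned int vtl2_vector)
{
    unsigned int i = vtl2_vector - PROXY_VECTOR_BASE;

    if (vtl2_vector < PROXY_VECTOR_BASE || i >= PROXY_VECTOR_COUNT ||
        proxy_map[i] == VTL0_VECTOR_NONE)
        return -EINVAL;
    proxy_map[i] = VTL0_VECTOR_NONE;
    return 0;
}

/* Model of the interrupt path: return the VTL0 vector to raise as a proxy
 * interrupt, or warn and return -1 if the reserved vector is not mapped. */
static int proxy_dispatch(unsigned int vtl2_vector)
{
    unsigned int i = vtl2_vector - PROXY_VECTOR_BASE;

    if (vtl2_vector < PROXY_VECTOR_BASE || i >= PROXY_VECTOR_COUNT ||
        proxy_map[i] == VTL0_VECTOR_NONE) {
        fprintf(stderr, "warning: unmapped proxy vector %u\n", vtl2_vector);
        return -1;
    }
    return (int)proxy_map[i];
}
```

In this model, the 33rd call to `proxy_map_vector()` without an intervening unmap returns `-ENODEV`, which is the point where user space would have to choose between failing and using regular proxy interrupts.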
Working on getting the code ready to submit a pull request. I plan to do it by May 23rd.
Submitted a pull request for this feature: https://github.com/microsoft/OHCL-Linux-Kernel/pull/80
Ricardo received code review comments from MSFT. Plan to send an updated PR today.
Indeed, @romank-msft provided comments, which I am addressing. I am also improving locking, since the proxy interrupt data structures are mostly read. Once complete, I will send an updated pull request.
I updated my pull request to incorporate feedback from @romank-msft and to improve locking. Details are in the same pull request as before (https://github.com/microsoft/OHCL-Linux-Kernel/pull/80).
I have updated my pull request for this feature. It now does not touch the arch/x86 directory and instead relies on the existing Linux IRQ infrastructure.
The pull request to support this feature has been merged in the OHCL kernel repository (https://github.com/microsoft/OHCL-Linux-Kernel/pull/80). This issue can be closed.