libedgetpu
M.2 TPU device violates PCI specification
Description
Customers who attempt to pass through the M.2 TPU to a virtual machine using the VMware ESXi hypervisor have found that the Apex driver fails to initialize.
# dmesg
<snip>
[ 3.780139] apex 0000:02:03.0: enabling device (0000 -> 0002)
[ 3.785860] apex 0000:02:03.0: Page table init timed out
[ 3.786103] apex 0000:02:03.0: MSI-X table init timed out
Upon initial investigation from VMware Engineering, the following was concluded:
Unfortunately, the device in question violates the PCI specification by mapping the PBA, the MSI-X vector table, and other registers into the same 4KB page (the PBA is at 0x46068, the vector table at 0x46800, but there are a number of other registers in the 0x46XXX range). The PCIe 6.0 specification, page 1020, has this to say:
<quote>
If a Base Address Register or entry in the Enhanced Allocation capability that maps address space for the MSI-X Table or
MSI-X PBA also maps other usable address space that is not associated with MSI-X structures, locations (e.g., for CSRs)
used in the other address space must not share any naturally aligned 4-KB address range with one where either MSI-X
structure resides. This allows system software where applicable to use different processor attributes for MSI-X structures
and the other address space. (Some processor architectures do not support having different processor attributes
associated with the same naturally aligned 4-KB physical address range.) The MSI-X Table and MSI-X PBA are permitted
to co-reside within a naturally aligned 4-KB address range, though they must not overlap with each other.
</quote>
So having CSR registers in the same page as the MSI-X vector table violates the spec, and under ESXi those CSR registers become unreachable (writes are ignored, reads return zeroes). Because of this, the device driver cannot initialize the device correctly.
If firmware can modify the device's behavior so that the vector table and PBA arrays do not share a 4KB page with other registers, the device will work with ESXi passthrough. Alternatively, if firmware can hide the MSI-X capability from PCI configuration space, that would fix the issue as well.
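For illustration, the overlap is easy to check arithmetically: two addresses fall in the same naturally aligned 4KB range exactly when they agree in all bits above bit 11 (i.e. `addr >> 12` is equal). A minimal sketch using the offsets quoted above (0x46068 for the PBA, 0x46800 for the vector table; the CSR offset 0x46000 is a hypothetical example of a register in the 0x46XXX range):

```python
def page_index(addr: int) -> int:
    """Index of the naturally aligned 4KB page containing addr."""
    return addr >> 12  # 4KB = 2**12 bytes

def shares_page(a: int, b: int) -> bool:
    """True if a and b fall in the same naturally aligned 4KB range."""
    return page_index(a) == page_index(b)

MSIX_PBA = 0x46068  # PBA offset reported by VMware Engineering
MSIX_VT  = 0x46800  # vector table offset reported by VMware Engineering
CSR      = 0x46000  # hypothetical CSR in the 0x46XXX range

# The vector table and PBA may legally co-reside in one 4KB page...
print(shares_page(MSIX_PBA, MSIX_VT))  # True -- permitted by the spec
# ...but other registers sharing that page is the violation:
print(shares_page(CSR, MSIX_VT))       # True -- this is the problem
```

All three offsets land in page 0x46, which is exactly why ESXi cannot map the MSI-X structures and the CSRs with different processor attributes.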
I'm not sure if this has already been reported, but if Google/Coral can either fix the behavior of the device to conform to the PCI specification or hide the MSI-X capability, then passthrough of the M.2 TPU should work correctly under ESXi, which is a popular hypervisor platform for development purposes.
Issue Type
Build/Install
Operating System
Ubuntu
Coral Device
M.2 Accelerator A+E
Other Devices
No response
Programming Language
No response
Relevant Log Output
No response
Yes, please do look into addressing this!
Very interested to have this fixed as well. Looks like Xen could have the same issue: https://xcp-ng.org/forum/topic/6304/google-coral-tpu-pcie-passthrough-woes/20
Adding another vote to fix this here!! There are a ton of threads/requests for this but they're all over.
https://github.com/google-coral/edgetpu/issues/343
https://github.com/google-coral/edgetpu/issues/729
https://github.com/blakeblackshear/frigate/issues/6331
https://github.com/blakeblackshear/frigate/issues/94
https://github.com/blakeblackshear/frigate/issues/305
+1 for a fix
+1
+1 for a fix not only m.2 but mini pcie as well
+1 fix please.
+1 for fix, commenting to follow. Note this also affects the Mini-PCIe model (as expected)
+1
Can anyone think of any other possible workarounds for this problem? It seems like ESXi could also use a quirks mode for PCIe cards that need some tweaking.
+1 for a fix please
+1 for a fix please
+1 for a fix please
+1 for the fix
+1
As a temporary fix, try disabling MSI on the bridge if possible: echo 1 > /sys/bus/pci/devices/$bridge/msi_bus. To me it also looks like there is a lot of hacky stuff in the kernel driver: https://github.com/google/gasket-driver/blob/09385d485812088e04a98a6e1227bf92663e0b59/src/gasket_interrupt.c#L245
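To apply that workaround you need the BDF of the bridge above the Apex device, which you can read off the resolved sysfs path (on a typical layout, the next-to-last path component). A small sketch; the helper name and the example path are illustrative, assuming the usual sysfs layout where `os.path.realpath("/sys/bus/pci/devices/<device>")` ends with `<bridge>/<device>`:

```python
def parent_bridge(resolved_path: str) -> str:
    """Given a resolved sysfs PCI device path, return the BDF of the
    parent bridge (the next-to-last path component)."""
    parts = resolved_path.rstrip("/").split("/")
    return parts[-2]

# Example resolved path for the Apex device (illustrative layout):
example = "/sys/devices/pci0000:00/0000:00:1c.0/0000:02:03.0"
print(parent_bridge(example))  # 0000:00:1c.0

# The temporary workaround from the comment above would then be:
#   echo 1 > /sys/bus/pci/devices/0000:00:1c.0/msi_bus
```

Note that disabling msi_bus on the bridge affects every device behind that bridge, so it is only suitable as a stopgap.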
+1 vote for fix!
+1 vote for fix!
+1 :-(
This is not likely to ever get fixed now that Broadcom has deprecated free ESXi. I'm aware this is a TPU issue, but the ESXi user base is just going to keep shrinking at this point.
@thefl0yd I do not believe this is the case. I need to deploy the M.2 in multiple enterprise VMware deployments via passthrough.
+1 For a fix