libedgetpu

M.2 TPU device violates PCI specification

Open lamw opened this issue 1 year ago • 23 comments

Description

Customers who attempt to pass the M.2 TPU through to a virtual machine on the VMware ESXi hypervisor have found that the Apex driver fails to initialize.

# dmesg
<snip>
[    3.780139] apex 0000:02:03.0: enabling device (0000 -> 0002)
[    3.785860] apex 0000:02:03.0: Page table init timed out
[    3.786103] apex 0000:02:03.0: MSI-X table init timed out

After an initial investigation, VMware Engineering concluded the following:

Unfortunately, the device in question violates the PCI specification by mapping the PBA, the MSI-X vector table (VT), and other registers into the same 4 KB page (the PBA is at 0x46068 and the VT at 0x46800, but there are a number of other registers in the 0x46XXX range). PCIe spec 6.0, page 1020, has this to say:

<quote>
If a Base Address Register or entry in the Enhanced Allocation capability that maps address space for the MSI-X Table or
MSI-X PBA also maps other usable address space that is not associated with MSI-X structures, locations (e.g., for CSRs)
used in the other address space must not share any naturally aligned 4-KB address range with one where either MSI-X
structure resides. This allows system software where applicable to use different processor attributes for MSI-X structures
and the other address space. (Some processor architectures do not support having different processor attributes
associated with the same naturally aligned 4-KB physical address range.) The MSI-X Table and MSI-X PBA are permitted
to co-reside within a naturally aligned 4-KB address range, though they must not overlap with each other.
</quote>

So having CSRs in the same page as the MSI-X VT violates the spec, and under ESXi those CSRs become unreachable (writes are ignored, reads return zeroes). Because of this, the device driver cannot correctly initialize the device.

If firmware can modify the device's behavior so that the VT/PBA arrays do not share a 4 KB page with other registers, the device will work with ESXi's passthrough. Alternatively, if firmware can hide the MSI-X capability from PCI configuration space, that would fix the issue as well.
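The page-sharing condition described above can be checked with a little arithmetic. The following is a minimal sketch, not taken from any driver: it uses only the two offsets quoted by VMware Engineering (PBA at 0x46068, VT at 0x46800), compares naturally aligned 4 KB page bases, and ignores the sizes of the MSI-X structures, which the report does not state. The CSR offset `EXAMPLE_CSR_OFFSET` is a hypothetical value chosen for illustration from the "0x46XXX range" mentioned in the report.

```python
PAGE_SIZE = 0x1000  # naturally aligned 4 KB range, per the quoted spec text

# Offsets quoted in the report above.
MSIX_PBA_OFFSET = 0x46068
MSIX_TABLE_OFFSET = 0x46800

# Hypothetical CSR offset, for illustration only; the report says only
# that other registers live somewhere in the 0x46XXX range.
EXAMPLE_CSR_OFFSET = 0x46010


def page_base(offset: int) -> int:
    """Return the base of the naturally aligned 4 KB page containing offset."""
    return offset & ~(PAGE_SIZE - 1)


def shares_page_with_msix(csr_offset: int) -> bool:
    """True if csr_offset lies in the same 4 KB page as the start of the
    MSI-X table or PBA. Structure sizes are ignored in this sketch."""
    msix_pages = {page_base(MSIX_PBA_OFFSET), page_base(MSIX_TABLE_OFFSET)}
    return page_base(csr_offset) in msix_pages


print(hex(page_base(MSIX_PBA_OFFSET)))            # 0x46000
print(hex(page_base(MSIX_TABLE_OFFSET)))          # 0x46000
print(shares_page_with_msix(EXAMPLE_CSR_OFFSET))  # True
```

Both MSI-X structures start in the page at 0x46000, which the spec permits on its own; the problem arises only once an unrelated register falls in that same page.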

I'm not sure whether this has already been reported, but if Google/Coral can either fix the device's behavior to conform to the PCI specification or hide the MSI-X capability, passthrough of the M.2 TPU should work correctly with ESXi, which is a popular hypervisor platform for development purposes.
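For anyone who wants to confirm which MSI-X layout a device advertises, the capability list in PCI configuration space can be walked by hand. The sketch below assumes the standard layout (capability pointer at 0x34, MSI-X capability ID 0x11, Table Offset/BIR at capability offset +4, PBA Offset/BIR at +8) and operates on a raw `bytes` dump of configuration space. On Linux such a dump is typically available from `/sys/bus/pci/devices/<BDF>/config`, and reading past the first 64 bytes usually requires root. The function name `find_msix` is made up for this example.

```python
import struct

PCI_STATUS = 0x06            # status register offset
PCI_STATUS_CAP_LIST = 0x10   # "capabilities list present" bit
PCI_CAPABILITY_LIST = 0x34   # offset of the first capability pointer
PCI_CAP_ID_MSIX = 0x11       # MSI-X capability ID


def find_msix(config: bytes):
    """Walk the capability list in a config-space dump and return the
    MSI-X table/PBA location, or None if no MSI-X capability is found."""
    if len(config) < 0x40:
        return None
    (status,) = struct.unpack_from("<H", config, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = config[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guard against malformed, looping lists
    while ptr and ptr not in seen and ptr + 12 <= len(config):
        seen.add(ptr)
        if config[ptr] == PCI_CAP_ID_MSIX:
            # Message Control (16 bit), Table Offset/BIR, PBA Offset/BIR
            ctrl, table, pba = struct.unpack_from("<HII", config, ptr + 2)
            return {
                "table_size": (ctrl & 0x7FF) + 1,  # field is N-1 encoded
                "table_bir": table & 0x7,
                "table_offset": table & ~0x7,
                "pba_bir": pba & 0x7,
                "pba_offset": pba & ~0x7,
            }
        ptr = config[ptr + 1] & 0xFC  # next-capability pointer
    return None


# Usage (requires root to read the full config space; <BDF> is a placeholder):
# with open("/sys/bus/pci/devices/<BDF>/config", "rb") as f:
#     print(find_msix(f.read()))
```

If the table and PBA offsets reported this way fall in the same 4 KB page as other registers in the same BAR, the device has the layout problem described in this issue.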

Issue Type

Build/Install

Operating System

Ubuntu

Coral Device

M.2 Accelerator A+E

Other Devices

No response

Programming Language

No response

Relevant Log Output

No response

lamw avatar Aug 18 '23 22:08 lamw

Yes, please do look into addressing this!

goldserve avatar Oct 20 '23 15:10 goldserve

Very interested to have this fixed as well. Looks like Xen could have the same issue: https://xcp-ng.org/forum/topic/6304/google-coral-tpu-pcie-passthrough-woes/20

ManuelPerrot avatar Oct 20 '23 15:10 ManuelPerrot

Adding another vote to fix this here!! There are a ton of threads/requests for this but they're all over.

https://github.com/google-coral/edgetpu/issues/343

https://github.com/google-coral/edgetpu/issues/729

https://github.com/blakeblackshear/frigate/issues/6331

https://github.com/blakeblackshear/frigate/issues/94

https://github.com/blakeblackshear/frigate/issues/305

k1n6b0b avatar Oct 20 '23 16:10 k1n6b0b

+1 for a fix

grembling22 avatar Oct 31 '23 21:10 grembling22

+1

c-po avatar Nov 05 '23 20:11 c-po

+1 for a fix not only m.2 but mini pcie as well

tbozik avatar Nov 06 '23 20:11 tbozik

+1 fix please.

kentkravitz avatar Nov 11 '23 00:11 kentkravitz

+1 for fix, commenting to follow. Note this also affects the Mini-PCIe model (as expected)

TokugawaHeavyIndustries avatar Nov 13 '23 18:11 TokugawaHeavyIndustries

+1

syncnj avatar Nov 13 '23 21:11 syncnj

Can anyone think of any other possible workarounds for this problem? Seems like ESXi could also use a quirks mode for pci-e cards that need some tweaking.

kentkravitz avatar Nov 15 '23 14:11 kentkravitz

+1 for a fix please

kuantek avatar Nov 19 '23 01:11 kuantek

+1 for a fix please

Brandon314 avatar Nov 24 '23 02:11 Brandon314

+1 for a fix please

gknepper avatar Nov 25 '23 06:11 gknepper

+1 for the fix

vobelic avatar Jan 06 '24 21:01 vobelic

+1

fama-lama avatar Jan 11 '24 08:01 fama-lama

As a temporary fix, try disabling MSI on the bridge if possible: echo 0 > /sys/bus/pci/devices/$bridge/msi_bus (writing 0 disallows MSI/MSI-X for devices behind the bridge). To me it looks like there is a lot of hacky stuff in the kernel driver: https://github.com/google/gasket-driver/blob/09385d485812088e04a98a6e1227bf92663e0b59/src/gasket_interrupt.c#L245
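The `msi_bus` workaround can also be scripted. This is a minimal sketch under a few assumptions: it relies on the `msi_bus` sysfs attribute as described in the kernel's sysfs-bus-pci ABI documentation, where a zero value disallows MSI and MSI-X for future drivers of the device (and, for a bridge, of its child devices) and a non-zero value allows them. The helper names and the bridge address in the usage comment are hypothetical. The `sysfs_root` parameter exists only so the helpers can be exercised against a scratch directory.

```python
from pathlib import Path

SYSFS_PCI_DEVICES = "/sys/bus/pci/devices"


def msi_bus_path(bdf: str, sysfs_root: str = SYSFS_PCI_DEVICES) -> Path:
    """Path of the msi_bus attribute for a PCI address such as 0000:00:1c.0."""
    return Path(sysfs_root) / bdf / "msi_bus"


def read_msi_bus(bdf: str, sysfs_root: str = SYSFS_PCI_DEVICES) -> int:
    """Return the current msi_bus value (0 = MSI/MSI-X disallowed)."""
    return int(msi_bus_path(bdf, sysfs_root).read_text().strip())


def set_msi_bus(bdf: str, allow: bool, sysfs_root: str = SYSFS_PCI_DEVICES) -> None:
    """Write 1 (allow) or 0 (disallow) to msi_bus. Needs root on a real system."""
    msi_bus_path(bdf, sysfs_root).write_text("1" if allow else "0")


# Usage (hypothetical bridge address; run as root, then reload the driver):
# set_msi_bus("0000:00:1c.0", allow=False)
# print(read_msi_bus("0000:00:1c.0"))
```

According to the same ABI documentation, drivers must be reloaded before a new setting takes effect, so the apex module would need to be unloaded and loaded again after the write.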

zaolin avatar Jan 19 '24 12:01 zaolin

+1 vote for fix!

bridge-four avatar Jan 22 '24 01:01 bridge-four

+1 vote for fix!

alexsahka avatar Jan 28 '24 08:01 alexsahka

+1 :-(

Claudio1L avatar Feb 10 '24 21:02 Claudio1L

This is not likely to ever get fixed now that Broadcom is deprecating free ESXi. I'm aware this is a TPU issue, but the ESXi user base is just going to keep shrinking at this point.

thefl0yd avatar Feb 10 '24 22:02 thefl0yd

@thefl0yd I do not believe this is the case. I need to deploy the M.2 in multiple enterprise VMware deployments via passthrough.

+1 For a fix

Sanman96 avatar Mar 13 '24 16:03 Sanman96