
Installation failing on Raspberry Pi CM4 for PCI-E driver

Open timonsku opened this issue 4 years ago • 93 comments

Following the installation guide for the M.2, I get several compilation errors when it's trying to install gasket. Here is the log of the make process: gasket-make.log

It seems it's mostly the same few errors:

invalid use of undefined type ‘struct msix_entry’
implicit declaration of function ‘writeq_relaxed’; did you mean ‘writel_relaxed’?
implicit declaration of function ‘readq_relaxed’; did you mean ‘readw_relaxed’?
implicit declaration of function ‘pci_disable_msix’; did you mean ‘pci_disable_sriov’?

This is using gcc version 8.3.0 on the latest Raspbian with kernel 5.4.51-v7l+. I'm unsure whether this is a compiler, kernel header or code issue.

timonsku avatar Dec 13 '20 00:12 timonsku

Hello @timonsku, we have investigated the CM4 previously and unfortunately determined that it won't work with our PCIe modules, as the CPU doesn't have the MSI-X support that our driver requires.

Namburger avatar Dec 15 '20 01:12 Namburger

Hey Namburger, the pi engineers have worked on this and have added support for MSI-X in the latest kernel. See this forum discussion: https://www.raspberrypi.org/forums/viewtopic.php?p=1772216&sid=fa34ae6597591c1f80cb68c8138c6a67#p1772216

timonsku avatar Dec 15 '20 01:12 timonsku

As I mentioned, we have explored this path and there is still a little ongoing effort, but I don't believe it is something we can promise. @mbrooksx might be able to give you more info on this.

Namburger avatar Dec 15 '20 01:12 Namburger

Oh I see. If it doesn't turn out to be a true hw limitation I would be very interested in seeing this getting supported. I currently have hardware in development that would see good use of the M.2 modules.

timonsku avatar Dec 15 '20 01:12 timonsku

@timonsku Unfortunately this ARM hardware does not support MSI-X. The raspberry pi discussion you referenced raised my hopes that limited performance with emulated interrupts might work. Although it still does not work, the on-going work is encouraging, and might lead to performance nearly as good as if the original MSI-X hardware interrupts were on the ARM silicon. Stay tuned!

usbguru avatar Dec 15 '20 17:12 usbguru

@timonsku : Yes, I'm actively working with the people in the Pi forum discussion. While MSI-X isn't technically supported by the BCM2711, as you saw from that patch if SW indicates it works then the PCIe hardware is actually able to map some MSI-X interrupts correctly.

We've validated farther than you have (including MSI-X). Your errors occur because you're building for the 32-bit kernel, but the driver expects 64-bit reads/writes (which is why writeq/readq don't exist). My plan is to customize the driver for the Pi (including 32-bit workarounds) and likely submit it to the Pi kernel rather than trying to update our DKMS package. I will keep you informed of the status.

mbrooksx avatar Dec 15 '20 17:12 mbrooksx

Awesome that is great to hear :)

timonsku avatar Dec 15 '20 17:12 timonsku

Great to hear that somebody is working on this issue! Already received my RPI CM4 + IO Board + PCIe Coral acc. Any news? Maybe I can help?

Valdiolus avatar Dec 29 '20 07:12 Valdiolus

Has anyone had a go at this? I've done a bit of debugging and hacking myself and got the kernel module to load and libedgetpu to start an inference (although it never finishes, some event is missing, and there is an HIB error?).

There are some changes needed in both the kernel module and the user-space drivers, so far primarily replacing 64bit memory accesses with two 32bit ones. My progress is here for the module which I have updated to the latest version from the dkms package and here for libedgetpu, but these changes are of course nowhere near merge-quality.
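
The idea of replacing one 64-bit access with two 32-bit ones can be modelled in a few lines. This is only a sketch in Python, not the actual driver change: `FakeRegisterFile` is a hypothetical stand-in for a register window that accepts 32-bit accesses only, and placing the low word at the lower offset is an assumption (little-endian layout).

```python
MASK32 = 0xFFFFFFFF


def split_u64(value):
    """Return the (low, high) 32-bit words of a 64-bit value."""
    return value & MASK32, (value >> 32) & MASK32


def join_u64(low, high):
    """Rebuild a 64-bit value from its low and high 32-bit words."""
    return ((high & MASK32) << 32) | (low & MASK32)


class FakeRegisterFile:
    """Hypothetical register window that only supports 32-bit accesses."""

    def __init__(self):
        self.words = {}

    def write32(self, offset, value):
        if offset % 4:
            raise ValueError("unaligned 32-bit access")
        self.words[offset] = value & MASK32

    def read32(self, offset):
        if offset % 4:
            raise ValueError("unaligned 32-bit access")
        return self.words.get(offset, 0)


def write64_as_two32(regs, offset, value):
    """Write a 64-bit value as two 32-bit writes, low word first (assumed order)."""
    low, high = split_u64(value)
    regs.write32(offset, low)
    regs.write32(offset + 4, high)


def read64_as_two32(regs, offset):
    """Read a 64-bit value as two 32-bit reads, low word first (assumed order)."""
    return join_u64(regs.read32(offset), regs.read32(offset + 4))
```

Note that two separate accesses are not atomic, so a register that changes between the two reads could yield a torn value; whether that matters for this device is not established in this thread.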

This is what libedgetpu logs:

I :273] Starting in normal mode
I :83] Opening /dev/apex_0. read_only=0
I :97] mmap_offset=0x0000000000040000, mmap_size=4096
I :108] Got map addr at 0x0xb6fde000
I :97] mmap_offset=0x0000000000044000, mmap_size=4096
I :108] Got map addr at 0x0xb6fdd000
I :97] mmap_offset=0x0000000000048000, mmap_size=4096
I :108] Got map addr at 0x0xb6fdc000
I :229] Read: offset = 0x00000000000486f0, value: = 0x0000000000000000, w0=0x00000000, w1=0x00000000
I :191] Write: offset = 0x00000000000487a8, value = 0x0000000000000000
I :229] Read: offset = 0x0000000000048578, value: = 0x0000000000000010, w0=0x00000010, w1=0x00000000
I :136] MmuMapper#Map() : 00000000b6627000 -> 0000000001000000 (1 pages) flags=00000000.
I :55] MapMemory() page-aligned : device_address = 0x0000000001000000
I :169] Queue base : 0xb6627000 -> 0x0000000001000000 [4096 bytes]
I :136] MmuMapper#Map() : 00000000b6628000 -> 0000000001001000 (1 pages) flags=00000000.
I :55] MapMemory() page-aligned : device_address = 0x0000000001001000
I :179] Queue status block : 0xb6628000 -> 0x0000000001001000 [16 bytes]
I :191] Write: offset = 0x0000000000048590, value = 0x0000000001000000
I :191] Write: offset = 0x0000000000048598, value = 0x0000000001001000
I :191] Write: offset = 0x00000000000485a0, value = 0x0000000000000100
I :191] Write: offset = 0x0000000000048568, value = 0x0000000000000005
I :229] Read: offset = 0x0000000000048570, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
I :229] Read: offset = 0x00000000000486d0, value: = 0x0000000000000000, w0=0x00000000, w1=0x00000000
I :191] Write: offset = 0x0000000000044018, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000044158, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000044198, value = 0x0000000000000001
I :191] Write: offset = 0x00000000000441d8, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000044218, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000048788, value = 0x000000000000007f
I :229] Read: offset = 0x0000000000048788, value: = 0x000000000000007f, w0=0x0000007f, w1=0x00000000
I :191] Write: offset = 0x00000000000400c0, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040150, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040110, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040250, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040298, value = 0x0000000000000001
I :191] Write: offset = 0x00000000000402e0, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040328, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040190, value = 0x0000000000000001
I :191] Write: offset = 0x00000000000401d0, value = 0x0000000000000001
I :191] Write: offset = 0x0000000000040210, value = 0x0000000000000001
I :191] Write: offset = 0x00000000000486e8, value = 0x0000000000000000
I :45] Set event fd : event_id:0 -> event_fd:7,
I :45] Set event fd : event_id:4 -> event_fd:11,
I :62] event_fd=7. Monitor thread begin.
I :45] Set event fd : event_id:5 -> event_fd:12,
I :45] Set event fd : event_id:6 -> event_fd:13,
I :62] event_fd=12. Monitor thread begin.
I :62] event_fd=11. Monitor thread begin.
I :45] Set event fd : event_id:7 -> event_fd:14,
I :62] event_fd=13. Monitor thread begin.
I :45] Set event fd : event_id:8 -> event_fd:15,
I :62] event_fd=14. Monitor thread begin.
I :45] Set event fd : event_id:9 -> event_fd:16,
I :45] Set event fd : event_id:10 -> event_fd:17,
I :62] event_fd=15. Monitor thread begin.
I :45] Set event fd : event_id:11 -> event_fd:18,
I :62] event_fd=16. Monitor thread begin.
I :62] event_fd=17. Monitor thread begin.
I :45] Set event fd : event_id:12 -> event_fd:19,
I :62] event_fd=18. Monitor thread begin.
I :191] Write: offset = 0x00000000000486a0, value = 0x000000000000000f
I :191] Write: offset = 0x00000000000485c0, value = 0x0000000000000001
I :191] Write: offset = 0x00000000000486c0, value = 0x0000000000000001
I :172] Opening device at /dev/apex_0
I :62] event_fd=19. Monitor thread begin.
I :75] event_fd=19. Monitor thread got num_events=1.
I :191] Write: offset = 0x00000000000486c0, value = 0x0000000000000000
I :191] Write: offset = 0x00000000000486c8, value = 0x0000000000000000
I :229] Read: offset = 0x00000000000486f0, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
I :229] Read: offset = 0x0000000000048700, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
E :254] HIB Error. hib_error_status = 0000000000000001, hib_first_error_status = 0000000000000001
I :75] event_fd=19. Monitor thread got num_events=1.
I :191] Write: offset = 0x00000000000486c0, value = 0x0000000000000000
I :191] Write: offset = 0x00000000000486c8, value = 0x0000000000000000
I :229] Read: offset = 0x00000000000486f0, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
I :229] Read: offset = 0x0000000000048700, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
E :254] HIB Error. hib_error_status = 0000000000000001, hib_first_error_status = 0000000000000001
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
I :47] Adding input "map/TensorArrayStack/TensorArrayGatherV3" with 150528 bytes.
I :58] Adding output "prediction" with 965 bytes.
I :167] Request prepared, total batch size: 1, total TPU requests required: 1.
I :310] Request [0]: Submitting P0 request immediately.
I :373] Request [0]: Need to map parameters.
I :136] MmuMapper#Map() : 00000000ad93d000 -> 8000000000000000 (953 pages) flags=00000002.
I :55] MapMemory() page-aligned : device_address = 0x8000000000000000
I :252] Mapped params : Buffer(ptr=0xad93d000) -> 0x8000000000000000, 3900864 bytes.
I :252] Mapped params : Buffer(ptr=(nil)) -> 0x0000000000000000, 0 bytes.
I :387] Request [0]: Need to do parameter-caching.
I :80] [0] Request constructed.
I :46] InstructionBuffers created.
I :653] Created new instruction buffers.
I :75] Mapped scratch : Buffer(ptr=(nil)) -> 0x0000000000000000, 0 bytes.
I :368] MapDataBuffers() done.
I :187] Linking Parameter: 0x8000000000000000
I :136] MmuMapper#Map() : 0000000001266000 -> 8000000000400000 (3 pages) flags=00000002.
I :55] MapMemory() page-aligned : device_address = 0x8000000000400000
I :223] Mapped "instructions" : Buffer(ptr=0x1266000) -> 0x8000000000400000, 9680 bytes. Direction=1
I :384] MapInstructionBuffers() done.
I :481] [0] SetState old=0, new=1.
I :393] [0] NotifyRequestSubmitted()
I :481] [0] SetState old=1, new=2.
I :83] Request[0]: Submitted
I :401] [0] NotifyRequestActive()
I :481] [0] SetState old=2, new=3.
I :133] Request[0]: Scheduling DMA[0]
I :394] Adding an element to the host queue.
I :191] Write: offset = 0x00000000000485a8, value = 0x0000000000000001
I :80] [1] Request constructed.
I :113] Adding input "map/TensorArrayStack/TensorArrayGatherV3" with 150528 bytes.
I :188] Adding output "prediction" with 965 bytes.
I :46] InstructionBuffers created.
I :653] Created new instruction buffers.
I :75] Mapped scratch : Buffer(ptr=(nil)) -> 0x0000000000000000, 0 bytes.
I :136] MmuMapper#Map() : 0000000001226000 -> 8000000000440000 (38 pages) flags=00000002.
I :55] MapMemory() page-aligned : device_address = 0x8000000000440000
I :223] Mapped "map/TensorArrayStack/TensorArrayGatherV3" : Buffer(ptr=0x1226440) -> 0x8000000000440440, 150528 bytes. Direction=1
I :136] MmuMapper#Map() : 0000000001276000 -> 8000000000404000 (1 pages) flags=00000004.
I :55] MapMemory() page-aligned : device_address = 0x8000000000404000
I :223] Mapped "prediction" : Buffer(ptr=0x1276000) -> 0x8000000000404000, 968 bytes. Direction=2
I :368] MapDataBuffers() done.
I :93] Linking map/TensorArrayStack/TensorArrayGatherV3[0]: 0x8000000000440440
I :93] Linking prediction[0]: 0x8000000000404000
I :136] MmuMapper#Map() : 00000000012b9000 -> 8000000000420000 (32 pages) flags=00000002.
I :55] MapMemory() page-aligned : device_address = 0x8000000000420000
I :223] Mapped "instructions" : Buffer(ptr=0x12b9000) -> 0x8000000000420000, 129536 bytes. Direction=1
I :384] MapInstructionBuffers() done.
I :481] [1] SetState old=0, new=1.
I :393] [1] NotifyRequestSubmitted()
I :481] [1] SetState old=1, new=2.
I :83] Request[1]: Submitted
I :401] [1] NotifyRequestActive()
I :481] [1] SetState old=2, new=3.
I :133] Request[1]: Scheduling DMA[0]
I :394] Adding an element to the host queue.
I :191] Write: offset = 0x00000000000485a8, value = 0x0000000000000002

Also the only interrupt firing seems to be the fatal error one:

cat /sys/class/apex/apex_0/interrupt_counts
0x00: 0
0x01: 0
0x02: 0
0x03: 0
0x04: 0
0x05: 0
0x06: 0
0x07: 0
0x08: 0
0x09: 0
0x0a: 0
0x0b: 0
0x0c: 2
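
To spot which interrupts have fired without reading the list by eye, the output can be parsed with a short script. This is a sketch that assumes the file always uses the `index: count` format shown above, with a hexadecimal index and a decimal count; the `SAMPLE` text is abbreviated from that output.

```python
SAMPLE = """\
0x00: 0
0x01: 0
0x0b: 0
0x0c: 2
"""


def parse_interrupt_counts(text):
    """Map interrupt index -> count from 'index: count' lines."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        index, _, count = line.partition(":")
        counts[int(index, 16)] = int(count)
    return counts


def firing_interrupts(counts):
    """Keep only the interrupts that have fired at least once."""
    return {index: n for index, n in counts.items() if n > 0}


if __name__ == "__main__":
    for index, n in firing_interrupts(parse_interrupt_counts(SAMPLE)).items():
        print(f"interrupt 0x{index:02x} fired {n} time(s)")
```

On a live system the text would come from reading /sys/class/apex/apex_0/interrupt_counts instead of `SAMPLE`.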

markus-k avatar Jan 15 '21 12:01 markus-k

@markus-k woa, thanks for sharing that. @mbrooksx for awareness.

Namburger avatar Jan 15 '21 14:01 Namburger

@markus-k thank you for sharing. I added othbootargs=gasket.dma_bit_mask=32 to avoid the HIB error, but after running the sample program I still get the following errors. Do you have any ideas? (Raspbian OS is 32-bit; all the code was downloaded from markus-k's repo.) Thank you -Jack

[Screenshots: messageImage_1612070152087, messageImage_1612070000068]

hiwudery avatar Jan 31 '21 05:01 hiwudery

@hiwudery That's weird. Your upper and lower 32bits are cloned when reading from the device (see the line with I :229), which my patch should fix. Maybe the compiler optimized the two reads into one ldrd? But since that still performs two 32bit accesses, I don't really understand why that happens.
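
A quick way to look for this symptom in a libedgetpu log is to parse the `Read:` lines and compare the two words. The sketch below assumes the log format shown earlier in this thread. The "cloned" example line is made up for illustration, since the screenshots are not reproduced here.

```python
import re

READ_LINE = re.compile(
    r"Read: offset = (0x[0-9a-fA-F]+), value: = (0x[0-9a-fA-F]+), "
    r"w0=(0x[0-9a-fA-F]+), w1=(0x[0-9a-fA-F]+)"
)


def parse_read(line):
    """Return offset, value, w0 and w1 from a 'Read:' log line, or None."""
    match = READ_LINE.search(line)
    if match is None:
        return None
    offset, value, w0, w1 = (int(group, 16) for group in match.groups())
    return {"offset": offset, "value": value, "w0": w0, "w1": w1}


def is_consistent(read):
    """True if the 64-bit value equals w1 (high word) joined with w0 (low word)."""
    return read["value"] == (read["w1"] << 32) | read["w0"]


def looks_cloned(read):
    """Heuristic: both words identical and non-zero suggests a duplicated read."""
    return read["w0"] == read["w1"] and read["w0"] != 0


GOOD = ("I :229] Read: offset = 0x0000000000048578, "
        "value: = 0x0000000000000010, w0=0x00000010, w1=0x00000000")
# Hypothetical line showing the cloned pattern; not taken from a real log.
CLONED = ("I :229] Read: offset = 0x0000000000048578, "
          "value: = 0x0000001000000010, w0=0x00000010, w1=0x00000010")
```

The heuristic can give false positives for registers whose two halves legitimately hold the same value, so it only narrows down where to look.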

I just tried setting dma_bit_mask but still get HIB Errors, in addition to out of memory errors when mapping buffers. Also from dmesg:

[  971.201472] apex 0000:01:00.0: gasket_perform_mapping i 0
[  971.201480] apex 0000:01:00.0: gasket_page_table_map done: ha b657c000 daddr 1000000 num 1, flags 0 ret 0
[  971.201552] apex 0000:01:00.0: gasket_perform_mapping i 0
[  971.201558] apex 0000:01:00.0: gasket_page_table_map done: ha b657d000 daddr 1001000 num 1, flags 0 ret 0
[  971.271839] apex 0000:01:00.0: gasket_alloc_extended_subtable -> fail to map page ffffffffffffffff [pfn 6d9fed66 phys 732d8923]
[  971.271854] apex 0000:01:00.0: no memory for extended addr subtable
[  971.271861] apex 0000:01:00.0: page table slots (0,0) (@ 0x8000000000000000) to (8191,511) are not available
[  971.271868] apex 0000:01:00.0: gasket_page_table_map done: ha ad63c000 daddr 8000000000000000 num 953, flags 2 ret -12
[  971.271907] apex 0000:01:00.0: gasket_alloc_extended_subtable -> fail to map page ffffffffffffffff [pfn 6d9fed66 phys 732d8923]
[  971.271915] apex 0000:01:00.0: no memory for extended addr subtable
[  971.271921] apex 0000:01:00.0: page table slots (0,0) (@ 0x8000000000000000) to (8191,511) are not available
[  971.271928] apex 0000:01:00.0: gasket_page_table_map done: ha ad63c000 daddr 8000000000000000 num 953, flags 0 ret -12

I'm also not sure if dma_bit_mask is right here. The comment says it's used for PCIe controllers which can't do 64-bit addressing. The Raspberry Pi's PCIe controller can do 64-bit addressing, but only 32-bit wide accesses (as noted by PhilE here).

markus-k avatar Jan 31 '21 10:01 markus-k

Yes, what you've done is essentially everything I've done for debug. The only additional change you alluded to is correct - the compiler is too smart for libedgetpu and expects a competent system that would be able to perform 64-bit wide accesses. I fixed this by using volatile variables to skip caching. My repos of progress are: https://github.com/mbrooksx/libedgetpu (Userspace) https://github.com/mbrooksx/pi-cm4-gasket-hacks (Kernel)

Note that I added an additional print - the host-side page address for the failed DMA transaction (it reports 0x100004000000000 - which is outside of the Pi RAM). The hope is that dma_bit_mask and the command line option swiotlb=65536 would create shadow registers in the 32-bit space, but the Pi PCIe restrictions are very challenging. It is likely that the coherent memory (set up in libedgetpu) is corrupted and thus the shared memory between the two is passing invalid information.

The other option that may be easier is the 32-bit kernel. It has issues with allocating enough BAR memory, but with some device tree tweaks this could likely be fixed. This paired with the 32-bit "aware" user-space may be an easier path. I've asked the Pi team to investigate this as well.

mbrooksx avatar Feb 03 '21 20:02 mbrooksx

@mbrooksx - And for the benefit of anyone who hasn't touched BAR space allocations, here's a guide I wrote on it a few months back testing graphics cards on the CM4: https://gist.github.com/geerlingguy/9d78ea34cab8e18d71ee5954417429df

The latest 5.10.y kernels for Pi OS already increased the default allocation to 1 GB I think (maybe even 4 or 8 GB? I don't remember if I followed up and checked on those commits).

geerlingguy avatar Feb 03 '21 23:02 geerlingguy

Alright, at least I haven't been looking in the completely wrong place. I've done most of my debugging on a 32-bit kernel so far. The default BAR space seems to be 1GB, I'm not sure if that's enough, but I'm not seeing any BAR allocation errors.

In case this helps anyone, some more debug logs. I've added your additional debug print, on a 32-bit kernel without any additional parameters:

[   77.630936] apex 0000:01:00.0: Fault VA: 0x0
[   77.630952] apex 0000:01:00.0: Fault VA: 0x0
[   77.635926] apex 0000:01:00.0: Fault VA: 0x0
[   77.635940] apex 0000:01:00.0: Fault VA: 0x0
[   77.635953] apex 0000:01:00.0: Fault VA: 0x0
[   77.635966] apex 0000:01:00.0: Fault VA: 0x0
[   77.635978] apex 0000:01:00.0: Fault VA: 0x0
[   77.635990] apex 0000:01:00.0: Fault VA: 0x0
[   77.636002] apex 0000:01:00.0: Fault VA: 0x0
[   77.636014] apex 0000:01:00.0: Fault VA: 0x0
[   83.141193] apex 0000:01:00.0: Fault VA: 0x1001000
[   83.141216] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x8, Simple: 0x1001
[   83.141237] apex 0000:01:00.0: Computed Failing Bus Addr: 0x40c800000
[   83.141259] apex 0000:01:00.0: Fault VA: 0x1001000
[   83.141277] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x8, Simple: 0x1001
[   83.141296] apex 0000:01:00.0: Computed Failing Bus Addr: 0x40c800000
[   83.141320] apex 0000:01:00.0: Fault VA: 0xffffffffffffffff
[   83.141345] apex 0000:01:00.0: Fault VA: 0xffffffff
[   83.141362] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x7ff, Simple: 0x1fff
[   83.141381] apex 0000:01:00.0: Computed Failing Bus Addr: 0x0
[   83.141402] apex 0000:01:00.0: Fault VA: 0x0
[   83.150222] apex 0000:01:00.0: Fault VA: 0x0
[   83.150243] apex 0000:01:00.0: Fault VA: 0x0
[   83.150263] apex 0000:01:00.0: Fault VA: 0x0
[   83.150284] apex 0000:01:00.0: Fault VA: 0x0
[   83.150309] apex 0000:01:00.0: Fault VA: 0xffffffffffffffff

I've also tried using gasket.dma_bit_mask=32 swiotlb=65536 on a 32-bit kernel:

[   41.372303] apex 0000:01:00.0: Fault VA: 0x0
[   41.372321] apex 0000:01:00.0: Fault VA: 0x0
[   41.378062] apex 0000:01:00.0: Fault VA: 0x0
[   41.378079] apex 0000:01:00.0: Fault VA: 0x0
[   41.378094] apex 0000:01:00.0: Fault VA: 0x0
[   41.378109] apex 0000:01:00.0: Fault VA: 0x0
[   41.378124] apex 0000:01:00.0: Fault VA: 0x0
[   41.378139] apex 0000:01:00.0: Fault VA: 0x0
[   41.378153] apex 0000:01:00.0: Fault VA: 0x0
[   41.378168] apex 0000:01:00.0: Fault VA: 0x0
[   41.628343] ------------[ cut here ]------------
[   41.628367] WARNING: CPU: 3 PID: 707 at kernel/dma/swiotlb.c:683 swiotlb_map+0x38c/0x43c
[   41.628374] apex 0000:01:00.0: swiotlb addr 0x0000000415400000+4096 overflow (mask ffffffff, bus limit 47fffffff).
[   41.628379] Modules linked in: sha256_generic cfg80211 rfkill 8021q garp stp llc binfmt_misc v3d raspberrypi_hwmon vc4 gpu_sched dwc2 cec roles drm_kms_helper drm bcm2835_isp(C) i2c_bcm2835 bcm2835_codec(C) bcm2835_v4l2(C) drm_panel_orientation_quirks v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc apex(C) snd_soc_core vc_sm_cma(C) gasket(C) snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops backlight rpivid_mem uio_pdrv_genirq uio i2c_dev ip_tables x_tables ipv6
[   41.628599] CPU: 3 PID: 707 Comm: python3 Tainted: G         C        5.10.6-v7l+ #6
[   41.628602] Hardware name: BCM2711
[   41.628605] Backtrace:
[   41.628617] [<c0b84b94>] (dump_backtrace) from [<c0b84f24>] (show_stack+0x20/0x24)
[   41.628621]  r7:ffffffff r6:00000000 r5:60000013 r4:c12e6c98
[   41.628626] [<c0b84f04>] (show_stack) from [<c0b892bc>] (dump_stack+0xcc/0xf8)
[   41.628632] [<c0b891f0>] (dump_stack) from [<c02216d4>] (__warn+0xfc/0x114)
[   41.628637]  r10:00001000 r9:00000009 r8:c02a5a50 r7:000002ab r6:00000009 r5:c02a5a50
[   41.628640]  r4:c0e3cd00 r3:c1205094
[   41.628645] [<c02215d8>] (__warn) from [<c0b856c8>] (warn_slowpath_fmt+0xa4/0xd8)
[   41.628648]  r7:000002ab r6:c0e3cd00 r5:c1205048 r4:c0e3ccbc
[   41.628654] [<c0b85628>] (warn_slowpath_fmt) from [<c02a5a50>] (swiotlb_map+0x38c/0x43c)
[   41.628658]  r9:c1b8b070 r8:c1205048 r7:00000000 r6:ffffffff r5:00000000 r4:ffffffff
[   41.628664] [<c02a56c4>] (swiotlb_map) from [<c02a0668>] (dma_map_page_attrs+0x254/0x394)
[   41.628668]  r10:00000001 r9:00001000 r8:c1b8b1e0 r7:00000000 r6:ffffffff r5:c1205048
[   41.628671]  r4:c1b8b070
[   41.628690] [<c02a0414>] (dma_map_page_attrs) from [<bf115184>] (gasket_map_extended_pages+0x100/0x45c [gasket])
[   41.628694]  r10:00000000 r9:c4112000 r8:c32ab700 r7:f09dc000 r6:00000200 r5:000003b9
[   41.628697]  r4:f085d018
[   41.628717] [<bf115084>] (gasket_map_extended_pages [gasket]) from [<bf115900>] (gasket_page_table_map+0xa8/0x100 [gasket])
[   41.628721]  r10:c32ab740 r9:ad63c000 r8:00000000 r7:80000000 r6:c2f97c00 r5:c32ab700
[   41.628724]  r4:000003b9
[   41.628741] [<bf115858>] (gasket_page_table_map [gasket]) from [<bf112a9c>] (gasket_map_buffers_common+0x90/0xa8 [gasket])
[   41.628745]  r10:00000005 r9:00000001 r8:c30e1180 r7:4028dc0c r6:c2f97c00 r5:c2f97c00
[   41.628748]  r4:c32a5d90
[   41.628767] [<bf112a0c>] (gasket_map_buffers_common [gasket]) from [<bf112cac>] (gasket_handle_ioctl+0x1f8/0x8e0 [gasket])
[   41.628770]  r5:beb40fa0 r4:c1205048
[   41.628788] [<bf112ab4>] (gasket_handle_ioctl [gasket]) from [<bf1106f8>] (gasket_ioctl+0x9c/0x118 [gasket])
[   41.628792]  r9:beb40fa0 r8:c2f97c00 r7:bf09a1b0 r6:4028dc0c r5:c30e1180 r4:c1205048
[   41.628805] [<bf11065c>] (gasket_ioctl [gasket]) from [<c0451180>] (sys_ioctl+0x1d4/0x8ec)
[   41.628809]  r9:c32a4000 r8:00000000 r7:c30e1180 r6:c30e1181 r5:c1205048 r4:4028dc0c
[   41.628815] [<c0450fac>] (sys_ioctl) from [<c0200040>] (ret_fast_syscall+0x0/0x28)
[   41.628818] Exception stack(0xc32a5fa8 to 0xc32a5ff0)
[   41.628822] 5fa0:                   beb40f9c 00000000 00000005 4028dc0c beb40fa0 00000005
[   41.628826] 5fc0: beb40f9c 00000000 b454da7c 00000036 00000001 01f0349c 00000000 b48a4bbc
[   41.628829] 5fe0: b454db58 beb40f74 b443ba3f b6cd551c
[   41.628833]  r10:00000036 r9:c32a4000 r8:c0200204 r7:00000036 r6:b454da7c r5:00000000
[   41.628836]  r4:beb40f9c
[   41.628840] ---[ end trace a2d67e6b70f87dd2 ]---
[   41.628855] apex 0000:01:00.0: no memory for extended addr subtable
[   41.628861] apex 0000:01:00.0: page table slots (0,0) (@ 0x8000000000000000) to (8191,511) are not available
[   41.628911] apex 0000:01:00.0: no memory for extended addr subtable
[   41.628917] apex 0000:01:00.0: page table slots (0,0) (@ 0x8000000000000000) to (8191,511) are not available
[   41.646322] apex 0000:01:00.0: Fault VA: 0x1001000
[   41.646330] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x8, Simple: 0x1001
[   41.646338] apex 0000:01:00.0: Computed Failing Bus Addr: 0xc800000
[   41.646347] apex 0000:01:00.0: Fault VA: 0x1001000
[   41.646352] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x8, Simple: 0x1001
[   41.646359] apex 0000:01:00.0: Computed Failing Bus Addr: 0xc800000
[   41.646372] apex 0000:01:00.0: Fault VA: 0xffffffffffffffff
[   41.646384] apex 0000:01:00.0: Fault VA: 0xffffffff
[   41.646389] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x7ff, Simple: 0x1fff
[   41.646396] apex 0000:01:00.0: Computed Failing Bus Addr: 0xdeadbeef
[   41.646405] apex 0000:01:00.0: Fault VA: 0x0
[   41.648266] apex 0000:01:00.0: Fault VA: 0x0
[   41.648275] apex 0000:01:00.0: Fault VA: 0x0
[   41.648283] apex 0000:01:00.0: Fault VA: 0x0
[   41.648292] apex 0000:01:00.0: Fault VA: 0x0
[   41.648305] apex 0000:01:00.0: Fault VA: 0xffffffffffffffff

In this case mapping the buffer fails in libedgetpu:

I :192] Write: offset = 0x00000000000486a0, value = 0x000000000000000f
I :62] event_fd=19. Monitor thread begin.
I :192] Write: offset = 0x00000000000485c0, value = 0x0000000000000001
I :192] Write: offset = 0x00000000000486c0, value = 0x0000000000000001
I :75] event_fd=19. Monitor thread got num_events=1.
I :192] Write: offset = 0x00000000000486c0, value = 0x0000000000000000
I :192] Write: offset = 0x00000000000486c8, value = 0x0000000000000000
I :231] Read: offset = 0x00000000000486f0, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
I :172] Opening device at /dev/apex_0
I :231] Read: offset = 0x0000000000048700, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
E :254] HIB Error. hib_error_status = 0000000000000001, hib_first_error_status = 0000000000000001
I :75] event_fd=19. Monitor thread got num_events=1.
I :192] Write: offset = 0x00000000000486c0, value = 0x0000000000000000
I :192] Write: offset = 0x00000000000486c8, value = 0x0000000000000000
I :231] Read: offset = 0x00000000000486f0, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
I :231] Read: offset = 0x0000000000048700, value: = 0x0000000000000001, w0=0x00000001, w1=0x00000000
E :254] HIB Error. hib_error_status = 0000000000000001, hib_first_error_status = 0000000000000001
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
I :47] Adding input "map/TensorArrayStack/TensorArrayGatherV3" with 150528 bytes.
I :58] Adding output "prediction" with 965 bytes.
I :167] Request prepared, total batch size: 1, total TPU requests required: 1.
I :310] Request [0]: Submitting P0 request immediately.
I :373] Request [0]: Need to map parameters.
I :118] Failed to map buffer with flags, error -1
Traceback (most recent call last):
  File "classify_image.py", line 126, in <module>
    main()
  File "classify_image.py", line 115, in main
    interpreter.invoke()
  File "/home/pi/venv/lib/python3.7/site-packages/tflite_runtime/interpreter.py", line 540, in invoke
    self._interpreter.Invoke()
RuntimeError: Failed to execute request. Could not map pages : 5 (Cannot allocate memory)Node number 1 (EdgeTpuDelegateForCustomOp) failed to invoke.

I :226] Releasing Edge TPU device at /dev/apex_0
I :178] Closing Edge TPU device at /dev/apex_0
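
The swiotlb warning in the dmesg output above can be reproduced as plain arithmetic. The sketch below is a simplified model of that kind of range check, not the kernel's actual code: the function name and the "use the smaller of mask and bus limit" rule are assumptions, while the numbers are taken from the warning line.

```python
def fits_dma_limits(addr, size, dma_mask, bus_limit=0):
    """True if [addr, addr + size) lies within the mask and bus limit.

    A bus_limit of 0 is treated as "no bus limit" (an assumption of this model).
    """
    end = addr + size - 1
    limit = dma_mask if bus_limit == 0 else min(dma_mask, bus_limit)
    return end <= limit


# Values from: "swiotlb addr 0x0000000415400000+4096 overflow
#               (mask ffffffff, bus limit 47fffffff)"
ADDR = 0x0000000415400000
SIZE = 4096
MASK_32 = 0xFFFFFFFF
MASK_64 = 0xFFFFFFFFFFFFFFFF
BUS_LIMIT = 0x47FFFFFFF

if __name__ == "__main__":
    print("32-bit mask:", fits_dma_limits(ADDR, SIZE, MASK_32, BUS_LIMIT))
    print("64-bit mask:", fits_dma_limits(ADDR, SIZE, MASK_64, BUS_LIMIT))
```

Under this model the bounce-buffer address sits above 4 GiB, so it fails the 32-bit mask even though it is below the bus limit, which matches the "overflow" message.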

markus-k avatar Feb 04 '21 10:02 markus-k

@markus-k in gasket_page_table.c, the page table uses a 64-bit format, not a 32-bit format. I think gasket_page_table also needs to be modified for the 32-bit kernel.

 * Address format:
 * Simple addresses - those whose containing pages are directly placed in the
 * device's address translation registers - are laid out as:
 * [ 63 - 25: 0 | 24 - 12: page index | 11 - 0: page offset ]
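
The simple-address layout quoted above can be decoded with a few shifts and masks. This sketch only covers the simple format as given in that comment; the function and constant names are illustrative and not taken from the driver.

```python
PAGE_OFFSET_BITS = 12   # bits 11-0: page offset
PAGE_INDEX_BITS = 13    # bits 24-12: page index
PAGE_OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1
PAGE_INDEX_MASK = (1 << PAGE_INDEX_BITS) - 1


def decode_simple_address(addr):
    """Split a simple device address into (page index, page offset).

    Raises ValueError if bits 63-25 are not all zero, since the quoted
    layout requires them to be zero for a simple address.
    """
    if addr >> (PAGE_OFFSET_BITS + PAGE_INDEX_BITS):
        raise ValueError("bits 63-25 must be zero for a simple address")
    index = (addr >> PAGE_OFFSET_BITS) & PAGE_INDEX_MASK
    offset = addr & PAGE_OFFSET_MASK
    return index, offset


if __name__ == "__main__":
    index, offset = decode_simple_address(0x1001000)
    print(f"index=0x{index:x} offset=0x{offset:x}")
```

For the device address 0x1001000 that appears in the logs in this thread, this gives page index 0x1001 and offset 0, which agrees with the "Simple: 0x1001" value printed next to "Fault VA: 0x1001000" in the dmesg output.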

hiwudery avatar Feb 04 '21 18:02 hiwudery

I also wanted to note something here that may be of interest—I noticed earlier someone mentioned writeq being present on 64-bit OSes. I'll soon be testing the Coral TPU (M.2 A+E key version) on a Pi so haven't yet had first-hand experience, but with a different driver I was taking a look at, it seems that one problem may be that writeq is not supported on Pi OS / the Pi's PCI-E bus like it may be on some other 64-bit systems.

Edit: New bug reported relating to that driver issue is here: https://github.com/raspberrypi/linux/issues/4158

geerlingguy avatar Feb 17 '21 05:02 geerlingguy

On 64-bit Pi OS (with latest kernel compiled at 5.10.14-v8+), I get the following kernel panic after running through the default steps in the setup guide:

[Photo: IMG_3633]

(Cross-linking to https://github.com/geerlingguy/raspberry-pi-pcie-devices/issues/44#issuecomment-780912830)

geerlingguy avatar Feb 17 '21 23:02 geerlingguy

You should probably read the rest of this issue; there hasn't been any development since my last comment, to my knowledge. The default gasket module won't work at all. My fixed one at least loads and can read the temperature, but something is still wrong with the DMA, so it won't work either. Then there are probably still a few other things broken in the user-space driver as well.

I don't have the time to dig into this right now, and my knowledge of kernel development is limited anyway. So the best we can do is hope that someone with a deep understanding of how the DMA and TPU work can find some time to look into it.

markus-k avatar Feb 17 '21 23:02 markus-k

@mbrooksx sounded like Google was working on it? Maybe he could update us. I still have very big interest in this for my product but don't have the resources or know-how to dig into this.

timonsku avatar Feb 17 '21 23:02 timonsku

If someone at Google is working on it, or is going to, it would be nice to get a very rough ETA (weeks, months) on when we can expect to know whether or not the TPU will ever work over PCIe on a CM4. I'll be creating a new revision of my product's PCB in a few weeks, and if there's very little chance the PCIe TPU will work anytime soon, I'll have to switch both to USB.

markus-k avatar Feb 19 '21 13:02 markus-k

Yea similar situation for me.

timonsku avatar Feb 19 '21 13:02 timonsku

I unfortunately don't have an estimated date. The CM4 PCIe hardware is antiquated, and there are endless hacks required to try to have it operate competently (note that the TPU is a PCIe bus master, and I don't see any evidence of a bus master ever being tested with the CM4). We haven't been receiving the support needed from the Pi team, so for now it's continuing to try things to understand the issues with communication (at this point it seems an issue with the shared memory). It may be within the next few weeks for operation (in which case I would post the hacked up version for your evaluation while we decide the best way to release this without polluting the main Coral codebase). I will keep this thread up to date.

Depending on the board configuration, USB may be a better choice.

mbrooksx avatar Feb 22 '21 18:02 mbrooksx

My latest theory isn't encouraging (note that this would be really easy to solve in a non-COVID world, where I would just plug this into a PCIe bus analyzer and see what data the CM4 is malforming):

When you run a model through the compiler it assigns virtual memory locations for the various operations, scratch memory, weights, etc. There are two mappings these addresses use to map to physical pages, what the driver calls simple and extended. The issue is that the way to differentiate simple and extended is the 63rd bit of the virtual address. So when the shared coherent memory between the CPU and TPU has been established - the TPU reads in this region to get the address of information it needs (in this case it's the location of the instruction queue). But because of the CM4's crippled PCIe bus, it is reading only 32bits of the virtual address - which means it interprets every read as a simple read.

The problem then is it will attempt to mmap this to the system and it will get wrong data (since the correct mapping was via the extended approach). Because the TPU is doing these reads (including checking the extended-mapping bit) in hardware, we have no way to change which bit indicates extended mapping. If this is indeed the primary source of failure, it would require a hacked up version of the compiler that assigns everything into simple mapping - this would cripple the maximum size of the model, parameters, etc. that are allowed.
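
To make the suspected truncation concrete, here is a minimal sketch in Python (not driver code). It assumes the extended flag lives in bit 63 of the device virtual address (reading "the 63rd bit" as zero-indexed); the names `EXTENDED_BIT`, `is_extended`, `truncate_to_32` and the example address are made up for illustration.

```python
EXTENDED_BIT = 63          # assumption: flag position, as described above
MASK_32 = 0xFFFF_FFFF      # what survives if only 32 bits are delivered


def is_extended(va: int) -> bool:
    """Return True if the address is tagged for extended mapping."""
    return bool((va >> EXTENDED_BIT) & 1)


def truncate_to_32(va: int) -> int:
    """Model a read path that only returns the low 32 bits of a 64-bit value."""
    return va & MASK_32


# Hypothetical address that the compiler placed in the extended region.
extended_va = (1 << EXTENDED_BIT) | 0x0100_4000
seen_by_tpu = truncate_to_32(extended_va)

print(is_extended(extended_va))  # True
print(is_extended(seen_by_tpu))  # False: the flag is gone
print(hex(seen_by_tpu))          # 0x1004000
```

If the theory holds, the low bits still look like a plausible simple address, which would explain why the lookup proceeds and returns the wrong data instead of failing outright.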

I'll explore that option if we can verify this is indeed the cause.

mbrooksx avatar Mar 02 '21 17:03 mbrooksx

(Thank you everyone for working on this issue!) I have a new setup (Custom CM4 carrier with M.2 PCIe-EdgeTPU) and would love to help get this integration working. Are the following repos still the latest progress in userspace/kernel?

Yes, what you've done is essentially everything I've done for debug. The only additional change you alluded to is correct - the compiler is too smart for libedgetpu and expects a competent system that would be able to have 64-bit wide accesses. I fixed this by using volatile variables to skip caching. My repos of progress are: https://github.com/mbrooksx/libedgetpu (Userspace) https://github.com/mbrooksx/pi-cm4-gasket-hacks (Kernel)

kampff avatar Apr 03 '21 09:04 kampff

It would be so sad if it would never be possible to use the Coral Boards via PCIE on the CM4. The combo is the perfect high performance - low power - compact formfactor - multi camera - mainline kernel supported - embedded inference platform. Please please find a way to make it useable.

julled avatar Apr 05 '21 17:04 julled

I completely agree about the potential of the combination. At this point, it looks like an irreparable hardware issue with the antiquated CM4 PCIe module. I have forced all the allocations into simple mapping (see above for more info about this) so that all the virtual addresses are 32-bit, as well as previously setting all reads/writes to 32-bit. However, the device itself (in hardware) makes reads/writes in the coherent cache - all of these reads/writes are 64-bit.
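
For readers following along, the host-side "32-bit reads/writes" workaround amounts to splitting each 64-bit register access into two halves. The sketch below models that in Python on a plain `bytearray`; it is not the libedgetpu code, and the little-endian layout, the helper names and the register offset are assumptions for illustration.

```python
import struct


def write64_as_two_32(buf: bytearray, offset: int, value: int) -> None:
    """Store a 64-bit value as two little-endian 32-bit writes."""
    struct.pack_into("<I", buf, offset, value & 0xFFFF_FFFF)              # lower half
    struct.pack_into("<I", buf, offset + 4, (value >> 32) & 0xFFFF_FFFF)  # upper half


def read64_as_two_32(buf: bytearray, offset: int) -> int:
    """Rebuild a 64-bit value from two little-endian 32-bit reads."""
    (lower,) = struct.unpack_from("<I", buf, offset)
    (upper,) = struct.unpack_from("<I", buf, offset + 4)
    return (upper << 32) | lower


regs = bytearray(16)  # stand-in for a mapped register window
write64_as_two_32(regs, 8, 0x8000_0000_0100_4000)
print(hex(read64_as_two_32(regs, 8)))  # 0x8000000001004000
```

Note that the two halves are separate accesses, so the pair is not atomic, and this only covers accesses the host initiates. It does nothing for the 64-bit accesses the device issues on its own, which is the remaining problem described here.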

For now, the plan is to wait until the office is open so we can use a PCIe analyzer and confirm this hypothesis. But there don't appear to be any additional changes that we can make in SW - the device expecting a host to be able to perform 64-bit reads/writes is built into the hardware.

USB is still the recommendation for the CM4. USB 2.0 works out of the box, and USB 3.0 may be possible, although extra design considerations are required (more info here: https://coral.ai/products/accelerator-module/).

mbrooksx avatar Apr 12 '21 20:04 mbrooksx

Choosing to believe this is still possible... here are my current DMESG and libedgetpu logs: (Kernel: 5.10.23-v8+ (aarch64) with gasket/apex modules and libedgetpu from mbrooksx's repos, custom Buildroot Rootfs)

DMESG

[ 1876.006541] apex 0000:01:00.0: Fault VA: 0xffffffff
[ 1876.012884] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x7ff, Simple: 0x1fff
[ 1876.024280] apex 0000:01:00.0: Computed Failing Bus Addr: 0x0
[ 1876.031596] apex 0000:01:00.0: Fault VA: 0x0
[ 1876.042358] apex 0000:01:00.0: Fault VA: 0x0
[ 1876.048153] apex 0000:01:00.0: Fault VA: 0x0
[ 1876.053923] apex 0000:01:00.0: Fault VA: 0x0
[ 1876.059681] apex 0000:01:00.0: Fault VA: 0x0
[ 1876.065456] apex 0000:01:00.0: Fault VA: 0x0
[ 1876.071141] apex 0000:01:00.0: Fault VA: 0x0
[ 1876.076769] apex 0000:01:00.0: Fault VA: 0x0
[ 1876.082370] apex 0000:01:00.0: Fault VA: 0x0
[ 1876.089568] apex 0000:01:00.0: Map Simple Pages: host_addr 0x7f89c74000, dev_addr 0x1000000, num_pages 1
[ 1876.100752] apex 0000:01:00.0: Map Simple Pages: host_addr 0x7f89c75000, dev_addr 0x1001000, num_pages 1
[ 1876.160486] apex 0000:01:00.0: Map Simple Pages: host_addr 0x7f5f969000, dev_addr 0x0, num_pages 1603
[ 1876.171885] apex 0000:01:00.0: Map Simple Pages: host_addr 0xd9c3000, dev_addr 0x1004000, num_pages 3
[ 1876.185214] apex 0000:01:00.0: Map Simple Pages: host_addr 0x7f88350000, dev_addr 0x1080000, num_pages 66
[ 1876.196648] apex 0000:01:00.0: Map Simple Pages: host_addr 0xd9c7000, dev_addr 0x1002000, num_pages 2
[ 1876.208103] apex 0000:01:00.0: Map Simple Pages: host_addr 0x7f88272000, dev_addr 0x1040000, num_pages 44
[ 1876.219712] apex 0000:01:00.0: Map Simple Pages: host_addr 0xd9ca000, dev_addr 0x1008000, num_pages 2
[ 1876.230804] apex 0000:01:00.0: Map Simple Pages: host_addr 0x7f88231000, dev_addr 0x1100000, num_pages 63

(here the test program hangs until ctrl-c)

[ 1904.820076] apex 0000:01:00.0: Fault VA: 0xbe96c8
[ 1904.826533] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x5, Simple: 0xbe9
[ 1904.837859] apex 0000:01:00.0: Computed Failing Bus Addr: 0x100004000000000
[ 1904.846581] apex 0000:01:00.0: Fault VA: 0xbe96c8
[ 1904.853128] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x5, Simple: 0xbe9
[ 1904.864475] apex 0000:01:00.0: Computed Failing Bus Addr: 0x100004000000000
[ 1904.873204] apex 0000:01:00.0: Fault VA: 0xffffffffffffffff
[ 1904.880539] apex 0000:01:00.0: Fault VA: 0xffffffff
[ 1904.887108] apex 0000:01:00.0: Failing in first (simple) read access. Extended_level0: 0x7ff, Simple: 0x1fff
[ 1904.898652] apex 0000:01:00.0: Computed Failing Bus Addr: 0x0
[ 1904.906057] apex 0000:01:00.0: Fault VA: 0x0
[ 1904.921784] apex 0000:01:00.0: Fault VA: 0x0
[ 1904.927701] apex 0000:01:00.0: Fault VA: 0x0
[ 1904.933515] apex 0000:01:00.0: Fault VA: 0x0
[ 1904.939298] apex 0000:01:00.0: Fault VA: 0x0
[ 1904.945065] apex 0000:01:00.0: Fault VA: 0xffffffffffffffff
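
As a reading aid for the fault lines above, the `Extended_level0` and `Simple` values appear to be index fields sliced out of the fault VA. The Python sketch below reproduces the logged values; the shift and mask constants are inferred from these log lines only (they are assumptions, not taken from the driver source), and `decode_fault_va` is a made-up helper name.

```python
PAGE_SHIFT = 12       # assumption: 4 KiB pages (0xbe96c8 >> 12 == 0xbe9)
SIMPLE_MASK = 0x1FFF  # assumption: 13-bit index, inferred from "Simple: 0x1fff"
EXT_L0_SHIFT = 21     # assumption: inferred from 0xbe96c8 -> Extended_level0 0x5
EXT_L0_MASK = 0x1FFF  # assumption: width cannot be confirmed from these logs


def decode_fault_va(va: int) -> tuple[int, int]:
    """Return (extended_level0_index, simple_index) for a fault address."""
    simple = (va >> PAGE_SHIFT) & SIMPLE_MASK
    ext0 = (va >> EXT_L0_SHIFT) & EXT_L0_MASK
    return ext0, simple


for va in (0xBE96C8, 0xFFFFFFFF):
    ext0, simple = decode_fault_va(va)
    print(f"VA {va:#x}: Extended_level0 {ext0:#x}, Simple {simple:#x}")
```

Both logged fault addresses (0xbe96c8 and 0xffffffff) decode to the same index pairs that the driver printed, which at least suggests the driver is interpreting the fault VA consistently and that the bad value arrives before this decoding step.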

libedgetpu (verbosity=10)

I :944] EnumerateDevices: vendor:0x1a6e, product:0x89a                                                                                                                                                            
I :944] EnumerateDevices: vendor:0x18d1, product:0x9302                                                                                                                                                           
Test_EdgeTPU[412]: (main:70): Num EdgeTPU Devices: 1                                                                                                                                                              
I :453] No matching device is already opened for shared ownership.                                                                                                                                                
I :944] EnumerateDevices: vendor:0x1a6e, product:0x89a                                                                                                                                                            
I :944] EnumerateDevices: vendor:0x18d1, product:0x9302                                                                                                                                                           
I :104] USB always DFU: False (default)                                                                                                                                                                           
I :126] USB bulk-in queue capacity: default                                                                                                                                                                       
I :65] Performance expectation: Max (default)                                                                                                                                                                     
I :273] Hello Adam!                                                                                                                                                                                               
I :274] Starting in FUCK YEAH mode                                                                                                                                                                                
I :83] Opening /dev/apex_0. read_only=0                                                                                                                                                                           
I :97] mmap_offset=0x0000000000040000, mmap_size=4096                                                                                                                                                             
I :108] Got map addr at 0x0x7f904db000                                                                                                                                                                            
I :97] mmap_offset=0x0000000000044000, mmap_size=4096                                                                                                                                                             
I :108] Got map addr at 0x0x7f89c79000                                                                                                                                                                            
I :97] mmap_offset=0x0000000000048000, mmap_size=4096                                                                                                                                                             
I :108] Got map addr at 0x0x7f89c78000                                                                                                                                                                            
I :240] Offset: 0x00000000000486f0, mmap_reg: 0x7f89c786f0, Upper: 0x0000000000000000, Shifted upper: 0x0000000000000000, lower: 0x0000000000000000, value:0x0000000000000000                                     
I :269] Read 32 Hacks: offset = 0x00000000000486f0, lower: = 0x0000000000000000 upper: = 0x0000000000000000 value: = 0x0000000000000000 mmap: 0x7f89c786f0                                                        
I :282] Page Fault Address: 0x0000000000000000                                                                                                                                                                    
I :195] Write 32 Hacks: offset = 0x00000000000487a8, value = 0x0000000000000000 mmap=0x7f89c787a8                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000487a8, value: = 0x0000000000000000                                                                                                                                 
I :240] Offset: 0x0000000000048578, mmap_reg: 0x7f89c78578, Upper: 0x0000000000000000, Shifted upper: 0x0000000000000000, lower: 0x0000000000000010, value:0x0000000000000010                                     
I :269] Read 32 Hacks: offset = 0x0000000000048578, lower: = 0x0000000000000010 upper: = 0x0000000000000000 value: = 0x0000000000000010 mmap: 0x7f89c78578                                                        
I :282] Page Fault Address: 0x0000000000000000                                                                                                                                                                    
I :136] MmuMapper#Map() : 0000007f89c74000 -> 0000000001000000 (1 pages) flags=00000000.                                                                                                                          
I :55] MapMemory() page-aligned : device_address = 0x0000000001000000                                                                                                                                             
I :169] Queue base : 0x7f89c74000 -> 0x0000000001000000 [4096 bytes]                                                                                                                                              
I :136] MmuMapper#Map() : 0000007f89c75000 -> 0000000001001000 (1 pages) flags=00000000.                                                                                                                          
I :55] MapMemory() page-aligned : device_address = 0x0000000001001000                                                                                                                                             
I :179] Queue status block : 0x7f89c75000 -> 0x0000000001001000 [16 bytes]                                                                                                                                        
I :195] Write 32 Hacks: offset = 0x0000000000048590, value = 0x0000000001000000 mmap=0x7f89c78590                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000048590, value: = 0x0000000001000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000048598, value = 0x0000000001001000 mmap=0x7f89c78598                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000048598, value: = 0x0000000001001000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x00000000000485a0, value = 0x0000000000000100 mmap=0x7f89c785a0                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000485a0, value: = 0x0000000000000100                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000048568, value = 0x0000000000000005 mmap=0x7f89c78568                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000048568, value: = 0x0000000000000005                                                                                                                                 
I :240] Offset: 0x0000000000048570, mmap_reg: 0x7f89c78570, Upper: 0x0000000000000000, Shifted upper: 0x0000000000000000, lower: 0x0000000000000001, value:0x0000000000000001                                     
I :269] Read 32 Hacks: offset = 0x0000000000048570, lower: = 0x0000000000000001 upper: = 0x0000000000000000 value: = 0x0000000000000001 mmap: 0x7f89c78570                                                        
I :282] Page Fault Address: 0x0000000000000000                                                                                                                                                                    
I :240] Offset: 0x00000000000486d0, mmap_reg: 0x7f89c786d0, Upper: 0x0000000000000000, Shifted upper: 0x0000000000000000, lower: 0x0000000000000000, value:0x0000000000000000                                     
I :269] Read 32 Hacks: offset = 0x00000000000486d0, lower: = 0x0000000000000000 upper: = 0x0000000000000000 value: = 0x0000000000000000 mmap: 0x7f89c786d0                                                        
I :282] Page Fault Address: 0x0000000000000000                                                                                                                                                                    
I :195] Write 32 Hacks: offset = 0x0000000000044018, value = 0x0000000000000001 mmap=0x7f89c79018                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000044018, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000044158, value = 0x0000000000000001 mmap=0x7f89c79158                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000044158, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000044198, value = 0x0000000000000001 mmap=0x7f89c79198                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000044198, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x00000000000441d8, value = 0x0000000000000001 mmap=0x7f89c791d8                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000441d8, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000044218, value = 0x0000000000000001 mmap=0x7f89c79218                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000044218, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000048788, value = 0x000000000000007f mmap=0x7f89c78788                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000048788, value: = 0x000000000000007f                                                                                                                                 
I :240] Offset: 0x0000000000048788, mmap_reg: 0x7f89c78788, Upper: 0x0000000000000000, Shifted upper: 0x0000000000000000, lower: 0x000000000000007f, value:0x000000000000007f                                     
I :269] Read 32 Hacks: offset = 0x0000000000048788, lower: = 0x000000000000007f upper: = 0x0000000000000000 value: = 0x000000000000007f mmap: 0x7f89c78788                                                        
I :282] Page Fault Address: 0x0000000000000000                                                                                                                                                                    
I :195] Write 32 Hacks: offset = 0x00000000000400c0, value = 0x0000000000000001 mmap=0x7f904db0c0                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000400c0, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000040150, value = 0x0000000000000001 mmap=0x7f904db150                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000040150, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000040110, value = 0x0000000000000001 mmap=0x7f904db110                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000040110, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000040250, value = 0x0000000000000001 mmap=0x7f904db250                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000040250, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000040298, value = 0x0000000000000001 mmap=0x7f904db298                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000040298, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x00000000000402e0, value = 0x0000000000000001 mmap=0x7f904db2e0                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000402e0, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000040328, value = 0x0000000000000001 mmap=0x7f904db328                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000040328, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000040190, value = 0x0000000000000001 mmap=0x7f904db190                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000040190, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x00000000000401d0, value = 0x0000000000000001 mmap=0x7f904db1d0                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000401d0, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x0000000000040210, value = 0x0000000000000001 mmap=0x7f904db210                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x0000000000040210, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x00000000000486e8, value = 0x0000000000000000 mmap=0x7f89c786e8                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000486e8, value: = 0x0000000000000000                                                                                                                                 
I :45] Set event fd : event_id:0 -> event_fd:8,                                                                                                                                                                   
I :45] Set event fd : event_id:4 -> event_fd:12,                                                                                                                                                                  
I :62] event_fd=8. Monitor thread begin.                                                                                                                                                                          
I :45] Set event fd : event_id:5 -> event_fd:13,                                                                                                                                                                  
I :62] event_fd=12. Monitor thread begin.                                                                                                                                                                         
I :45] Set event fd : event_id:6 -> event_fd:14,                                                                                                                                                                  
I :62] event_fd=13. Monitor thread begin.                                                                                                                                                                         
I :45] Set event fd : event_id:7 -> event_fd:15,                                                                                                                                                                  
I :62] event_fd=14. Monitor thread begin.                                                                                                                                                                         
I :45] Set event fd : event_id:8 -> event_fd:16,                                                                                                                                                                  
I :62] event_fd=15. Monitor thread begin.                                                                                                                                                                         
I :45] Set event fd : event_id:9 -> event_fd:17,                                                                                                                                                                  
I :62] event_fd=16. Monitor thread begin.                                                                                                                                                                         
I :45] Set event fd : event_id:10 -> event_fd:18,                                                                                                                                                                 
I :62] event_fd=17. Monitor thread begin.                                                                                                                                                                         
I :45] Set event fd : event_id:11 -> event_fd:19,                                                                                                                                                                 
I :62] event_fd=18. Monitor thread begin.                                                                                                                                                                         
I :45] Set event fd : event_id:12 -> event_fd:20,                                                                                                                                                                 
I :62] event_fd=19. Monitor thread begin.                                                                                                                                                                         
I :195] Write 32 Hacks: offset = 0x00000000000486a0, value = 0x000000000000000f mmap=0x7f89c786a0                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000486a0, value: = 0x000000000000000f                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x00000000000485c0, value = 0x0000000000000001 mmap=0x7f89c785c0                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000485c0, value: = 0x0000000000000001                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x00000000000486c0, value = 0x0000000000000001 mmap=0x7f89c786c0                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000486c0, value: = 0x0000000000000001                                                                                                                                 
I :62] event_fd=20. Monitor thread begin.                                                                                                                                                                         
I :172] Opening device at /dev/apex_0                                                                                                                                                                             
Test_EdgeTPU[412]: (main:75): EdgeTPU - path:type (0=PCIe, 1=USB): /dev/apex_0:0                                                                                                                                  
Test_EdgeTPU[412]: (main:80): Loading Model: /home/kampff/Voight-Kampff/objects_edgetpu.tflite                                                                                                                    
Test_EdgeTPU[412]: (main:82): Model Created
Test_EdgeTPU[412]: (main:89): Options configured: maybe                                                                                                                                                           
Test_EdgeTPU[412]: (main:94): Interpreter Created                                                                                                                                                                 
Test_EdgeTPU[412]: (main:98): Tensors Allocated                                                                                                                                                                   
Test_EdgeTPU[412]: (main:120): NPU inputs: 1 vs 1                                                                                                                                                                 
Test_EdgeTPU[412]: (main:127):  - Input 0 (normalized_input_image_tensor): Dimensionsw: 4                                                                                                                         
Test_EdgeTPU[412]: (main:132):    - Dimension 0: (size: 1)                                                                                                                                                        
Test_EdgeTPU[412]: (main:132):    - Dimension 1: (size: 300)                                                                                                                                                      
Test_EdgeTPU[412]: (main:132):    - Dimension 2: (size: 300)                                                                                                                                                      
Test_EdgeTPU[412]: (main:132):    - Dimension 3: (size: 3)                                                                                                                                                        
Test_EdgeTPU[412]: (main:138): NPU outputs: 4 vs 4                                                                                                                                                                
Test_EdgeTPU[412]: (main:145):  - Ouput 0 (TFLite_Detection_PostProcess): Dimensions: 3                                                                                                                           
Test_EdgeTPU[412]: (main:150):    - Dimension 0: 1)                                                                                                                                                               
Test_EdgeTPU[412]: (main:150):    - Dimension 1: 20)                                                                                                                                                              
Test_EdgeTPU[412]: (main:150):    - Dimension 2: 4)                                                                                                                                                               
Test_EdgeTPU[412]: (main:145):  - Ouput 1 (TFLite_Detection_PostProcess:1): Dimensions: 2                                                                                                                         
Test_EdgeTPU[412]: (main:150):    - Dimension 0: 1)                                                                                                                                                               
Test_EdgeTPU[412]: (main:150):    - Dimension 1: 20)                                                                                                                                                              
Test_EdgeTPU[412]: (main:145):  - Ouput 2 (TFLite_Detection_PostProcess:2): Dimensions: 2                                                                                                                         
Test_EdgeTPU[412]: (main:150):    - Dimension 0: 1)                                                                                                                                                               
Test_EdgeTPU[412]: (main:150):    - Dimension 1: 20)                                                                                                                                                              
Test_EdgeTPU[412]: (main:145):  - Ouput 3 (TFLite_Detection_PostProcess:3): Dimensions: 1                                                                                                                         
Test_EdgeTPU[412]: (main:150):    - Dimension 0: 1)                                                                                                                                                               
Test_EdgeTPU[412]: (main:167): Test Image Loaded                                                                                                                                                                  
Test_EdgeTPU[412]: (main:185): Labels Loaded                                                                                                                                                                      
Test_EdgeTPU[412]: (main:209): Inputs Configured                                                                                                                                                                  
I :47] Adding input "normalized_input_image_tensor" with 270000 bytes.                                                                                                                                            
I :58] Adding output "Squeeze" with 7668 bytes.                                                                                                                                                                   
I :58] Adding output "convert_scores" with 174447 bytes.                                                                                                                                                          
I :167] Request prepared, total batch size: 1, total TPU requests required: 1.                                                                                                                                    
I :310] Request [0]: Submitting P0 request immediately.                                                                                                                                                           
I :373] Request [0]: Need to map parameters.                                                                                                                                                                      
I :136] MmuMapper#Map() : 0000007f5f969000 -> 0000000000000000 (1603 pages) flags=00000002.                                                                                                                       
I :55] MapMemory() page-aligned : device_address = 0x0000000000000000                                                                                                                                             
I :252] Mapped params : Buffer(ptr=0x7f5f969000) -> 0x0000000000000000, 6564224 bytes.                                                                                                                            
I :252] Mapped params : Buffer(ptr=(nil)) -> 0x0000000000000000, 0 bytes.                                                                                                                                         
I :387] Request [0]: Need to do parameter-caching.                                                                                                                                                                
I :80] [0] Request constructed.                                                                                                                                                                                   
I :46] InstructionBuffers created.                                                                                                                                                                                
I :653] Created new instruction buffers.                                                                                                                                                                          
I :75] Mapped scratch : Buffer(ptr=(nil)) -> 0x0000000000000000, 0 bytes.                                                                                                                                         
I :368] MapDataBuffers() done.                                                                                                                                                                                    
I :187] Linking Parameter: 0x0000000000000000                                                                                                                                                                     
I :136] MmuMapper#Map() : 000000000d9c3000 -> 0000000001004000 (3 pages) flags=00000002.                                                                                                                          
I :55] MapMemory() page-aligned : device_address = 0x0000000001004000                                                                                                                                             
I :223] Mapped "instructions" : Buffer(ptr=0xd9c3000) -> 0x0000000001004000, 11472 bytes. Direction=1                                                                                                             
I :384] MapInstructionBuffers() done.                                                                                                                                                                             
I :481] [0] SetState old=0, new=1.                                                                                                                                                                                
I :393] [0] NotifyRequestSubmitted()                                                                                                                                                                              
I :481] [0] SetState old=1, new=2.                                                                                                                                                                                
I :83] Request[0]: Submitted                                                                                                                                                                                      
I :401] [0] NotifyRequestActive()                                                                                                                                                                                 
I :481] [0] SetState old=2, new=3.                                                                                                                                                                                
I :133] Request[0]: Scheduling DMA[0]                                                                                                                                                                             
I :393] Adding an element to the host queue.                                                                                                                                                                      
I :195] Write 32 Hacks: offset = 0x00000000000485a8, value = 0x0000000000000001 mmap=0x7f89c785a8                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000485a8, value: = 0x0000000000000001                                                                                                                                 
I :75] event_fd=20. Monitor thread got num_events=1.                                                                                                                                                              
I :80] [1] Request constructed.                                                                                                                                                                                   
I :195] Write 32 Hacks: offset = 0x00000000000486c0, value = 0x0000000000000000 mmap=0x7f89c786c0                                                                                                                 
I :113] Adding input "normalized_input_image_tensor" with 270000 bytes.                                                                                                                                           
I :206] ReRead 32 Hacks: offset = 0x00000000000486c0, value: = 0x0000000000000000                                                                                                                                 
I :188] Adding output "Squeeze" with 7668 bytes.                                                                                                                                                                  
I :195] Write 32 Hacks: offset = 0x00000000000486c8, value = 0x0000000000000000 mmap=0x7f89c786c8                                                                                                                 
I :188] Adding output "convert_scores" with 174447 bytes.                                                                                                                                                         
I :206] ReRead 32 Hacks: offset = 0x00000000000486c8, value: = 0x0000000000000001                                                                                                                                 
I :240] Offset: 0x00000000000486f0, mmap_reg: 0x7f89c786f0, Upper: 0x0000000000000000, Shifted upper: 0x0000000000000000, lower: 0x0000000000000211, value:0x0000000000000211                                     
I :269] Read 32 Hacks: offset = 0x00000000000486f0, lower: = 0x0000000000000211 upper: = 0x0000000000000000 value: = 0x0000000000000211 mmap: 0x7f89c786f0                                                        
I :282] Page Fault Address: 0x0000000000be96c8                                                                                                                                                                    
I :240] Offset: 0x0000000000048700, mmap_reg: 0x7f89c78700, Upper: 0x0000000000000000, Shifted upper: 0x0000000000000000, lower: 0x0000000000000010, value:0x0000000000000010                                     
I :269] Read 32 Hacks: offset = 0x0000000000048700, lower: = 0x0000000000000010 upper: = 0x0000000000000000 value: = 0x0000000000000010 mmap: 0x7f89c78700                                                        
I :282] Page Fault Address: 0x0000000000be96c8                                                                                                                                                                    
I :240] Offset: 0x0000000000048700, mmap_reg: 0x7f89c78700, Upper: 0x0000000000000000, Shifted upper: 0x0000000000000000, lower: 0x0000000000000010, value:0x0000000000000010                                     
I :269] Read 32 Hacks: offset = 0x0000000000048700, lower: = 0x0000000000000010 upper: = 0x0000000000000000 value: = 0x0000000000000010 mmap: 0x7f89c78700                                                        
I :282] Page Fault Address: 0x0000000000be96c8                                                                                                                                                                    
E :254] HIB Error. hib_error_status = 0000000000000211, hib_first_error_status = 0000000000000010                                                                                                                 
I :75] event_fd=20. Monitor thread got num_events=1.                                                                                                                                                              
I :195] Write 32 Hacks: offset = 0x00000000000486c0, value = 0x0000000000000000 mmap=0x7f89c786c0                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000486c0, value: = 0x0000000000000000                                                                                                                                 
I :195] Write 32 Hacks: offset = 0x00000000000486c8, value = 0x0000000000000000 mmap=0x7f89c786c8                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000486c8, value: = 0x0000000000000000                                                                                                                                 
I :240] Offset: 0x00000000000486f0, mmap_reg: 0x7f89c786f0, Upper: 0x0000000000000000, Shifted upper: 0x0000000000000000, lower: 0x0000000000000211, value:0x0000000000000211                                     
I :269] Read 32 Hacks: offset = 0x00000000000486f0, lower: = 0x0000000000000211 upper: = 0x0000000000000000 value: = 0x0000000000000211 mmap: 0x7f89c786f0                                                        
I :282] Page Fault Address: 0x0000000000be96c8                                                                                                                                                                    
I :240] Offset: 0x0000000000048700, mmap_reg: 0x7f89c78700, Upper: 0x0000000000000000, Shifted upper: 0x0000000000000000, lower: 0x0000000000000010, value:0x0000000000000010                                     
I :269] Read 32 Hacks: offset = 0x0000000000048700, lower: = 0x0000000000000010 upper: = 0x0000000000000000 value: = 0x0000000000000010 mmap: 0x7f89c78700                                                        
I :282] Page Fault Address: 0x0000000000be96c8                                                                                                                                                                    
E :254] HIB Error. hib_error_status = 0000000000000211, hib_first_error_status = 0000000000000010                                                                                                                 
I :46] InstructionBuffers created.                                                                                                                                                                                
I :653] Created new instruction buffers.                                                                                                                                                                          
I :75] Mapped scratch : Buffer(ptr=(nil)) -> 0x0000000000000000, 0 bytes.                                                                                                                                         
I :136] MmuMapper#Map() : 0000007f88350000 -> 0000000001080000 (66 pages) flags=00000002.                                                                                                                         
I :55] MapMemory() page-aligned : device_address = 0x0000000001080000                                                                                                                                             
I :223] Mapped "normalized_input_image_tensor" : Buffer(ptr=0x7f88350040) -> 0x0000000001080040, 270000 bytes. Direction=1                                                                                        
I :136] MmuMapper#Map() : 000000000d9c7000 -> 0000000001002000 (2 pages) flags=00000004.                                                                                                                          
I :55] MapMemory() page-aligned : device_address = 0x0000000001002000                                                                                                                                             
I :136] MmuMapper#Map() : 0000007f88272000 -> 0000000001040000 (44 pages) flags=00000004.                                                                                                                         
I :55] MapMemory() page-aligned : device_address = 0x0000000001040000                                                                                                                                             
I :223] Mapped "convert_scores" : Buffer(ptr=0x7f88272000) -> 0x0000000001040000, 176368 bytes. Direction=2                                                                                                       
I :223] Mapped "Squeeze" : Buffer(ptr=0xd9c7000) -> 0x0000000001002000, 7672 bytes. Direction=2                                                                                                                   
I :368] MapDataBuffers() done.                                                                                                                                                                                    
I :93] Linking normalized_input_image_tensor[0]: 0x0000000001080040                                                                                                                                               
I :93] Linking Squeeze[0]: 0x0000000001002000                                                                                                                                                                     
I :93] Linking convert_scores[0]: 0x0000000001040000                                                                                                                                                              
I :136] MmuMapper#Map() : 000000000d9ca000 -> 0000000001008000 (2 pages) flags=00000002.                                                                                                                          
I :55] MapMemory() page-aligned : device_address = 0x0000000001008000                                                                                                                                             
I :136] MmuMapper#Map() : 0000007f88231000 -> 0000000001100000 (63 pages) flags=00000002.                                                                                                                         
I :55] MapMemory() page-aligned : device_address = 0x0000000001100000                                                                                                                                             
I :223] Mapped "instructions" : Buffer(ptr=0x7f88231000) -> 0x0000000001100000, 256992 bytes. Direction=1                                                                                                         
I :223] Mapped "instructions" : Buffer(ptr=0xd9ca000) -> 0x0000000001008000, 7632 bytes. Direction=1                                                                                                              
I :384] MapInstructionBuffers() done.                                                                                                                                                                             
I :481] [1] SetState old=0, new=1.                                                                                                                                                                                
I :393] [1] NotifyRequestSubmitted()                                                                                                                                                                              
I :481] [1] SetState old=1, new=2.                                                                                                                                                                                
I :83] Request[1]: Submitted                                                                                                                                                                                      
I :401] [1] NotifyRequestActive()                                                                                                                                                                                 
I :481] [1] SetState old=2, new=3.                                                                                                                                                                                
I :133] Request[1]: Scheduling DMA[0]                                                                                                                                                                             
I :393] Adding an element to the host queue.                                                                                                                                                                      
I :195] Write 32 Hacks: offset = 0x00000000000485a8, value = 0x0000000000000002 mmap=0x7f89c785a8                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000485a8, value: = 0x0000000000000002                                                                                                                                 
I :133] Request[1]: Scheduling DMA[1]                                                                                                                                                                             
I :393] Adding an element to the host queue.                                                                                                                                                                      
I :195] Write 32 Hacks: offset = 0x00000000000485a8, value = 0x0000000000000003 mmap=0x7f89c785a8                                                                                                                 
I :206] ReRead 32 Hacks: offset = 0x00000000000485a8, value: = 0x0000000000000003

The program then hangs until it is killed with Ctrl-C.
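
A quick sanity check on the log above, as a rough sketch only. It assumes 4 KiB pages and that the `MmuMapper#Map()` lines list every device mapping; the helper names `pages_needed` and `covering_mappings` are made up for illustration. Under those assumptions the page counts match the buffer sizes, and the reported Page Fault Address does not fall inside any of the mapped device ranges.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated anywhere in the log

# (device_address, pages) copied from the MmuMapper#Map() lines in the log.
MAPPINGS = [
    (0x0000000000000000, 1603),  # params
    (0x0000000001004000, 3),     # instructions (request 0)
    (0x0000000001080000, 66),    # normalized_input_image_tensor
    (0x0000000001002000, 2),     # Squeeze
    (0x0000000001040000, 44),    # convert_scores
    (0x0000000001008000, 2),     # instructions (request 1)
    (0x0000000001100000, 63),    # instructions (request 1)
]

FAULT_ADDRESS = 0x0000000000BE96C8  # "Page Fault Address" from the log


def pages_needed(offset_in_page, size):
    """Pages required for `size` bytes starting `offset_in_page` bytes into a page."""
    return -(-(offset_in_page + size) // PAGE_SIZE)  # ceiling division


def covering_mappings(address, mappings):
    """Return the mappings whose device range contains `address`."""
    return [
        (base, pages)
        for base, pages in mappings
        if base <= address < base + pages * PAGE_SIZE
    ]


if __name__ == "__main__":
    # Sizes and in-page offsets taken from the "Mapped ..." lines.
    print(pages_needed(0x00, 6564224))  # params
    print(pages_needed(0x40, 270000))   # input tensor, buffer starts at ...040
    print(covering_mappings(FAULT_ADDRESS, MAPPINGS))
```

This only says something about the address ranges printed in the log. It does not explain why the device is reading from that address.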

kampff avatar Apr 14 '21 10:04 kampff

These logs look like what I see as well. The HIB error there (hib_error_status = 0000000000000211) still indicates read failures.
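
For anyone comparing logs, here is a small sketch that lists which bit positions are set in the two status values. The helper name `set_bits` is made up for illustration, and no bit names are assumed: what each position means has to come from the driver sources. It only shows that the first-error value is a single bit that is also set in the full status.

```python
def set_bits(value):
    """Return the positions of the bits set in `value`, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# Values copied from the "HIB Error" line in the log.
hib_error_status = 0x0000000000000211
hib_first_error_status = 0x0000000000000010

if __name__ == "__main__":
    print(set_bits(hib_error_status))
    print(set_bits(hib_first_error_status))
```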

I recently became aware of a new-ish DT overlay from the Pi team for 32-bit DMA, pcie-32bit-dma.dtbo (I found it in this thread about bringing up a USB controller). Alas, adding it has no effect, and I verified that it applies cleanly.
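
To try the same thing, the overlay is requested with a `dtoverlay=pcie-32bit-dma` line in `config.txt`. As a minimal sketch, the hypothetical helper below lists the overlays named in a config file's text. It assumes one `dtoverlay=` entry per line and full-line `#` comments, and it only shows that an overlay was requested, not that the firmware actually applied it.

```python
def dtoverlays(config_text):
    """Return overlay names from `dtoverlay=` lines, in file order."""
    names = []
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        if line.startswith("dtoverlay="):
            value = line[len("dtoverlay="):]
            name = value.split(",", 1)[0].strip()  # drop overlay parameters
            if name:
                names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "# PCIe experiments\ndtoverlay=pcie-32bit-dma\n"
    print("pcie-32bit-dma" in dtoverlays(sample))
```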

mbrooksx avatar Apr 22 '21 00:04 mbrooksx

I think this new overlay originated from this issue over here: https://github.com/raspberrypi/linux/issues/4197#issuecomment-794014591

Maybe you can find some ideas about the problem in there?

julled avatar Apr 22 '21 01:04 julled