Dual-Edge-TPU-Adapter
Coral M.2 Accelerator with Dual Edge TPU on the Dual Edge TPU Adapter (PCIe x1, Low Profile): only one TPU working
Hi,
Hoping someone can help diagnose this.
I've bought an M.2 Dual Edge TPU accelerator and an adapter from Makerfabs.
I've just got all the cameras working and running in Frigate, and Frigate is constantly restarting, saying it can't find one of the TPUs.
I'm running Frigate in a Docker container.
dmesg looks like it's reporting an error from the adapter/accelerator, possibly?
ls -l /dev/apex*
crw-rw---- 1 root apex 120, 0 Mar 25 18:23 /dev/apex_0
crw-rw---- 1 root apex 120, 1 Mar 25 18:23 /dev/apex_1
ls /sys/class/apex/
apex_0 apex_1
dmesg | grep apex
[ 35.356036] apex 0000:05:00.0: enabling device (0000 -> 0002)
[ 35.371387] apex 0000:06:00.0: enabling device (0000 -> 0002)
[ 40.512237] apex 0000:05:00.0: Apex performance not throttled due to temperature
[ 48.172225] apex 0000:06:00.0: RAM did not enable within timeout (12000 ms)
[ 53.312236] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 58.432237] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 63.552236] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 68.672243] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 83.300290] apex 0000:06:00.0: RAM did not enable within timeout (12000 ms)
[ 83.300301] apex 0000:06:00.0: Error in device open cb: -110
[ 83.300315] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 88.384283] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 93.504246] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 98.628059] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 115.671423] apex 0000:06:00.0: RAM did not enable within timeout (12000 ms)
[ 115.671430] apex 0000:06:00.0: Error in device open cb: -110
[ 115.671442] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 120.895350] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 126.015273] apex 0000:06:00.0: Apex performance not throttled due to temperature
[ 131.135234] apex 0000:06:00.0: Apex performance not throttled due to temperature
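The dmesg output above can be summarised per PCI address to see which TPU is failing. What follows is a minimal Python sketch, not part of the original report. The function name `summarize_apex_dmesg` and the regular expression are illustrative assumptions based only on the line format shown above. On Linux, error code -110 corresponds to ETIMEDOUT, which is consistent with the "RAM did not enable within timeout" message.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [ 83.300301] apex 0000:06:00.0: Error in device open cb: -110
# The pattern is an assumption derived from the dmesg excerpt in this thread.
LINE_RE = re.compile(r"\]\s+apex\s+(?P<addr>[0-9a-fA-F:.]+):\s+(?P<msg>.*)$")


def summarize_apex_dmesg(text):
    """Count RAM-enable timeouts and device-open errors per PCI address."""
    summary = defaultdict(lambda: {"ram_timeouts": 0, "open_errors": 0})
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        # Accessing the entry registers every apex device seen, even healthy ones.
        entry = summary[match.group("addr")]
        msg = match.group("msg")
        if msg.startswith("RAM did not enable within timeout"):
            entry["ram_timeouts"] += 1
        elif msg.startswith("Error in device open cb"):
            entry["open_errors"] += 1
    return dict(summary)


if __name__ == "__main__":
    import subprocess

    # Reading dmesg may require root privileges on some systems.
    output = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for addr, counts in summarize_apex_dmesg(output).items():
        print(addr, counts)
```

In the excerpt above, every timeout and open error belongs to 0000:06:00.0, while 0000:05:00.0 reports none.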
Frigate Docker Compose file:
version: "3.9"
services:
  frigate:
    container_name: frigate
    privileged: true # this may not be necessary for all setups
    restart: unless-stopped
    image: ghcr.io/blakeblackshear/frigate:stable
    shm_size: "850mb" # update for your cameras based on calculation above
    devices:
      #- /dev/bus/usb:/dev/bus/usb # passes the USB Coral, needs to be modified for other versions
      - /dev/apex_0:/dev/apex_0 # passes a PCIe Coral, follow driver instructions here https://coral.ai/docs/m2/get-started/#2a-on-linux
      - /dev/apex_1:/dev/apex_1
      #- /dev/dri/renderD128 # for intel hwaccel, needs to be updated for your hardware
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /home/user/Software/Scripts/docker/frigate/config/frigate.yml:/config/config.yml
      - /home/user/Software/Scripts/docker/frigate/config/go2rtc:/config/go2rtc
      - /mnt/CamFootage:/media/frigate
      - /home/user/Software/Scripts/docker/frigate:/db
      - type: tmpfs # Optional: 1GB of memory, reduces SSD/SD Card wear
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    networks:
      - enp8s0
    ports:
      - "5001:5000"
      - "1935:1935" # RTMP feeds
      - "8554:8554" # RTSP feeds
      - "8555:8555/tcp" # WebRTC over tcp
      - "8555:8555/udp" # WebRTC over udp
    environment:
      FRIGATE_RTSP_PASSWORD: "password"
networks:
  enp8s0:
Frigate logs
2024-03-26 01:50:01.995727962 [INFO] Preparing Frigate...
2024-03-26 01:50:01.996292271 [INFO] Starting NGINX...
2024-03-26 01:50:01.998032480 [INFO] Preparing new go2rtc config...
s6-rc: info: service legacy-services successfully started
2024-03-26 01:50:02.002815522 [INFO] Starting Frigate...
2024-03-26 01:50:02.190391428 [INFO] Starting go2rtc...
2024-03-26 01:50:02.232511640 01:50:02.232 INF go2rtc version 1.8.4 linux/amd64
2024-03-26 01:50:02.232851879 01:50:02.232 INF [rtsp] listen addr=:8554
2024-03-26 01:50:02.232879651 01:50:02.232 INF [api] listen addr=:1984
2024-03-26 01:50:02.232996601 01:50:02.232 INF [webrtc] listen addr=:8555
2024-03-26 01:50:02.752193839 [2024-03-26 01:50:02] frigate.app INFO : Starting Frigate (0.13.2-6476f8a)
2024-03-26 01:50:02.824835738 [2024-03-26 01:50:02] peewee_migrate.logs INFO : Starting migrations
2024-03-26 01:50:02.827668738 [2024-03-26 01:50:02] peewee_migrate.logs INFO : There is nothing to migrate
2024-03-26 01:50:02.833159370 [2024-03-26 01:50:02] frigate.app INFO : Recording process started: 729
2024-03-26 01:50:02.834944512 [2024-03-26 01:50:02] frigate.app INFO : go2rtc process pid: 89
2024-03-26 01:50:02.856538296 [2024-03-26 01:50:02] detector.coral1 INFO : Starting detection process: 739
2024-03-26 01:50:02.862384796 [2024-03-26 01:50:02] detector.coral2 INFO : Starting detection process: 744
2024-03-26 01:50:02.863193935 [2024-03-26 01:50:02] frigate.detectors.plugins.edgetpu_tfl INFO : Attempting to load TPU as pci:0
2024-03-26 01:50:02.866746978 [2024-03-26 01:50:02] frigate.detectors.plugins.edgetpu_tfl INFO : TPU found
2024-03-26 01:50:02.866967582 [2024-03-26 01:50:02] frigate.app INFO : Output process started: 761
2024-03-26 01:50:02.882466830 [2024-03-26 01:50:02] frigate.app INFO : Camera processor started for camera1: 768
2024-03-26 01:50:02.888466686 [2024-03-26 01:50:02] frigate.app INFO : Camera processor started for camera2: 770
2024-03-26 01:50:02.894560801 [2024-03-26 01:50:02] frigate.app INFO : Camera processor started for camera3: 771
2024-03-26 01:50:02.900379829 [2024-03-26 01:50:02] frigate.app INFO : Camera processor started for camera4: 773
2024-03-26 01:50:02.906395507 [2024-03-26 01:50:02] frigate.app INFO : Camera processor started for camera5: 776
2024-03-26 01:50:02.918211517 [2024-03-26 01:50:02] frigate.app INFO : Camera processor started for camera6: 778
2024-03-26 01:50:02.919196908 [2024-03-26 01:50:02] frigate.app INFO : Camera processor started for doorcam: 781
2024-03-26 01:50:02.925658092 [2024-03-26 01:50:02] frigate.app INFO : Capture process started for camera1: 783
2024-03-26 01:50:02.931928107 [2024-03-26 01:50:02] frigate.app INFO : Capture process started for camera2: 789
2024-03-26 01:50:02.938740020 [2024-03-26 01:50:02] frigate.app INFO : Capture process started for camera3: 795
2024-03-26 01:50:02.944340839 [2024-03-26 01:50:02] frigate.app INFO : Capture process started for camera4: 800
2024-03-26 01:50:02.950290741 [2024-03-26 01:50:02] frigate.app INFO : Capture process started for camera5: 806
2024-03-26 01:50:02.957319010 [2024-03-26 01:50:02] frigate.app INFO : Capture process started for camera6: 827
2024-03-26 01:50:02.963700124 [2024-03-26 01:50:02] frigate.app INFO : Capture process started for doorcam: 830
2024-03-26 01:50:11.998523966 [INFO] Starting go2rtc healthcheck service...
2024-03-26 01:50:15.730801593 [2024-03-26 01:50:02] frigate.detectors.plugins.edgetpu_tfl INFO : Attempting to load TPU as pci:1
2024-03-26 01:50:15.730965721 [2024-03-26 01:50:15] frigate.detectors.plugins.edgetpu_tfl ERROR : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
2024-03-26 01:50:15.730982292 Process detector:coral2:
2024-03-26 01:50:15.732237840 Traceback (most recent call last):
2024-03-26 01:50:15.732251045 File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
2024-03-26 01:50:15.732251827 delegate = Delegate(library, options)
2024-03-26 01:50:15.732252678 File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
2024-03-26 01:50:15.732255824 raise ValueError(capture.message)
2024-03-26 01:50:15.732264821 ValueError
2024-03-26 01:50:15.732280571
2024-03-26 01:50:15.732281462 During handling of the above exception, another exception occurred:
2024-03-26 01:50:15.732281993
2024-03-26 01:50:15.732282755 Traceback (most recent call last):
2024-03-26 01:50:15.732303764 File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-03-26 01:50:15.732304536 self.run()
2024-03-26 01:50:15.732305658 File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-03-26 01:50:15.732307110 self._target(*self._args, **self._kwargs)
2024-03-26 01:50:15.732307882 File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
2024-03-26 01:50:15.732309405 object_detector = LocalObjectDetector(detector_config=detector_config)
2024-03-26 01:50:15.732328400 File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
2024-03-26 01:50:15.732329753 self.detect_api = create_detector(detector_config)
2024-03-26 01:50:15.732330735 File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2024-03-26 01:50:15.732331336 return api(detector_config)
2024-03-26 01:50:15.732332117 File "/opt/frigate/frigate/detectors/plugins/edgetpu_tfl.py", line 41, in __init__
2024-03-26 01:50:15.732332929 edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
2024-03-26 01:50:15.732333901 File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
2024-03-26 01:50:15.732364258 raise ValueError('Failed to load delegate from {}\n{}'.format(
2024-03-26 01:50:15.732365560 ValueError: Failed to load delegate from libedgetpu.so.1.0
2024-03-26 01:50:15.732366051
2024-03-26 01:50:23.098926324 [2024-03-26 01:50:23] frigate.watchdog INFO : Detection appears to have stopped. Exiting Frigate...
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service nginx: stopping
s6-rc: info: service go2rtc-healthcheck: stopping
2024-03-26 01:50:23.115132162 [INFO] The go2rtc-healthcheck service exited with code 256 (by signal 15)
s6-rc: info: service go2rtc-healthcheck successfully stopped
2024-03-26 01:50:23.252760402 [INFO] Service NGINX exited with code 0 (by signal 0)
s6-rc: info: service nginx successfully stopped
s6-rc: info: service nginx-log: stopping
s6-rc: info: service frigate: stopping
2024-03-26 01:50:23.258859999 [2024-03-26 01:50:23] frigate.app INFO : Stopping...
s6-rc: info: service nginx-log successfully stopped
2024-03-26 01:50:23.259005072 [2024-03-26 01:50:23] root INFO : Waiting for detection process to exit gracefully...
2024-03-26 01:50:23.259067950 [2024-03-26 01:50:23] frigate.stats INFO : Exiting stats emitter...
2024-03-26 01:50:23.259199507 [2024-03-26 01:50:23] frigate.watchdog INFO : Exiting watchdog...
2024-03-26 01:50:23.259248820 [2024-03-26 01:50:23] frigate.ptz.autotrack INFO : Exiting autotracker...
2024-03-26 01:50:23.259361050 [2024-03-26 01:50:23] frigate.storage INFO : Exiting storage maintainer...
2024-03-26 01:50:23.259403901 [2024-03-26 01:50:23] frigate.record.cleanup INFO : Exiting recording cleanup...
2024-03-26 01:50:23.259478781 [2024-03-26 01:50:23] frigate.events.cleanup INFO : Exiting event cleanup...
2024-03-26 01:50:23.260938894 [2024-03-26 01:50:23] detector.coral1 INFO : Signal to exit detection process...
2024-03-26 01:50:23.263547072 Fatal Python error: Segmentation fault
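To see quickly which device string failed in a long Frigate log like the one above, the "Attempting to load TPU as ..." lines can be paired with the result line that follows them. This is a rough Python sketch under stated assumptions: the function name `tpu_load_results` is hypothetical, the message texts are copied from the log above, and the pairing simply attributes each result to the most recent attempt, which can be wrong if two detector processes interleave their output.

```python
import re

# Message text taken from the Frigate 0.13.2 log in this thread.
ATTEMPT_RE = re.compile(r"Attempting to load TPU as (\S+)")


def tpu_load_results(log_text):
    """Map each attempted device string (e.g. 'pci:1') to a load outcome.

    Outcomes are 'found', 'not detected', or 'unknown' when no result
    line followed the attempt.
    """
    results = {}
    last_device = None
    for line in log_text.splitlines():
        match = ATTEMPT_RE.search(line)
        if match:
            last_device = match.group(1)
            results[last_device] = "unknown"
        elif last_device is not None and "TPU found" in line:
            results[last_device] = "found"
        elif last_device is not None and "No EdgeTPU was detected" in line:
            results[last_device] = "not detected"
    return results
```

Applied to the log above, this would report pci:0 as found and pci:1 as not detected.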
If I comment out the line - /dev/apex_1:/dev/apex_1 in the Frigate Docker config
and restart the Frigate container, it runs without restarting and dmesg stops reporting:
[ 115.671430] apex 0000:06:00.0: Error in device open cb: -110
I've removed the adapter, checked that it was seated well and free of dust, and reinserted it into the PCIe slot.
CPU: Ryzen 7 5700G
Motherboard: B550M Steel Legend
GPU: Onboard
OS: Linux Mint 21.3 Virginia
Hi @duindain Could you please try/tell:
- comment out apex_0 instead of apex_1 to see if there's any difference?
- do you have a heatsink for TPUs?
In Docker, passing through any one of these mappings individually works fine when the Frigate config is only using pci:0:
- /dev/apex_0:/dev/apex_1
- /dev/apex_1:/dev/apex_0
- /dev/apex_1:/dev/apex_1
- /dev/apex_0:/dev/apex_0
If I set the Frigate config to use pci:1, it fails.
I don't have a heatsink at the moment, but I can add one.
@duindain please try it with a heatsink, as it's needed anyway. If that doesn't help, we'll consider replacing the adapter.
I've put a passive heatsink on with a thermal pad. It's definitely not high quality, but the case is well ventilated, has a 120mm fan, and it's fairly cool here at the moment (14-20 C ambient).
I'm not sure if this is accurate or how you are meant to check (there didn't seem to be much info out there), but I get these values, both when passing through just apex_0 from Docker and when passing through both:
cat /sys/class/apex/apex_0/temp
48300 (it varies within a range, so 46-48 degrees C)
cat /sys/class/apex/apex_1/temp
-89700 (this seems to always return this number)
I assume the -89700 is because it's not being used, or because it's simply not running?
I've tried a few combinations, but apex_1 always seems to return -89700 regardless.
The temperature drops a bit when I configure Frigate to use both TPUs, presumably because it's spending all its time restarting and not actually sending anything to be processed.
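The manual check above can be scripted. The following is a small Python sketch, assuming the sysfs `temp` files report millidegrees Celsius (which is how the readings in this thread are interpreted, e.g. 48300 as roughly 48 C). The function names and the plausibility bounds are illustrative assumptions, not values from Coral documentation.

```python
from pathlib import Path

# Assumed plausible range in degrees C. These bounds are illustrative only
# and are not taken from any Coral specification.
PLAUSIBLE_MIN_C = -40.0
PLAUSIBLE_MAX_C = 125.0


def parse_apex_temp(raw):
    """Convert a raw sysfs reading (assumed millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def is_plausible(temp_c):
    """Return True if the reading falls inside the assumed plausible range."""
    return PLAUSIBLE_MIN_C <= temp_c <= PLAUSIBLE_MAX_C


def read_apex_temps(base="/sys/class/apex"):
    """Return {device_name: temp_c} for each apex device exposing a temp file."""
    temps = {}
    for temp_file in sorted(Path(base).glob("apex_*/temp")):
        temps[temp_file.parent.name] = parse_apex_temp(temp_file.read_text())
    return temps


if __name__ == "__main__":
    for name, temp_c in read_apex_temps().items():
        flag = "" if is_plausible(temp_c) else "  <-- implausible reading"
        print(f"{name}: {temp_c:.1f} C{flag}")
```

With the values reported here, apex_0 would pass the check while the constant -89700 from apex_1 would be flagged, which points at a device that is not responding rather than one that is merely idle.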
@duindain feels like something's wrong with either the TPU card or the adapter itself. If you can't inspect the flip chips on your TPU card with a microscope or try another card, we can try replacing the adapter.
@magic-blue-smoke unfortunately the best I have is a magnifying lens, and I can't see anything that looks broken or badly soldered. I don't have another card to try.
@duindain we can try an adapter board replacement. Could you contact me using the contact form at the bottom of the page?
Thank you, I've sent a message with order details and other info.
I've received the new adapter. Unfortunately the Coral is behaving the same as before, with one temperature sensor reporting an out-of-bounds value of -89700. If I enable both TPUs, Frigate continually crashes as before, but I can enable one fine.
I've sent an RMA request for the Coral. Is there anything else to try at this point?
Mouser rejected the warranty request
Curious: my TPU is also showing a negative temperature value and crashing my whole system: Coral Temp2: -89.70C