Coral M.2 Accelerator with Dual Edge TPU and Dual Edge TPU Adapter (PCIe x1 Low Profile): only one TPU working

Open duindain opened this issue 3 months ago • 10 comments

Hi,

I'm hoping someone can help me diagnose this.

I bought an M.2 Dual Edge TPU accelerator and an adapter from Makerfabs.

I've just got all the cameras working in Frigate, but Frigate restarts constantly, reporting that it can't find one of the TPUs.

I'm running Frigate in a Docker container.

dmesg seems to be reporting an error from the adapter or the accelerator:

ls -l /dev/apex*
crw-rw---- 1 root apex 120, 0 Mar 25 18:23 /dev/apex_0
crw-rw---- 1 root apex 120, 1 Mar 25 18:23 /dev/apex_1
ls /sys/class/apex/
apex_0  apex_1
dmesg | grep apex
[   35.356036] apex 0000:05:00.0: enabling device (0000 -> 0002)
[   35.371387] apex 0000:06:00.0: enabling device (0000 -> 0002)
[   40.512237] apex 0000:05:00.0: Apex performance not throttled due to temperature
[   48.172225] apex 0000:06:00.0: RAM did not enable within timeout (12000 ms)
[   53.312236] apex 0000:06:00.0: Apex performance not throttled due to temperature
[   58.432237] apex 0000:06:00.0: Apex performance not throttled due to temperature
[   63.552236] apex 0000:06:00.0: Apex performance not throttled due to temperature
[   68.672243] apex 0000:06:00.0: Apex performance not throttled due to temperature
[   83.300290] apex 0000:06:00.0: RAM did not enable within timeout (12000 ms)
[   83.300301] apex 0000:06:00.0: Error in device open cb: -110
[   83.300315] apex 0000:06:00.0: Apex performance not throttled due to temperature
[   88.384283] apex 0000:06:00.0: Apex performance not throttled due to temperature
[   93.504246] apex 0000:06:00.0: Apex performance not throttled due to temperature
[   98.628059] apex 0000:06:00.0: Apex performance not throttled due to temperature
[  115.671423] apex 0000:06:00.0: RAM did not enable within timeout (12000 ms)
[  115.671430] apex 0000:06:00.0: Error in device open cb: -110
[  115.671442] apex 0000:06:00.0: Apex performance not throttled due to temperature
[  120.895350] apex 0000:06:00.0: Apex performance not throttled due to temperature
[  126.015273] apex 0000:06:00.0: Apex performance not throttled due to temperature
[  131.135234] apex 0000:06:00.0: Apex performance not throttled due to temperature
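
The dmesg output above mixes routine temperature messages with the actual failures. Here is a minimal Python sketch that groups the non-routine apex messages by PCI address, assuming the lines keep the format shown above. The function name `summarize_apex_errors` and the list of "routine" message fragments are my own choices for illustration, not part of any driver tooling. For reference, errno 110 on Linux is ETIMEDOUT, which is consistent with the "RAM did not enable within timeout" line.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [   48.172225] apex 0000:06:00.0: RAM did not enable within timeout (12000 ms)
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+apex\s+(?P<addr>[0-9a-f:.]+):\s+(?P<msg>.*)$"
)

# Message fragments treated as routine (an assumption based on the output above).
ROUTINE = ("enabling device", "not throttled")


def summarize_apex_errors(dmesg_text):
    """Return {pci_address: [(timestamp, message), ...]} for non-routine apex lines."""
    errors = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an apex driver line
        message = match.group("msg")
        if any(fragment in message for fragment in ROUTINE):
            continue  # skip informational messages
        errors[match.group("addr")].append((float(match.group("ts")), message))
    return dict(errors)


# Three lines copied from the dmesg output above.
SAMPLE = """\
[   40.512237] apex 0000:05:00.0: Apex performance not throttled due to temperature
[   48.172225] apex 0000:06:00.0: RAM did not enable within timeout (12000 ms)
[   83.300301] apex 0000:06:00.0: Error in device open cb: -110
"""

for address, items in summarize_apex_errors(SAMPLE).items():
    print(address, [message for _, message in items])
```

On the sample lines, only 0000:06:00.0 shows up, which suggests the problem is tied to the second TPU rather than to both.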

Frigate Docker Compose file:

version: "3.9"
services:
  frigate:
    container_name: frigate
    privileged: true # this may not be necessary for all setups
    restart: unless-stopped
    image: ghcr.io/blakeblackshear/frigate:stable
    shm_size: "850mb" # update for your cameras based on calculation above
    devices:
      #- /dev/bus/usb:/dev/bus/usb # passes the USB Coral, needs to be modified for other versions
      - /dev/apex_0:/dev/apex_0 # passes a PCIe Coral, follow driver instructions here https://coral.ai/docs/m2/get-started/#2a-on-linux
      - /dev/apex_1:/dev/apex_1
      #- /dev/dri/renderD128 # for intel hwaccel, needs to be updated for your hardware
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /home/user/Software/Scripts/docker/frigate/config/frigate.yml:/config/config.yml
      - /home/user/Software/Scripts/docker/frigate/config/go2rtc:/config/go2rtc
      - /mnt/CamFootage:/media/frigate
      - /home/user/Software/Scripts/docker/frigate:/db
      - type: tmpfs # Optional: 1GB of memory, reduces SSD/SD Card wear
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    networks:
      - enp8s0
    ports:
      - "5001:5000"
      - "1935:1935" # RTMP feeds
      - "8554:8554" # RTSP feeds
      - "8555:8555/tcp" # WebRTC over tcp
      - "8555:8555/udp" # WebRTC over udp
    environment:
      FRIGATE_RTSP_PASSWORD: "password"

networks:
  enp8s0:
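
The compose file above maps /dev/apex_0 and /dev/apex_1 into the container. A minimal Python sketch for confirming, from inside the container, that both nodes arrived as character devices and are readable and writable. The helper name `check_device_nodes` is hypothetical. It only inspects the nodes and does not open them, so it would not catch a failure that happens in the driver's open callback, which is where the dmesg errors in this issue appear.

```python
import os
import stat


def check_device_nodes(paths):
    """Return {path: status} for each device node path.

    Status is one of: 'ok', 'missing', 'not a character device',
    'no read/write access'.
    """
    report = {}
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            report[path] = "missing"
            continue
        if not stat.S_ISCHR(mode):
            report[path] = "not a character device"
        elif not os.access(path, os.R_OK | os.W_OK):
            report[path] = "no read/write access"
        else:
            report[path] = "ok"
    return report


# Paths taken from the devices: section of the compose file above.
for path, status in check_device_nodes(["/dev/apex_0", "/dev/apex_1"]).items():
    print(path, status)
```

If both paths report "ok" here, the device mapping itself is probably fine and the failure is more likely at the driver or hardware level.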

Frigate logs

2024-03-26 01:50:01.995727962  [INFO] Preparing Frigate...
2024-03-26 01:50:01.996292271  [INFO] Starting NGINX...
2024-03-26 01:50:01.998032480  [INFO] Preparing new go2rtc config...
s6-rc: info: service legacy-services successfully started
2024-03-26 01:50:02.002815522  [INFO] Starting Frigate...
2024-03-26 01:50:02.190391428  [INFO] Starting go2rtc...
2024-03-26 01:50:02.232511640  01:50:02.232 INF go2rtc version 1.8.4 linux/amd64
2024-03-26 01:50:02.232851879  01:50:02.232 INF [rtsp] listen addr=:8554
2024-03-26 01:50:02.232879651  01:50:02.232 INF [api] listen addr=:1984
2024-03-26 01:50:02.232996601  01:50:02.232 INF [webrtc] listen addr=:8555
2024-03-26 01:50:02.752193839  [2024-03-26 01:50:02] frigate.app                    INFO    : Starting Frigate (0.13.2-6476f8a)
2024-03-26 01:50:02.824835738  [2024-03-26 01:50:02] peewee_migrate.logs            INFO    : Starting migrations
2024-03-26 01:50:02.827668738  [2024-03-26 01:50:02] peewee_migrate.logs            INFO    : There is nothing to migrate
2024-03-26 01:50:02.833159370  [2024-03-26 01:50:02] frigate.app                    INFO    : Recording process started: 729
2024-03-26 01:50:02.834944512  [2024-03-26 01:50:02] frigate.app                    INFO    : go2rtc process pid: 89
2024-03-26 01:50:02.856538296  [2024-03-26 01:50:02] detector.coral1                INFO    : Starting detection process: 739
2024-03-26 01:50:02.862384796  [2024-03-26 01:50:02] detector.coral2                INFO    : Starting detection process: 744
2024-03-26 01:50:02.863193935  [2024-03-26 01:50:02] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as pci:0
2024-03-26 01:50:02.866746978  [2024-03-26 01:50:02] frigate.detectors.plugins.edgetpu_tfl INFO    : TPU found
2024-03-26 01:50:02.866967582  [2024-03-26 01:50:02] frigate.app                    INFO    : Output process started: 761
2024-03-26 01:50:02.882466830  [2024-03-26 01:50:02] frigate.app                    INFO    : Camera processor started for camera1: 768
2024-03-26 01:50:02.888466686  [2024-03-26 01:50:02] frigate.app                    INFO    : Camera processor started for camera2: 770
2024-03-26 01:50:02.894560801  [2024-03-26 01:50:02] frigate.app                    INFO    : Camera processor started for camera3: 771
2024-03-26 01:50:02.900379829  [2024-03-26 01:50:02] frigate.app                    INFO    : Camera processor started for camera4: 773
2024-03-26 01:50:02.906395507  [2024-03-26 01:50:02] frigate.app                    INFO    : Camera processor started for camera5: 776
2024-03-26 01:50:02.918211517  [2024-03-26 01:50:02] frigate.app                    INFO    : Camera processor started for camera6: 778
2024-03-26 01:50:02.919196908  [2024-03-26 01:50:02] frigate.app                    INFO    : Camera processor started for doorcam: 781
2024-03-26 01:50:02.925658092  [2024-03-26 01:50:02] frigate.app                    INFO    : Capture process started for camera1: 783
2024-03-26 01:50:02.931928107  [2024-03-26 01:50:02] frigate.app                    INFO    : Capture process started for camera2: 789
2024-03-26 01:50:02.938740020  [2024-03-26 01:50:02] frigate.app                    INFO    : Capture process started for camera3: 795
2024-03-26 01:50:02.944340839  [2024-03-26 01:50:02] frigate.app                    INFO    : Capture process started for camera4: 800
2024-03-26 01:50:02.950290741  [2024-03-26 01:50:02] frigate.app                    INFO    : Capture process started for camera5: 806
2024-03-26 01:50:02.957319010  [2024-03-26 01:50:02] frigate.app                    INFO    : Capture process started for camera6: 827
2024-03-26 01:50:02.963700124  [2024-03-26 01:50:02] frigate.app                    INFO    : Capture process started for doorcam: 830
2024-03-26 01:50:11.998523966  [INFO] Starting go2rtc healthcheck service...
2024-03-26 01:50:15.730801593  [2024-03-26 01:50:02] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as pci:1
2024-03-26 01:50:15.730965721  [2024-03-26 01:50:15] frigate.detectors.plugins.edgetpu_tfl ERROR   : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
2024-03-26 01:50:15.730982292  Process detector:coral2:
2024-03-26 01:50:15.732237840  Traceback (most recent call last):
2024-03-26 01:50:15.732251045    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
2024-03-26 01:50:15.732251827      delegate = Delegate(library, options)
2024-03-26 01:50:15.732252678    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
2024-03-26 01:50:15.732255824      raise ValueError(capture.message)
2024-03-26 01:50:15.732264821  ValueError
2024-03-26 01:50:15.732280571
2024-03-26 01:50:15.732281462  During handling of the above exception, another exception occurred:
2024-03-26 01:50:15.732281993
2024-03-26 01:50:15.732282755  Traceback (most recent call last):
2024-03-26 01:50:15.732303764    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-03-26 01:50:15.732304536      self.run()
2024-03-26 01:50:15.732305658    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-03-26 01:50:15.732307110      self._target(*self._args, **self._kwargs)
2024-03-26 01:50:15.732307882    File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
2024-03-26 01:50:15.732309405      object_detector = LocalObjectDetector(detector_config=detector_config)
2024-03-26 01:50:15.732328400    File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
2024-03-26 01:50:15.732329753      self.detect_api = create_detector(detector_config)
2024-03-26 01:50:15.732330735    File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2024-03-26 01:50:15.732331336      return api(detector_config)
2024-03-26 01:50:15.732332117    File "/opt/frigate/frigate/detectors/plugins/edgetpu_tfl.py", line 41, in __init__
2024-03-26 01:50:15.732332929      edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
2024-03-26 01:50:15.732333901    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
2024-03-26 01:50:15.732364258      raise ValueError('Failed to load delegate from {}\n{}'.format(
2024-03-26 01:50:15.732365560  ValueError: Failed to load delegate from libedgetpu.so.1.0
2024-03-26 01:50:15.732366051
2024-03-26 01:50:23.098926324  [2024-03-26 01:50:23] frigate.watchdog               INFO    : Detection appears to have stopped. Exiting Frigate...
s6-rc: info: service legacy-services: stopping
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service nginx: stopping
s6-rc: info: service go2rtc-healthcheck: stopping
2024-03-26 01:50:23.115132162  [INFO] The go2rtc-healthcheck service exited with code 256 (by signal 15)
s6-rc: info: service go2rtc-healthcheck successfully stopped
2024-03-26 01:50:23.252760402  [INFO] Service NGINX exited with code 0 (by signal 0)
s6-rc: info: service nginx successfully stopped
s6-rc: info: service nginx-log: stopping
s6-rc: info: service frigate: stopping
2024-03-26 01:50:23.258859999  [2024-03-26 01:50:23] frigate.app                    INFO    : Stopping...
s6-rc: info: service nginx-log successfully stopped
2024-03-26 01:50:23.259005072  [2024-03-26 01:50:23] root                           INFO    : Waiting for detection process to exit gracefully...
2024-03-26 01:50:23.259067950  [2024-03-26 01:50:23] frigate.stats                  INFO    : Exiting stats emitter...
2024-03-26 01:50:23.259199507  [2024-03-26 01:50:23] frigate.watchdog               INFO    : Exiting watchdog...
2024-03-26 01:50:23.259248820  [2024-03-26 01:50:23] frigate.ptz.autotrack          INFO    : Exiting autotracker...
2024-03-26 01:50:23.259361050  [2024-03-26 01:50:23] frigate.storage                INFO    : Exiting storage maintainer...
2024-03-26 01:50:23.259403901  [2024-03-26 01:50:23] frigate.record.cleanup         INFO    : Exiting recording cleanup...
2024-03-26 01:50:23.259478781  [2024-03-26 01:50:23] frigate.events.cleanup         INFO    : Exiting event cleanup...
2024-03-26 01:50:23.260938894  [2024-03-26 01:50:23] detector.coral1                INFO    : Signal to exit detection process...
2024-03-26 01:50:23.263547072  Fatal Python error: Segmentation fault
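
The relevant part of the log above is easy to miss among the startup messages. Here is a minimal Python sketch that pairs each "Attempting to load TPU as ..." line with the first outcome line that follows it. The function name `tpu_load_outcomes` is made up for illustration, and pairing by order is an assumption: the detectors run in separate processes, so their log lines could interleave differently in other runs.

```python
import re

ATTEMPT_RE = re.compile(r"Attempting to load TPU as (\S+)")


def tpu_load_outcomes(log_text):
    """Map each attempted device (e.g. 'pci:1') to 'found', 'not detected', or 'unknown'."""
    outcomes = {}
    pending = None  # device whose outcome has not been seen yet
    for line in log_text.splitlines():
        match = ATTEMPT_RE.search(line)
        if match:
            pending = match.group(1)
            outcomes[pending] = "unknown"
        elif pending and "TPU found" in line:
            outcomes[pending] = "found"
            pending = None
        elif pending and "No EdgeTPU was detected" in line:
            outcomes[pending] = "not detected"
            pending = None
    return outcomes


# Message parts of four lines from the Frigate log above (timestamps trimmed).
SAMPLE_LOG = """\
frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as pci:0
frigate.detectors.plugins.edgetpu_tfl INFO    : TPU found
frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as pci:1
frigate.detectors.plugins.edgetpu_tfl ERROR   : No EdgeTPU was detected.
"""

print(tpu_load_outcomes(SAMPLE_LOG))
```

In this log, pci:0 loads within a few milliseconds, while the pci:1 attempt only fails about 13 seconds later, which lines up with the 12000 ms timeout reported by the driver.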

If I comment out the - /dev/apex_1:/dev/apex_1 line in the Frigate compose file and restart the Frigate container, Frigate runs and stops restarting, and dmesg stops reporting "apex 0000:06:00.0: Error in device open cb: -110".
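
To separate a Frigate problem from a driver or hardware problem, each TPU could be probed on its own with the same `load_delegate("libedgetpu.so.1.0", ...)` call that appears in the traceback above. Here is a minimal sketch, assuming `tflite_runtime` and libedgetpu are installed where it runs, and assuming the options argument is a dict such as `{"device": "pci:1"}` (the traceback only shows a variable named `device_config`, so the exact shape is my assumption). The `probe_tpus` helper and its `loader` parameter are hypothetical and exist only so the logic can be exercised without hardware.

```python
def probe_tpus(devices, loader=None):
    """Try to load an Edge TPU delegate for each device string.

    Returns {device: 'loaded'} or {device: 'failed: <reason>'}.
    `loader` is an optional stand-in for the real delegate loader.
    """
    if loader is None:
        # Imported lazily so the function can be tested without tflite_runtime.
        from tflite_runtime.interpreter import load_delegate

        def loader(device):
            # Assumed options format; see the note above.
            return load_delegate("libedgetpu.so.1.0", {"device": device})

    results = {}
    for device in devices:
        try:
            loader(device)
            results[device] = "loaded"
        except ValueError as exc:
            # load_delegate raises ValueError on failure, as in the traceback.
            results[device] = "failed: " + (str(exc).strip() or "no message")
    return results


def fake_loader(device):
    """Stand-in that mimics the behaviour reported in this issue."""
    if device == "pci:1":
        raise ValueError("Failed to load delegate from libedgetpu.so.1.0")
    return object()


print(probe_tpus(["pci:0", "pci:1"], loader=fake_loader))
```

With real hardware, calling `probe_tpus(["pci:0", "pci:1"])` without a `loader` while Frigate is stopped should show whether the second TPU fails outside Frigate as well.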

I've removed the adapter, checked it for dust, and reinserted it into the PCIe slot, making sure it is seated properly.

CPU: Ryzen 7 5700G
Motherboard: B550M Steel Legend
GPU: Onboard
OS: Linux Mint 21.3 Virginia

duindain, Mar 27 '24 05:03