depthai-core

[gen2-multiple-devices] Ping was missed, closing the device connection

Open miknai opened this issue 2 years ago • 32 comments

Luxonis team,

I currently have three PoE cameras (2x OAK-D-POE and 1x OAK-D-POE-PRO). At the office, I was not able to run main.py from gen2-multiple-devices. Whenever I run the script, one of the three cameras gives me a "ping was missed, closing the device connection" error. The camera that reports the error changes from run to run, so the problem is not tied to one specific camera.

This is the code I use (main.py): https://github.com/luxonis/depthai-experiments/blob/master/gen2-multiple-devices/main.py

I had a chance to talk with Erik on the Discord channel. On his advice, I updated the bootloader on all three cameras, updated the depthai version, and even factory-reset all three cameras. The issue persists. I swapped the Ethernet switch for another one and also connected one camera through an Ethernet injector. The issue persists. I brought the three cameras (plus the Ethernet switch) home and ran the script there. It displayed the video stream for 10 seconds and then disconnected.

I think I have tried everything I can at this point, but I am still not sure what causes this issue. I recently purchased another 8 OAK-D-POE-PRO units and I would really like to know how to deal with this error. Could you please help me out?

miknai avatar Aug 16 '22 21:08 miknai

Here's the terminal output.

Found 3 devices
=== Connected to 18443010D1F77B0E00
   >>> MXID: 18443010D1F77B0E00
   >>> Cameras: RGB LEFT RIGHT
   >>> USB speed: UNKNOWN
   >>> Loading pipeline for: OAK-D-POE
=== Connected to 18443010D15EE00F00
   >>> MXID: 18443010D15EE00F00
   >>> Cameras: RGB LEFT RIGHT
   >>> USB speed: UNKNOWN
   >>> Loading pipeline for: OAK-D-POE
=== Connected to 184430106192F30F00
   >>> MXID: 184430106192F30F00
   >>> Cameras: RGB LEFT RIGHT
   >>> USB speed: UNKNOWN
   >>> Loading pipeline for: OAK-D-POE
[2022-08-16 13:38:26.212] [warning] Monitor thread (device: 18443010D1F77B0E00 [10.0.0.214]) - ping was missed, closing the device connection
Traceback (most recent call last):
  File "test-multiple-device.py", line 73, in <module>
    in_rgb = q_rgb.tryGet()
RuntimeError: Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: 'rgb' (X_LINK_ERROR)'

This time the camera, 18443010D1F77B0E00 (MxId) was disconnected, but it happens to all three cameras.

Here's another terminal output. The video stream windows were up for 10 seconds and then the program got terminated due to the connection drop.

Found 3 devices
=== Connected to 184430106192F30F00
   >>> MXID: 184430106192F30F00
   >>> Cameras: RGB LEFT RIGHT
   >>> USB speed: UNKNOWN
   >>> Loading pipeline for: OAK-D-POE
=== Connected to 18443010D15EE00F00
   >>> MXID: 18443010D15EE00F00
   >>> Cameras: RGB LEFT RIGHT
   >>> USB speed: UNKNOWN
   >>> Loading pipeline for: OAK-D-POE
=== Connected to 18443010D1F77B0E00
   >>> MXID: 18443010D1F77B0E00
   >>> Cameras: RGB LEFT RIGHT
   >>> USB speed: UNKNOWN
   >>> Loading pipeline for: OAK-D-POE
[2022-08-16 16:16:32.443] [warning] Monitor thread (device: 184430106192F30F00 [10.0.0.107]) - ping was missed, closing the device connection
Traceback (most recent call last):
  File "test-multiple-device.py", line 73, in <module>
    in_rgb = q_rgb.tryGet()
RuntimeError: Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: 'rgb' (X_LINK_ERROR)'

miknai avatar Aug 16 '22 21:08 miknai

Hi @miknai, which version of depthai are you using (latest: 2.17.3), and which bootloader version (latest: 0.0.20) are your OAK cameras running? If they aren't the latest, I would suggest updating them first (python3 -m pip install depthai -U for depthai, Device Manager for the bootloader). Thanks, Erik

Erol444 avatar Aug 16 '22 22:08 Erol444

All of them are on the latest versions. I have also tried the factory reset over the network (through the device_manager.py UI).

[Screenshot from 2022-08-16 16-07-41]

miknai avatar Aug 16 '22 23:08 miknai

I tried running with only two cameras this time (1x OAK-D-POE and 1x OAK-D-POE-PRO). I physically disconnected the other camera. The program displayed two video stream windows as expected, worked fine for about 4 minutes, and then hit the "ping was missed..." error.

Found 2 devices
=== Connected to 18443010D1F77B0E00
   >>> MXID: 18443010D1F77B0E00
   >>> Cameras: RGB LEFT RIGHT
   >>> USB speed: UNKNOWN
   >>> Loading pipeline for: OAK-D-POE
=== Connected to 184430106192F30F00
   >>> MXID: 184430106192F30F00
   >>> Cameras: RGB LEFT RIGHT
   >>> USB speed: UNKNOWN
   >>> Loading pipeline for: OAK-D-POE
[2022-08-16 16:36:09.964] [warning] Monitor thread (device: 18443010D1F77B0E00 [10.0.0.214]) - ping was missed, closing the device connection
Traceback (most recent call last):
  File "test-multiple-device.py", line 73, in <module>
    in_rgb = q_rgb.tryGet()
RuntimeError: Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: 'rgb' (X_LINK_ERROR)'

miknai avatar Aug 16 '22 23:08 miknai

Tried running with only one camera this time (1x OAK-D-POE-PRO). It lasted 15 minutes and then crashed with the same error.

Found 1 devices
=== Connected to 18443010D1F77B0E00
   >>> MXID: 18443010D1F77B0E00
   >>> Cameras: RGB LEFT RIGHT
   >>> USB speed: UNKNOWN
   >>> Loading pipeline for: OAK-D-POE
[2022-08-16 17:18:01.339] [warning] Monitor thread (device: 18443010D1F77B0E00 [10.0.0.214]) - ping was missed, closing the device connection
Traceback (most recent call last):
  File "test-multiple-device.py", line 73, in <module>
    in_rgb = q_rgb.tryGet()
RuntimeError: Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: 'rgb' (X_LINK_ERROR)'

miknai avatar Aug 17 '22 00:08 miknai

In summary:

  • 3 cameras => lasted 10 seconds
  • 2 cameras => lasted 4 minutes
  • 1 camera => lasted 15 minutes

I wanted to use one camera for surveillance monitoring, but it seems I cannot do that because of the connection drop. Is the camera designed to run 24/7 as long as power is supplied?

miknai avatar Aug 17 '22 00:08 miknai

Any updates on this? Is this issue not reproducible on your side?

miknai avatar Aug 23 '22 16:08 miknai

@miknai Sorry about the delay on our side, let me double-check with the team.

Erol444 avatar Aug 26 '22 11:08 Erol444

Hi guys, I'm experiencing the same issue. This is depthai 2.17.3.0.dev+dev, running on a Raspberry Pi 4 in a Docker container with base image luxonis/depthai-library:v2.17.3.0-armv7. I run 2 OAK-1-POE cameras simultaneously from the same Raspberry Pi, and the problem happens on both cameras. They had firmware 0.0.18, but the problem persists after updating both of them to 0.0.20. This is the error I see:
[2022-08-27 12:11:06.162] [warning] Monitor thread (device: 184430101173DE0F00 [192.168.80.254]) - ping was missed, closing the device connection

lucasmediaflow avatar Aug 27 '22 00:08 lucasmediaflow

Here are some logs with debug enabled if it helps:

`[2022-08-27 15:34:45.036] [debug] Python bindings - version: 2.17.3.0.dev+dev from build: 2022-08-07 12:42:18 +0000
[2022-08-27 15:34:45.036] [debug] Library information - version: 2.17.3, commit: from , build: 2022-08-07 12:42:08 +0000
[2022-08-27 15:34:45.041] [debug] Initialize - finished
DEPTHAI VERSION: 2.17.3.0.dev+dev
[1243.33760] finished setup 02-front oak
DEPTHAI VERSION: 2.17.3.0.dev+dev
[1243.33940] finished setup 01-back oak cams setup, starting them
[2022-08-27 15:34:45.063] [debug] Device - OpenVINO version: 2022.1 [2022-08-27 15:34:45.063] [debug] Device - OpenVINO version: 2022.1 [2022-08-27 15:34:45.063] [debug] Device - BoardConfig: {"emmc":null,"gpio":[],"logPath":null,"logSizeMax":null,"logVerbosity":null,"network":{"mtu":0,"xlinkTcpNoDelay":true},"pcieInternalClock":null,"sysctl":[],"uart":[],"usb":{"flashBootedPid":63037,"flashBootedVid":999 ,"maxSpeed":4,"pid":63035,"vid":999},"usb3PhyInternalClock":null,"watchdogInitialDelayMs":null,"watchdogTimeoutMs":null} libnop:
0000: b9 0d b9 05 81 e7 03 81 3b f6 81 e7 03 81 3d f6 04 b9 02 00 01 ba 00 be be bb 00 bb 00 be be be
0020: be be be [2022-08-27 15:34:45.064] [debug] Device - BoardConfig: {"emmc":null,"gpio":[],"logPath":null,"logSizeMax":null,"logVerbosity":null,"network":{"mtu":0,"xlinkTcpNoDelay":true},"pcieInternalClock":null,"sysctl":[],"uart":[],"usb":{"flashBootedPid":63037,"flashBootedVid":999 ,"maxSpeed":4,"pid":63035,"vid":999},"usb3PhyInternalClock":null,"watchdogInitialDelayMs":null,"watchdogTimeoutMs":null} libnop:
0000: b9 0d b9 05 81 e7 03 81 3b f6 81 e7 03 81 3d f6 04 b9 02 00 01 ba 00 be be bb 00 bb 00 be be be
0020: be be be [2022-08-27 15:34:45.381] [debug] Resources - Archive 'depthai-bootloader-fwp-0.0.20.tar.xz' open: 7ms, archive read: 334ms
[2022-08-27 15:34:46.423] [debug] Resources - Archive 'depthai-device-fwp-602822fe9eaca68a72c666497dc4979b29291b3e.tar.xz' open: 7ms, archive read: 1377ms
[2022-08-27 15:34:46.578] [debug] Searching for booted device: DeviceInfo(name=192.168.80.254, mxid=184430101173DE0F00, X_LINK_BOOTLOADER, X_LINK_TCP_IP, X_LINK_MYRIAD_X, X_LINK_SUCCESS), name used as hint only
[2022-08-27 15:34:46.579] [debug] Searching for booted device: DeviceInfo(name=192.168.80.253, mxid=18443010D162DE0F00, X_LINK_BOOTLOADER, X_LINK_TCP_IP, X_LINK_MYRIAD_X, X_LINK_SUCCESS), name used as hint only
[2022-08-27 15:34:46.821] [debug] Connected bootloader version 0.0.20
[2022-08-27 15:34:46.823] [debug] Connected bootloader version 0.0.20
[2022-08-27 15:35:05.678] [debug] Booting FW with Bootloader. Version 0.0.20, Time taken: 18855ms
[2022-08-27 15:35:05.678] [debug] DeviceBootloader about to be closed...
[2022-08-27 15:35:05.679] [debug] XLinkResetRemote of linkId: (0)
[2022-08-27 15:35:06.908] [debug] DeviceBootloader closed, 1229
[2022-08-27 15:35:07.013] [debug] Searching for booted device: DeviceInfo(name=192.168.80.254, mxid=184430101173DE0F00, X_LINK_BOOTED, X_LINK_TCP_IP, X_LINK_MYRIAD_X, X_LINK_SUCCESS), name used as hint only
[2022-08-27 15:35:07.495] [debug] Booting FW with Bootloader. Version 0.0.20, Time taken: 20673ms
[2022-08-27 15:35:07.495] [debug] DeviceBootloader about to be closed...
[2022-08-27 15:35:07.495] [debug] XLinkResetRemote of linkId: (1)
[2022-08-27 15:35:08.723] [debug] DeviceBootloader closed, 1227
[2022-08-27 15:35:08.832] [debug] Searching for booted device: DeviceInfo(name=192.168.80.253, mxid=18443010D162DE0F00, X_LINK_BOOTED, X_LINK_TCP_IP, X_LINK_MYRIAD_X, X_LINK_SUCCESS), name used as hint only [184430101173DE0F00] [192.168.80.254] [657.484] [system] [info] Memory Usage - DDR: 0.12 / 340.93 MiB, CMX: 2.05 / 2.50 MiB, LeonOS Heap: 24.96 / 77.58 MiB, LeonRT Heap: 2.88 / 41.37 MiB
[184430101173DE0F00] [192.168.80.254] [657.484] [system] [info] Temperatures - Average: 28.86 °C, CSS: 30.32 °C, MSS 28.13 °C, UPA: 28.37 °C, DSS: 28.62 °C [184430101173DE0F00] [192.168.80.254] [657.484] [system] [info] Cpu Usage - LeonOS 6.86%, LeonRT: 0.56% [2022-08-27 15:35:10.578] [debug] Schema dump: {"connections":[{"node1Id":4,"node1Output":"bitstream","node1OutputGroup":"","node2Id":5,"node2Input":"in","node2InputGroup":""},{"node1Id":7,"node1Output":"out","node1OutputGroup":"","node2Id":0,"node2Input":"inputControl"," node2InputGroup":""},{"node1Id":0,"node1Output":"still","node1OutputGroup":"","node2Id":4,"node2Input":"in","node2InputGroup":""},{"node1Id":1,"node1Output":"bitstream","node1OutputGroup":"","node2Id":2,"node2Input":"in","node2InputGroup":""},{"node1Id":3,"node1Output":"o ut","node1OutputGroup":"","node2Id":1,"node2Input":"in","node2InputGroup":""},{"node1Id":0,"node1Output":"isp","node1OutputGroup":"","node2Id":3,"node2Input":"inputImage","node2InputGroup":""}],"globalProperties":{"calibData":null,"cameraTuningBlobSize":null,"cameraTuning BlobUri":"","leonCssFrequencyHz":700000000.0,"leonMssFrequencyHz":700000000.0,"pipelineName":null,"pipelineVersion":null,"xlinkChunkSize":-1},"nodes":[[0,{"id":0,"ioInfo":[[["","preview"],{"blocking":false,"group":"","name":"preview","queueSize":8,"type":0,"waitForMessage ":false}],[["","video"],{"blocking":false,"group":"","name":"video","queueSize":8,"type":0,"waitForMessage":false}],[["","still"],{"blocking":false,"group":"","name":"still","queueSize":8,"type":0,"waitForMessage":false}],[["","raw"],{"blocking":false,"group":"","name":"r aw","queueSize":8,"type":0,"waitForMessage":false}],[["","inputConfig"],{"blocking":false,"group":"","name":"inputConfig","queueSize":8,"type":3,"waitForMessage":false}],[["","isp"],{"blocking":false,"group":"","name":"isp","queueSize":8,"type":0,"waitForMessage":false}], 
[["","inputControl"],{"blocking":true,"group":"","name":"inputControl","queueSize":8,"type":3,"waitForMessage":false}]],"name":"ColorCamera","properties":[185,23,185,20,0,3,0,185,3,0,0,0,185,5,0,0,0,0,0,185,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,0,0,0,129,104,1,1 29,224,1,255,255,133,0,15,133,112,8,1,136,0,0,240,65,136,0,0,128,191,136,0,0,128,191,0,185,4,0,0,0,0,3,3,4,4,4]}],[1,{"id":1,"ioInfo":[[["","bitstream"],{"blocking":false,"group":"","name":"bitstream","queueSize":8,"type":0,"waitForMessage":false}],[["","in"],{"blocking": true,"group":"","name":"in","queueSize":4,"type":3,"waitForMessage":true}]],"name":"VideoEncoder","properties":[185,11,134,0,36,244,0,30,0,0,0,0,1,80,0,0,136,0,0,240,65]}],[2,{"id":2,"ioInfo":[[["","in"],{"blocking":true,"group":"","name":"in","queueSize":8,"type":3,"wait ForMessage":true}]],"name":"XLinkOut","properties":[185,3,136,0,0,128,191,189,5,118,105,100,101,111,0]}],[3,{"id":3,"ioInfo":[[["","out"],{"blocking":false,"group":"","name":"out","queueSize":8,"type":0,"waitForMessage":false}],[["","inputConfig"],{"blocking":true,"group" :"","name":"inputConfig","queueSize":8,"type":3,"waitForMessage":false}],[["","inputImage"],{"blocking":true,"group":"","name":"inputImage","queueSize":8,"type":3,"waitForMessage":true}]],"name":"ImageManip","properties":[185,6,185,8,185,7,185,4,136,0,0,0,0,136,0,0,0,0,13 6,0,0,0,0,136,0,0,0,0,185,3,185,2,136,0,0,0,0,136,0,0,0,0,185,2,136,0,0,0,0,136,0,0,0,0,136,0,0,0,0,0,136,0,0,128,63,136,0,0,128,63,0,1,185,15,133,128,7,133,56,4,0,0,0,0,186,0,1,0,186,0,0,0,136,0,0,0,0,0,1,185,3,22,0,0,0,1,1,0,0,134,0,118,47,0,4,0,0,189,0]}],[4,{"id":4,"i oInfo":[[["","bitstream"],{"blocking":false,"group":"","name":"bitstream","queueSize":8,"type":0,"waitForMessage":false}],[["","in"],{"blocking":true,"group":"","name":"in","queueSize":1,"type":3,"waitForMessage":true}]],"name":"VideoEncoder","properties":[185,11,0,30,0,0 
,1,0,4,65,0,0,136,0,0,128,63]}],[5,{"id":5,"ioInfo":[[["","in"],{"blocking":true,"group":"","name":"in","queueSize":8,"type":3,"waitForMessage":true}]],"name":"XLinkOut","properties":[185,3,136,0,0,128,191,189,3,106,112,103,0]}],[6,{"id":6,"ioInfo":[[["","in"],{"blocking" :true,"group":"","name":"in","queueSize":8,"type":3,"waitForMessage":true}]],"name":"XLinkOut","properties":[185,3,136,0,0,128,191,189,7,112,114,101,118,105,101,119,0]}],[7,{"id":7,"ioInfo":[[["","out"],{"blocking":false,"group":"","name":"out","queueSize":8,"type":0,"wai tForMessage":false}]],"name":"XLinkIn","properties":[185,3,189,10,99,97,109,67,111,110,116,114,111,108,130,32,161,7,0,8]}]]} [2022-08-27 15:35:10.579] [debug] Asset map dump: {"map":{}} [184430101173DE0F00] [192.168.80.254] [657.594] [system] [info] ImageManip internal buffer size '188608'B, shave buffer size '35840'B [184430101173DE0F00] [192.168.80.254] [657.594] [system] [info] SIPP (Signal Image Processing Pipeline) internal buffer size '16384'B [184430101173DE0F00] [192.168.80.254] [657.629] [system] [info] ColorCamera allocated resources: no shaves; cmx slices: [10-15] ImageManip allocated resources: shaves: [15-15] no cmx slices.

[1269.03297] CONNECTED TO CAMERA: 01-back 184430101173DE0F00
[184430101173DE0F00] [192.168.80.254] [658.486] [system] [info] Memory Usage - DDR: 203.17 / 340.93 MiB, CMX: 2.29 / 2.50 MiB, LeonOS Heap: 40.24 / 77.58 MiB, LeonRT Heap: 5.00 / 41.37 MiB
[184430101173DE0F00] [192.168.80.254] [658.486] [system] [info] Temperatures - Average: 31.65 °C, CSS: 32.97 °C, MSS 31.53 °C, UPA: 30.56 °C, DSS: 31.53 °C
[184430101173DE0F00] [192.168.80.254] [931.822] [system] [info] Memory Usage - DDR: 203.17 / 340.93 MiB, CMX: 2.29 / 2.50 MiB, LeonOS Heap: 40.24 / 77.58 MiB, LeonRT Heap: 5.00 / 41.37 MiB
[184430101173DE0F00] [192.168.80.254] [931.822] [system] [info] Temperatures - Average: 29.53 °C, CSS: 31.04 °C, MSS 29.35 °C, UPA: 28.86 °C, DSS: 28.86 °C
[184430101173DE0F00] [192.168.80.254] [931.822] [system] [info] Cpu Usage - LeonOS 14.82%, LeonRT: 1.38%
[2022-08-27 15:39:45.531] [warning] Monitor thread (device: 18443010D162DE0F00 [192.168.80.253]) - ping was missed, closing the device connection
connected cameras: [<CameraBoardSocket.RGB: 0>]
memory used: 213035008 remaining: 144453119 total: 357488127
is device closed: None
[1544.04101] topic: dc-a6-32-57-5d-7a msg: {'action': 'setLed', 'color': 'red', 'mode': 'blink', 'status': 'error', 'msgId': 'e4e2ff3825b911ed9489dca632575d7a'}
[2022-08-27 15:39:45.761] [debug] DataOutputQueue (preview) closed
[2022-08-27 15:39:45.761] [debug] XLinkResetRemote of linkId: (2)
FFMPEG PIPE ERROR write to closed file
ffmpeg stdout: b'' stderr: b'pipe:: Invalid data found when processing input\n'
[2022-08-27 15:39:45.762] [debug] DataOutputQueue (video) closed
VIDEOFRAME DATA: 0:25:32.589992 7866 52825
[2022-08-27 15:39:45.763] [debug] DataOutputQueue (jpg) closed
[2022-08-27 15:39:45.765] [debug] Log thread exception caught: Couldn't read data from stream: '__log' (X_LINK_ERROR)
[2022-08-27 15:39:45.765] [debug] Timesync thread exception caught: Couldn't read data from stream: '__timesync' (X_LINK_ERROR)
[2022-08-27 15:39:45.767] [debug] Device about to be closed...
memory used: 213035008 remaining: 144453119 total: 357488127
is device closed: None
[1544.06524] topic: dc-a6-32-57-5d-7a msg: {'action': 'setLed', 'color': 'red', 'mode': 'blink', 'status': 'error', 'msgId': 'e4e6b25425b911ed83e4dca632575d7a'}
FFMPEG PIPE ERROR write to closed file
ffmpeg stdout: b'' stderr: b'pipe:: Invalid data found when processing input\n'
VIDEOFRAME DATA: 0:25:25.552082 7720 56995
connected cameras: [<CameraBoardSocket.RGB: 0>]
[184430101173DE0F00] [192.168.80.254] [932.823] [system] [info] Memory Usage - DDR: 203.17 / 340.93 MiB, CMX: 2.29 / 2.50 MiB, LeonOS Heap: 40.24 / 77.58 MiB, LeonRT Heap: 5.00 / 41.37 MiB
[184430101173DE0F00] [192.168.80.254] [932.823] [system] [info] Temperatures - Average: 30.80 °C, CSS: 31.77 °C, MSS 31.04 °C, UPA: 29.83 °C, DSS: 30.56 °C
[184430101173DE0F00] [192.168.80.254] [932.823] [system] [info] Cpu Usage - LeonOS 15.87%, LeonRT: 1.35%
memory used: 213035008 remaining: 144453119 total: 357488127
is device closed: None
[1544.27472] topic: dc-a6-32-57-5d-7a msg: {'action': 'setLed', 'color': 'red', 'mode': 'blink', 'status': 'error', 'msgId': 'e506ab1825b911ed83e4dca632575d7a'}
FFMPEG PIPE ERROR write to closed file
ffmpeg stdout: b'' stderr: b'pipe:: Invalid data found when processing input\n'
VIDEOFRAME DATA: 0:25:25.751384 7726 64363
[2022-08-27 15:39:46.510] [debug] Device closed, 743
[2022-08-27 15:39:46.511] [debug] DataInputQueue (camControl) closed
Exception in thread 18443010D162DE0F00:
Traceback (most recent call last):
  File "/home/pi/camera/utils.py", line 523, in startOak
    self.ffmpeg.stdin.write(framedata)
ValueError: write to closed file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pi/camera/utils.py", line 531, in startOak
    self.ffmpegErrorHandler(error, videoFrame)
  File "/home/pi/camera/utils.py", line 340, in ffmpegErrorHandler
    print("connected cameras:", self.device.getConnectedCameras())
ValueError: Device already closed or disconnected
connected cameras: [<CameraBoardSocket.RGB: 0>]
memory used: 213035008 remaining: 144453119 total: 357488127
is device closed: None`

lucasmediaflow avatar Aug 27 '22 03:08 lucasmediaflow

Thanks for the reports on this issue. We'll look into it in more detail and will post back as soon as we have more information.

themarpe avatar Aug 29 '22 12:08 themarpe

Is there a good way to prove that this is not caused by insufficient power?

miknai avatar Aug 29 '22 23:08 miknai

@miknai since the run time degrades as more cameras are added, I strongly suspect this is not a power supply issue. (If the PoE switch couldn't supply enough power, the setup would either be okay or start failing only once more cameras are connected.)

themarpe avatar Aug 30 '22 15:08 themarpe

@miknai Could you run a test with the environment variable DEPTHAI_WATCHDOG=0 set, to see if it improves stability?

On Linux, to avoid having to export it, you can prepend it to the command being run:
depthai-experiments/gen2-multiple-devices$ DEPTHAI_WATCHDOG=0 python3 main.py


A note on DEPTHAI_WATCHDOG=0: it should work fine with the above script, but on the latest release (v2.17.3.1) it may cause issues with pipelines that load assets (such as NeuralNetwork blobs or a StereoDepth undistortion mesh). We recently fixed this issue on the develop branch.
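For scripted test runs, the same variable can be set from Python instead of the shell. This is only a sketch: `watchdog_env` and `run_with_watchdog` are hypothetical helper names, and it assumes nothing beyond what is stated above, namely that depthai picks up `DEPTHAI_WATCHDOG` from the process environment.

```python
import os
import subprocess
import sys


def watchdog_env(timeout_ms, base_env=None):
    """Return a copy of the environment with DEPTHAI_WATCHDOG set.

    timeout_ms=0 disables the watchdog, as suggested in this thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DEPTHAI_WATCHDOG"] = str(int(timeout_ms))
    return env


def run_with_watchdog(script, timeout_ms=0):
    """Run `script` in a child process with the watchdog value applied."""
    return subprocess.run([sys.executable, script],
                          env=watchdog_env(timeout_ms), check=False)
```

Calling `run_with_watchdog("main.py", 0)` from the gen2-multiple-devices directory should behave like the prefixed command line above, without changing the environment of the calling shell.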

alex-luxonis avatar Aug 30 '22 19:08 alex-luxonis

Okay, I will try that within a couple of hours and share the results here.
=> Sorry, I haven't tried the watchdog trick yet. I will do that tomorrow.
=> With the watchdog disabled, I was able to run the multi-device video stream without the connection drop. For some cameras, the stream often became very laggy, around 2 frames per second or even fewer.

miknai avatar Aug 31 '22 21:08 miknai

Hi! I've run into this same issue, but consistently affecting only one camera (using static IPs). I'm also able to run all my cameras (currently 3) with no issues if I run them in completely separate terminals. My system previously worked on older depthai/bootloader versions (2.15.4.0/0.0.15) and has been having this issue since I flashed the newest bootloader version (0.0.20) yesterday (it persists even after a factory reset).

The watchdog fix worked on my system, both using this script and a custom one.

aquartaro avatar Aug 31 '22 21:08 aquartaro

On my end, even though I run three cameras on separate terminals, I still get the connection drop from the two OAK-D-POE units. I haven't tried the watchdog trick yet.

miknai avatar Sep 02 '22 05:09 miknai

When I mention separate terminals I mean a process like:

  1. no Cameras
  2. plug in problem camera and run the multi-device main.py
  3. plug in the other two and then run the script again in a different terminal

This has no issues, and I've run it for more than 15 minutes a couple of times before manually stopping both windows.

If I have all three cameras viewable by Device.getAllAvailableDevices(), but only try to add a camera that presents the ping issue (I'm using device = dai.Device(pipeline, device_info) ), it will crash with the same error pretty quickly after Device() is called.
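Picking one specific camera out of several can be sketched as below. `Device.getAllAvailableDevices()` and `dai.Device(pipeline, device_info)` are the calls mentioned above; `getMxId()` is the `DeviceInfo` accessor as I understand the depthai v2 API, and the helper names `find_device_info` and `open_by_mxid` are hypothetical. The depthai import is deferred so that the lookup logic can be read and tested without a camera attached.

```python
def find_device_info(device_infos, mxid):
    """Return the first DeviceInfo whose MXID matches, or None."""
    for info in device_infos:
        if info.getMxId() == mxid:
            return info
    return None


def open_by_mxid(pipeline, mxid):
    """Connect to a single camera, leaving the others untouched."""
    import depthai as dai  # imported lazily; requires the depthai package

    info = find_device_info(dai.Device.getAllAvailableDevices(), mxid)
    if info is None:
        raise RuntimeError(f"No available device with MXID {mxid}")
    return dai.Device(pipeline, info)
```

Running this against only the problem camera's MXID is one way to isolate it while the other cameras stay visible on the network.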

Currently I'm not doing anything with the cameras besides streaming RGB images on v2.17.3.1 and have not seen any issues with the watchdog env variable fix

aquartaro avatar Sep 02 '22 16:09 aquartaro

With the watchdog disabled, I was able to run the multi-device video stream without the connection drop. I ran it with all three cameras together for about 30 minutes. For some cameras, the stream often became very laggy, around 2 frames per second or even fewer. But at least I don't lose the connection.

What does the watchdog do in the system, and why did disabling it make a difference?

miknai avatar Sep 02 '22 19:09 miknai

@miknai Do you mind testing out the latest develop as well? It was likely an issue with how the WD timing was configured for PoE devices. We've tweaked it now, and it should work better without manually changing or disabling the watchdog.

Also, if that doesn't work, do you mind comparing it with the env var set to DEPTHAI_WATCHDOG=60000?

themarpe avatar Sep 05 '22 17:09 themarpe

@themarpe What does WD stand for in WD timing?

Okay, I can do it by tomorrow.

miknai avatar Sep 08 '22 17:09 miknai

@themarpe When you say the latest develop branch, it's the one in this repo, correct? As I mainly use Python, I don't use this repo much. The repo where I initially opened this issue was depthai-experiments.

Oh wait, I believe the depthai-python repo is a wrapper around depthai-core, so changes in this repo carry over to depthai-python as well. Then how is the depthai-experiments repo affected by depthai-core? I don't see anything like "depthai-core @ commit id" in the depthai-experiments repo.

How do I know that gen2-multiple-devices runs with the code changes you made in the depthai-core repo?

miknai avatar Sep 09 '22 05:09 miknai

@miknai WD stands for watchdog.

Go to depthai-python, check out the develop branch, and run python3 examples/install_dependencies.py. Then run the gen2-multiple-devices experiment (without installing the requirements specified in that experiment).

themarpe avatar Sep 09 '22 08:09 themarpe

@themarpe, thank you for the instructions.

I encountered this issue with the develop branch.

Screenshot from 2022-09-09 17-29-09

When I run with DEPTHAI_WATCHDOG=60000, the program opens three windows (I have three cameras), but only 1 unit (oak-d-poe-pro) streams the video properly. The other two units (oak-d-poe) freeze, as shown in this video.

https://user-images.githubusercontent.com/58019428/189461804-1b2ca1d9-8f9f-447a-9034-b4f94a65635d.mp4

miknai avatar Sep 10 '22 00:09 miknai

@miknai On the same branch/version, if you run with DEPTHAI_WATCHDOG=0, are all cameras working well, no freezing?

alex-luxonis avatar Sep 10 '22 00:09 alex-luxonis

@alex-luxonis If I run with DEPTHAI_WATCHDOG=0, all cameras work, but 2 of them are extremely slow. In the video below, I am waving constantly in front of all three cameras, but only 1 unit (oak-d-poe) works properly.

https://user-images.githubusercontent.com/58019428/189462646-ae559591-202e-4abc-b04d-af3cbcbd5fd1.mp4

miknai avatar Sep 10 '22 01:09 miknai

@miknai Thanks for testing, this makes sense. It looks like the bandwidth available in your setup is limited, for example due to WiFi, or 100Mbps Ethernet links, and severe congestion happens for some of the TCP streams.

In its current form, gen2-multiple-devices/main.py requires around 409Mbps of bandwidth for 3x OAK-D-PoE devices. The RGB camera of each is configured to stream uncompressed images, consuming each: 600*300 (w*h) * 3 (Bpp,RGB) * 30 (fps) * 8 (bits) = 130 Mbps for video payload, plus some extra overhead (TCP/IP, etc)
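The payload figure above can be reproduced with a few lines of Python. This is a minimal sketch using the numbers from this comment (600x300 preview, 3 bytes per pixel, 30 fps); the function name `raw_stream_mbps` is made up for illustration, and protocol overhead is not modelled.

```python
def raw_stream_mbps(width, height, bytes_per_pixel, fps):
    """Uncompressed video payload in megabits per second (1 Mb = 1e6 bits)."""
    return width * height * bytes_per_pixel * fps * 8 / 1e6


per_camera = raw_stream_mbps(600, 300, 3, 30)
print(f"per camera: {per_camera:.1f} Mbps")      # per camera: 129.6 Mbps
print(f"3 cameras:  {3 * per_camera:.1f} Mbps")  # 3 cameras:  388.8 Mbps
```

The raw payload for three cameras comes to about 389 Mbps; the gap to the roughly 409 Mbps quoted above is presumably the extra overhead (TCP/IP, etc).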

If the communication is completely stalled for several seconds, as it appears from the above video, the device watchdog kicks in and resets the device. That's likely what happens with the value 60000ms, which gets capped to 5000ms due to some hardware constraints and is insufficient for the above scenario. (We should revisit that logic in firmware, and auto-feed the watchdog for values larger than 5000.)

On my side, testing with 3x OAK-D-PoE devices and the same app, they all stream smoothly and with no lag, over a 1Gbps Ethernet link from the PoE switch to the test PC. You can try a similar setup, or edit the script to reduce the preview size for a test. Ideally, VideoEncoder should be used to compress to JPEG/H.264/H.265 for such limited bandwidth cases.
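As a rough illustration of the VideoEncoder suggestion, the sketch below builds a pipeline that compresses the RGB stream to MJPEG on the device before it is sent to the host. It follows the depthai v2 API as I understand it (`dai.node.ColorCamera`, `dai.node.VideoEncoder`, `setDefaultProfilePreset`, `dai.node.XLinkOut`); the function name, the stream name, and the default frame rate are assumptions, and this is not the code used in gen2-multiple-devices.

```python
STREAM_NAME = "mjpeg"  # hypothetical stream name for the host-side queue


def build_mjpeg_pipeline(fps=30):
    """Build a pipeline that sends MJPEG-compressed RGB frames to the host."""
    import depthai as dai  # imported lazily; requires the depthai package

    pipeline = dai.Pipeline()

    cam = pipeline.create(dai.node.ColorCamera)
    cam.setFps(fps)

    enc = pipeline.create(dai.node.VideoEncoder)
    enc.setDefaultProfilePreset(fps, dai.VideoEncoderProperties.Profile.MJPEG)

    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName(STREAM_NAME)

    # Encode on the device so only the compressed bitstream crosses the network.
    cam.video.link(enc.input)
    enc.bitstream.link(xout.input)
    return pipeline
```

On the host, each message read from the `mjpeg` output queue should then carry one JPEG image, which needs to be decoded before display. How much bandwidth this saves depends on the scene and the encoder quality setting.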

But we should also fix the crash/hang in the first place, in case network congestion still happens due to various reasons.

alex-luxonis avatar Sep 10 '22 01:09 alex-luxonis

@alex-luxonis Thank you so much for sharing the detailed info. It makes much more sense with the calculation. The WiFi speed was 78 Mbps, and only one camera works well with that setup. Now I am trying an Ethernet cable, which gives me 350 Mbps, and two cameras work well in this setup. That matches the calculation that 2x OAK-D-POE requires 260 Mbps plus overhead.

Questions:

  • How did you measure that "gen2-multiple-devices/main.py requires around 409Mbps"?
  • Are there still minor firmware issues when you say "crash/hang in the first place"?
  • My ultimate goal is to capture high-resolution frames from multiple cameras whenever needed, instead of streaming all the time. Let's say I have 3 PoE cameras and I would like to get a 12MP color image and a 1MP stereo image from each camera. The calculation for each camera will be 12M (pixels) * 3 (3-color channel, RGB) * 8 (bits) * 1 (number of frames) = 288 Mbps for color and 1M (pixels) * 1 (1-color channel) * 8 (bits) * 1 (number of frames) = 8 Mbps for the stereo image. The total is 296 Mbps, and 296*3 = 888 Mbps for 3 cameras. Is this calculation correct?
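The arithmetic in the last question can be checked with a short script. It is only a sketch under the stated assumptions: uncompressed 8-bit data, one single-channel frame standing in for the "stereo image" (a left/right pair or a 16-bit depth map would be larger), and a hypothetical 1 Gbps link with protocol overhead ignored. The results are megabits per capture, which equal Mbps only when exactly one capture per second is taken.

```python
def frame_megabits(pixels, channels, bits_per_channel=8):
    """Size of one uncompressed frame in megabits (1 Mb = 1e6 bits)."""
    return pixels * channels * bits_per_channel / 1e6


color = frame_megabits(12_000_000, 3)  # one 12 MP RGB frame
mono = frame_megabits(1_000_000, 1)    # one 1 MP single-channel frame
per_camera = color + mono
total = 3 * per_camera                 # three cameras, one capture each

link_mbps = 1000  # assumed 1 Gbps link, overhead ignored
print(f"color={color} mono={mono} per_camera={per_camera} total={total}")
print(f"transfer time at {link_mbps} Mbps: {total / link_mbps:.3f} s")
```

The script yields the same 288, 8, 296, and 888 figures, so the multiplication itself holds; on the assumed link, one capture from all three cameras would take a little under a second to transfer.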

Thank you!

miknai avatar Sep 12 '22 22:09 miknai

Hi @alex-luxonis,

have you solved this issue? I'm getting the same error.

Setup: 1x OAK-1-POE, 1Gbps network switch, CAT5e ethernet cable (should support up to 1Gbps for lengths up to 50m).

Monitor thread (device: 1844301001605D1200 [192.168.111.21]) - ping was missed, closing the device connection
F: [global] [    784991] [EventRead00Thr] tcpipPlatformRead:272 Cannot find file descriptor by key: 56
F: [global] [    784991] [Scheduler00Thr] tcpipPlatformWrite:300        Cannot find file descriptor by key: 56
F: [global] [    784991] [EventRead00Thr] tcpipPlatformRead:272 Cannot find file descriptor by key: 56
[1844301001605D1200] [192.168.111.21] [1704452793.244] [host] [warning] Device crashed, but no crash dump could be extracted.
terminate called without an active exception

blukaz avatar Jan 05 '24 11:01 blukaz

Hi @blukaz, sorry for the delay. Do you have a means to reproduce the above? CC: @jakaskerl

themarpe avatar Jan 23 '24 19:01 themarpe