depthai-core
[gen2-multiple-devices] Ping was missed, closing the device connection
Luxonis team,
I currently have three PoE cameras (two OAK-D-POE and one OAK-D-POE-PRO). At the office, I was not able to run main.py from gen2-multiple-devices. Whenever I run the script, one of the three cameras gives me a "ping was missed, closing the device connection" error. The camera that fails changes from run to run, so the problem is not tied to one camera.
This is the code I use (main.py): https://github.com/luxonis/depthai-experiments/blob/master/gen2-multiple-devices/main.py
I had a chance to talk with Erik on the Discord channel. On his advice, I updated the bootloader on all three cameras, updated the depthai version, and even factory-reset all three cameras. The issue persists. I swapped the Ethernet switch for another one and also connected one camera through an Ethernet injector. The issue persists. I brought the three cameras (plus the Ethernet switch) home and ran the script. It displayed the video streams for 10 seconds and then the connection dropped.
I think I have tried everything I can at this point, but I am still not sure what causes this issue. I recently purchased another 8 OAK-D-POE-PRO units and I really want to know how to deal with this error. Could you please help me out?
Here's the terminal output.
Found 3 devices
=== Connected to 18443010D1F77B0E00
>>> MXID: 18443010D1F77B0E00
>>> Cameras: RGB LEFT RIGHT
>>> USB speed: UNKNOWN
>>> Loading pipeline for: OAK-D-POE
=== Connected to 18443010D15EE00F00
>>> MXID: 18443010D15EE00F00
>>> Cameras: RGB LEFT RIGHT
>>> USB speed: UNKNOWN
>>> Loading pipeline for: OAK-D-POE
=== Connected to 184430106192F30F00
>>> MXID: 184430106192F30F00
>>> Cameras: RGB LEFT RIGHT
>>> USB speed: UNKNOWN
>>> Loading pipeline for: OAK-D-POE
[2022-08-16 13:38:26.212] [warning] Monitor thread (device: 18443010D1F77B0E00 [10.0.0.214]) - ping was missed, closing the device connection
Traceback (most recent call last):
File "test-multiple-device.py", line 73, in <module>
in_rgb = q_rgb.tryGet()
RuntimeError: Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: 'rgb' (X_LINK_ERROR)'
This time the camera with MxId 18443010D1F77B0E00 was disconnected, but it happens to all three cameras.
Here's another terminal output. The video stream windows were up for 10 seconds and then the program got terminated due to the connection drop.
Found 3 devices
=== Connected to 184430106192F30F00
>>> MXID: 184430106192F30F00
>>> Cameras: RGB LEFT RIGHT
>>> USB speed: UNKNOWN
>>> Loading pipeline for: OAK-D-POE
=== Connected to 18443010D15EE00F00
>>> MXID: 18443010D15EE00F00
>>> Cameras: RGB LEFT RIGHT
>>> USB speed: UNKNOWN
>>> Loading pipeline for: OAK-D-POE
=== Connected to 18443010D1F77B0E00
>>> MXID: 18443010D1F77B0E00
>>> Cameras: RGB LEFT RIGHT
>>> USB speed: UNKNOWN
>>> Loading pipeline for: OAK-D-POE
[2022-08-16 16:16:32.443] [warning] Monitor thread (device: 184430106192F30F00 [10.0.0.107]) - ping was missed, closing the device connection
Traceback (most recent call last):
File "test-multiple-device.py", line 73, in <module>
in_rgb = q_rgb.tryGet()
RuntimeError: Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: 'rgb' (X_LINK_ERROR)'
Hi @miknai ,
What version of depthai (latest: 2.17.3) are you using, and what bootloader version (latest: 0.0.20) are your OAK cameras running? If they aren't the latest, I would suggest updating them first (python3 -m pip install depthai -U for depthai, device manager for the bootloader).
Thanks, Erik
All of them are the latest ones. I have tried the factory reset via network (through device_manager.py UI) as well.
I tried running with only two cameras this time (one OAK-D-POE and one OAK-D-POE-PRO). I physically disconnected the other camera. The program displayed two video stream windows as expected, worked fine for about 4 minutes, and then got the "ping was missed..." error.
Found 2 devices
=== Connected to 18443010D1F77B0E00
>>> MXID: 18443010D1F77B0E00
>>> Cameras: RGB LEFT RIGHT
>>> USB speed: UNKNOWN
>>> Loading pipeline for: OAK-D-POE
=== Connected to 184430106192F30F00
>>> MXID: 184430106192F30F00
>>> Cameras: RGB LEFT RIGHT
>>> USB speed: UNKNOWN
>>> Loading pipeline for: OAK-D-POE
[2022-08-16 16:36:09.964] [warning] Monitor thread (device: 18443010D1F77B0E00 [10.0.0.214]) - ping was missed, closing the device connection
Traceback (most recent call last):
File "test-multiple-device.py", line 73, in <module>
in_rgb = q_rgb.tryGet()
RuntimeError: Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: 'rgb' (X_LINK_ERROR)'
Tried running with only one camera this time (one OAK-D-POE-PRO). It lasted 15 mins and then crashed with the same error.
Found 1 devices
=== Connected to 18443010D1F77B0E00
>>> MXID: 18443010D1F77B0E00
>>> Cameras: RGB LEFT RIGHT
>>> USB speed: UNKNOWN
>>> Loading pipeline for: OAK-D-POE
[2022-08-16 17:18:01.339] [warning] Monitor thread (device: 18443010D1F77B0E00 [10.0.0.214]) - ping was missed, closing the device connection
Traceback (most recent call last):
File "test-multiple-device.py", line 73, in <module>
in_rgb = q_rgb.tryGet()
RuntimeError: Communication exception - possible device error/misconfiguration. Original message 'Couldn't read data from stream: 'rgb' (X_LINK_ERROR)'
In summary:
- 3 cameras => lasted 10 seconds
- 2 cameras => lasted 4 minutes
- 1 camera => lasted 15 mins
I wanted to use one camera for surveillance monitoring, but it seems I cannot do that because of the connection drop... Is the camera designed to run 24/7 as long as power is supplied?
Any updates on this? Is this issue not reproducible on your side?
@miknai Sorry about the delay on our side, let me double-check with the team.
Hi guys, I'm experiencing the same issue.
This is depthai 2.17.3.0.dev+dev, running on a raspberry pi 4, on a docker container with base image luxonis/depthai-library:v2.17.3.0-armv7
I run 2 oak-1-poe simultaneously from the same raspi, and the problem happens on both cameras.
They had firmware 0.0.18, but the problem persists after updating both of them to 0.0.20.
This is the error I see: [2022-08-27 12:11:06.162] [warning] Monitor thread (device: 184430101173DE0F00 [192.168.80.254]) - ping was missed, closing the device connection
Here are some logs with debug enabled if it helps:
`[2022-08-27 15:34:45.036] [debug] Python bindings - version: 2.17.3.0.dev+dev from build: 2022-08-07 12:42:18 +0000
[2022-08-27 15:34:45.036] [debug] Library information - version: 2.17.3, commit: from , build: 2022-08-07 12:42:08 +0000
[2022-08-27 15:34:45.041] [debug] Initialize - finished
DEPTHAI VERSION: 2.17.3.0.dev+dev
[1243.33760] finished setup 02-front oak
DEPTHAI VERSION: 2.17.3.0.dev+dev
[1243.33940] finished setup 01-back oak
cams setup, starting them
[2022-08-27 15:34:45.063] [debug] Device - OpenVINO version: 2022.1
[2022-08-27 15:34:45.063] [debug] Device - OpenVINO version: 2022.1
[2022-08-27 15:34:45.063] [debug] Device - BoardConfig: {"emmc":null,"gpio":[],"logPath":null,"logSizeMax":null,"logVerbosity":null,"network":{"mtu":0,"xlinkTcpNoDelay":true},"pcieInternalClock":null,"sysctl":[],"uart":[],"usb":{"flashBootedPid":63037,"flashBootedVid":999
,"maxSpeed":4,"pid":63035,"vid":999},"usb3PhyInternalClock":null,"watchdogInitialDelayMs":null,"watchdogTimeoutMs":null}
libnop:
0000: b9 0d b9 05 81 e7 03 81 3b f6 81 e7 03 81 3d f6 04 b9 02 00 01 ba 00 be be bb 00 bb 00 be be be
0020: be be be
[2022-08-27 15:34:45.064] [debug] Device - BoardConfig: {"emmc":null,"gpio":[],"logPath":null,"logSizeMax":null,"logVerbosity":null,"network":{"mtu":0,"xlinkTcpNoDelay":true},"pcieInternalClock":null,"sysctl":[],"uart":[],"usb":{"flashBootedPid":63037,"flashBootedVid":999
,"maxSpeed":4,"pid":63035,"vid":999},"usb3PhyInternalClock":null,"watchdogInitialDelayMs":null,"watchdogTimeoutMs":null}
libnop:
0000: b9 0d b9 05 81 e7 03 81 3b f6 81 e7 03 81 3d f6 04 b9 02 00 01 ba 00 be be bb 00 bb 00 be be be
0020: be be be
[2022-08-27 15:34:45.381] [debug] Resources - Archive 'depthai-bootloader-fwp-0.0.20.tar.xz' open: 7ms, archive read: 334ms
[2022-08-27 15:34:46.423] [debug] Resources - Archive 'depthai-device-fwp-602822fe9eaca68a72c666497dc4979b29291b3e.tar.xz' open: 7ms, archive read: 1377ms
[2022-08-27 15:34:46.578] [debug] Searching for booted device: DeviceInfo(name=192.168.80.254, mxid=184430101173DE0F00, X_LINK_BOOTLOADER, X_LINK_TCP_IP, X_LINK_MYRIAD_X, X_LINK_SUCCESS), name used as hint only
[2022-08-27 15:34:46.579] [debug] Searching for booted device: DeviceInfo(name=192.168.80.253, mxid=18443010D162DE0F00, X_LINK_BOOTLOADER, X_LINK_TCP_IP, X_LINK_MYRIAD_X, X_LINK_SUCCESS), name used as hint only
[2022-08-27 15:34:46.821] [debug] Connected bootloader version 0.0.20
[2022-08-27 15:34:46.823] [debug] Connected bootloader version 0.0.20
[2022-08-27 15:35:05.678] [debug] Booting FW with Bootloader. Version 0.0.20, Time taken: 18855ms
[2022-08-27 15:35:05.678] [debug] DeviceBootloader about to be closed...
[2022-08-27 15:35:05.679] [debug] XLinkResetRemote of linkId: (0)
[2022-08-27 15:35:06.908] [debug] DeviceBootloader closed, 1229
[2022-08-27 15:35:07.013] [debug] Searching for booted device: DeviceInfo(name=192.168.80.254, mxid=184430101173DE0F00, X_LINK_BOOTED, X_LINK_TCP_IP, X_LINK_MYRIAD_X, X_LINK_SUCCESS), name used as hint only
[2022-08-27 15:35:07.495] [debug] Booting FW with Bootloader. Version 0.0.20, Time taken: 20673ms
[2022-08-27 15:35:07.495] [debug] DeviceBootloader about to be closed...
[2022-08-27 15:35:07.495] [debug] XLinkResetRemote of linkId: (1)
[2022-08-27 15:35:08.723] [debug] DeviceBootloader closed, 1227
[2022-08-27 15:35:08.832] [debug] Searching for booted device: DeviceInfo(name=192.168.80.253, mxid=18443010D162DE0F00, X_LINK_BOOTED, X_LINK_TCP_IP, X_LINK_MYRIAD_X, X_LINK_SUCCESS), name used as hint only
[184430101173DE0F00] [192.168.80.254] [657.484] [system] [info] Memory Usage - DDR: 0.12 / 340.93 MiB, CMX: 2.05 / 2.50 MiB, LeonOS Heap: 24.96 / 77.58 MiB, LeonRT Heap: 2.88 / 41.37 MiB
[184430101173DE0F00] [192.168.80.254] [657.484] [system] [info] Temperatures - Average: 28.86 °C, CSS: 30.32 °C, MSS 28.13 °C, UPA: 28.37 °C, DSS: 28.62 °C
[184430101173DE0F00] [192.168.80.254] [657.484] [system] [info] Cpu Usage - LeonOS 6.86%, LeonRT: 0.56%
[2022-08-27 15:35:10.578] [debug] Schema dump: {"connections":[{"node1Id":4,"node1Output":"bitstream","node1OutputGroup":"","node2Id":5,"node2Input":"in","node2InputGroup":""},{"node1Id":7,"node1Output":"out","node1OutputGroup":"","node2Id":0,"node2Input":"inputControl","
node2InputGroup":""},{"node1Id":0,"node1Output":"still","node1OutputGroup":"","node2Id":4,"node2Input":"in","node2InputGroup":""},{"node1Id":1,"node1Output":"bitstream","node1OutputGroup":"","node2Id":2,"node2Input":"in","node2InputGroup":""},{"node1Id":3,"node1Output":"o
ut","node1OutputGroup":"","node2Id":1,"node2Input":"in","node2InputGroup":""},{"node1Id":0,"node1Output":"isp","node1OutputGroup":"","node2Id":3,"node2Input":"inputImage","node2InputGroup":""}],"globalProperties":{"calibData":null,"cameraTuningBlobSize":null,"cameraTuning
BlobUri":"","leonCssFrequencyHz":700000000.0,"leonMssFrequencyHz":700000000.0,"pipelineName":null,"pipelineVersion":null,"xlinkChunkSize":-1},"nodes":[[0,{"id":0,"ioInfo":[[["","preview"],{"blocking":false,"group":"","name":"preview","queueSize":8,"type":0,"waitForMessage
":false}],[["","video"],{"blocking":false,"group":"","name":"video","queueSize":8,"type":0,"waitForMessage":false}],[["","still"],{"blocking":false,"group":"","name":"still","queueSize":8,"type":0,"waitForMessage":false}],[["","raw"],{"blocking":false,"group":"","name":"r
aw","queueSize":8,"type":0,"waitForMessage":false}],[["","inputConfig"],{"blocking":false,"group":"","name":"inputConfig","queueSize":8,"type":3,"waitForMessage":false}],[["","isp"],{"blocking":false,"group":"","name":"isp","queueSize":8,"type":0,"waitForMessage":false}],
[["","inputControl"],{"blocking":true,"group":"","name":"inputControl","queueSize":8,"type":3,"waitForMessage":false}]],"name":"ColorCamera","properties":[185,23,185,20,0,3,0,185,3,0,0,0,185,5,0,0,0,0,0,185,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,0,0,0,129,104,1,1
29,224,1,255,255,133,0,15,133,112,8,1,136,0,0,240,65,136,0,0,128,191,136,0,0,128,191,0,185,4,0,0,0,0,3,3,4,4,4]}],[1,{"id":1,"ioInfo":[[["","bitstream"],{"blocking":false,"group":"","name":"bitstream","queueSize":8,"type":0,"waitForMessage":false}],[["","in"],{"blocking":
true,"group":"","name":"in","queueSize":4,"type":3,"waitForMessage":true}]],"name":"VideoEncoder","properties":[185,11,134,0,36,244,0,30,0,0,0,0,1,80,0,0,136,0,0,240,65]}],[2,{"id":2,"ioInfo":[[["","in"],{"blocking":true,"group":"","name":"in","queueSize":8,"type":3,"wait
ForMessage":true}]],"name":"XLinkOut","properties":[185,3,136,0,0,128,191,189,5,118,105,100,101,111,0]}],[3,{"id":3,"ioInfo":[[["","out"],{"blocking":false,"group":"","name":"out","queueSize":8,"type":0,"waitForMessage":false}],[["","inputConfig"],{"blocking":true,"group"
:"","name":"inputConfig","queueSize":8,"type":3,"waitForMessage":false}],[["","inputImage"],{"blocking":true,"group":"","name":"inputImage","queueSize":8,"type":3,"waitForMessage":true}]],"name":"ImageManip","properties":[185,6,185,8,185,7,185,4,136,0,0,0,0,136,0,0,0,0,13
6,0,0,0,0,136,0,0,0,0,185,3,185,2,136,0,0,0,0,136,0,0,0,0,185,2,136,0,0,0,0,136,0,0,0,0,136,0,0,0,0,0,136,0,0,128,63,136,0,0,128,63,0,1,185,15,133,128,7,133,56,4,0,0,0,0,186,0,1,0,186,0,0,0,136,0,0,0,0,0,1,185,3,22,0,0,0,1,1,0,0,134,0,118,47,0,4,0,0,189,0]}],[4,{"id":4,"i
oInfo":[[["","bitstream"],{"blocking":false,"group":"","name":"bitstream","queueSize":8,"type":0,"waitForMessage":false}],[["","in"],{"blocking":true,"group":"","name":"in","queueSize":1,"type":3,"waitForMessage":true}]],"name":"VideoEncoder","properties":[185,11,0,30,0,0
,1,0,4,65,0,0,136,0,0,128,63]}],[5,{"id":5,"ioInfo":[[["","in"],{"blocking":true,"group":"","name":"in","queueSize":8,"type":3,"waitForMessage":true}]],"name":"XLinkOut","properties":[185,3,136,0,0,128,191,189,3,106,112,103,0]}],[6,{"id":6,"ioInfo":[[["","in"],{"blocking"
:true,"group":"","name":"in","queueSize":8,"type":3,"waitForMessage":true}]],"name":"XLinkOut","properties":[185,3,136,0,0,128,191,189,7,112,114,101,118,105,101,119,0]}],[7,{"id":7,"ioInfo":[[["","out"],{"blocking":false,"group":"","name":"out","queueSize":8,"type":0,"wai
tForMessage":false}]],"name":"XLinkIn","properties":[185,3,189,10,99,97,109,67,111,110,116,114,111,108,130,32,161,7,0,8]}]]}
[2022-08-27 15:35:10.579] [debug] Asset map dump: {"map":{}}
[184430101173DE0F00] [192.168.80.254] [657.594] [system] [info] ImageManip internal buffer size '188608'B, shave buffer size '35840'B
[184430101173DE0F00] [192.168.80.254] [657.594] [system] [info] SIPP (Signal Image Processing Pipeline) internal buffer size '16384'B
[184430101173DE0F00] [192.168.80.254] [657.629] [system] [info] ColorCamera allocated resources: no shaves; cmx slices: [10-15]
ImageManip allocated resources: shaves: [15-15] no cmx slices.
[1269.03297] CONNECTED TO CAMERA: 01-back 184430101173DE0F00
[184430101173DE0F00] [192.168.80.254] [658.486] [system] [info] Memory Usage - DDR: 203.17 / 340.93 MiB, CMX: 2.29 / 2.50 MiB, LeonOS Heap: 40.24 / 77.58 MiB, LeonRT Heap: 5.00 / 41.37 MiB
[184430101173DE0F00] [192.168.80.254] [658.486] [system] [info] Temperatures - Average: 31.65 °C, CSS: 32.97 °C, MSS 31.53 °C, UPA: 30.56 °C, DSS: 31.53 °C
[184430101173DE0F00] [192.168.80.254] [931.822] [system] [info] Memory Usage - DDR: 203.17 / 340.93 MiB, CMX: 2.29 / 2.50 MiB, LeonOS Heap: 40.24 / 77.58 MiB, LeonRT Heap: 5.00 / 41.37 MiB
[184430101173DE0F00] [192.168.80.254] [931.822] [system] [info] Temperatures - Average: 29.53 °C, CSS: 31.04 °C, MSS 29.35 °C, UPA: 28.86 °C, DSS: 28.86 °C
[184430101173DE0F00] [192.168.80.254] [931.822] [system] [info] Cpu Usage - LeonOS 14.82%, LeonRT: 1.38%
[2022-08-27 15:39:45.531] [warning] Monitor thread (device: 18443010D162DE0F00 [192.168.80.253]) - ping was missed, closing the device connection
connected cameras: [<CameraBoardSocket.RGB: 0>]
memory used: 213035008 remaining: 144453119 total: 357488127
is device closed: None
[1544.04101] topic: dc-a6-32-57-5d-7a msg: {'action': 'setLed', 'color': 'red', 'mode': 'blink', 'status': 'error', 'msgId': 'e4e2ff3825b911ed9489dca632575d7a'}
[2022-08-27 15:39:45.761] [debug] DataOutputQueue (preview) closed
[2022-08-27 15:39:45.761] [debug] XLinkResetRemote of linkId: (2)
FFMPEG PIPE ERROR write to closed file
ffmpeg stdout: b'' stderr: b'pipe:: Invalid data found when processing input\n'
[2022-08-27 15:39:45.762] [debug] DataOutputQueue (video) closed
VIDEOFRAME DATA: 0:25:32.589992 7866 52825
[2022-08-27 15:39:45.763] [debug] DataOutputQueue (jpg) closed
[2022-08-27 15:39:45.765] [debug] Log thread exception caught: Couldn't read data from stream: '__log' (X_LINK_ERROR)
[2022-08-27 15:39:45.765] [debug] Timesync thread exception caught: Couldn't read data from stream: '__timesync' (X_LINK_ERROR)
[2022-08-27 15:39:45.767] [debug] Device about to be closed...
memory used: 213035008 remaining: 144453119 total: 357488127
is device closed: None
[1544.06524] topic: dc-a6-32-57-5d-7a msg: {'action': 'setLed', 'color': 'red', 'mode': 'blink', 'status': 'error', 'msgId': 'e4e6b25425b911ed83e4dca632575d7a'}
FFMPEG PIPE ERROR write to closed file
ffmpeg stdout: b'' stderr: b'pipe:: Invalid data found when processing input\n'
VIDEOFRAME DATA: 0:25:25.552082 7720 56995
connected cameras: [<CameraBoardSocket.RGB: 0>]
[184430101173DE0F00] [192.168.80.254] [932.823] [system] [info] Memory Usage - DDR: 203.17 / 340.93 MiB, CMX: 2.29 / 2.50 MiB, LeonOS Heap: 40.24 / 77.58 MiB, LeonRT Heap: 5.00 / 41.37 MiB
[184430101173DE0F00] [192.168.80.254] [932.823] [system] [info] Temperatures - Average: 30.80 °C, CSS: 31.77 °C, MSS 31.04 °C, UPA: 29.83 °C, DSS: 30.56 °C
[184430101173DE0F00] [192.168.80.254] [932.823] [system] [info] Cpu Usage - LeonOS 15.87%, LeonRT: 1.35%
memory used: 213035008 remaining: 144453119 total: 357488127
is device closed: None
[1544.27472] topic: dc-a6-32-57-5d-7a msg: {'action': 'setLed', 'color': 'red', 'mode': 'blink', 'status': 'error', 'msgId': 'e506ab1825b911ed83e4dca632575d7a'}
FFMPEG PIPE ERROR write to closed file
ffmpeg stdout: b'' stderr: b'pipe:: Invalid data found when processing input\n'
VIDEOFRAME DATA: 0:25:25.751384 7726 64363
[2022-08-27 15:39:46.510] [debug] Device closed, 743
[2022-08-27 15:39:46.511] [debug] DataInputQueue (camControl) closed
Exception in thread 18443010D162DE0F00:
Traceback (most recent call last):
File "/home/pi/camera/utils.py", line 523, in startOak
self.ffmpeg.stdin.write(framedata)
ValueError: write to closed file
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/home/pi/camera/utils.py", line 531, in startOak
self.ffmpegErrorHandler(error, videoFrame)
File "/home/pi/camera/utils.py", line 340, in ffmpegErrorHandler
print("connected cameras:", self.device.getConnectedCameras())
ValueError: Device already closed or disconnected
connected cameras: [<CameraBoardSocket.RGB: 0>]
memory used: 213035008 remaining: 144453119 total: 357488127
is device closed: None`
Thanks for the reports on this issue - we'll be looking into this in more detail and will post back as soon as we get more information on it.
Is there a good way to prove that this is not caused by insufficient power?
@miknai since the time to failure gets shorter with more cameras, I highly suspect this is not a power supply issue (otherwise it would either be okay, or start failing only once more cameras are added, if the PoE switch couldn't supply enough power)
@miknai Could you run a test with the environment variable DEPTHAI_WATCHDOG=0 set, to see if it improves the stability?
On Linux, to avoid having to export it, you can prepend it to the command being run:
depthai-experiments/gen2-multiple-devices$ DEPTHAI_WATCHDOG=0 python3 main.py
A note on DEPTHAI_WATCHDOG=0: it should work fine with the above script, but on the latest release (v2.17.3.1) it may cause issues with pipelines that load assets (like NeuralNetwork blobs, StereoDepth undistortion mesh, etc.). We recently fixed this issue on the develop branch.
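For scripts that are launched from Python rather than from a shell, the same idea can be sketched as below. This is only an illustration: the helper names watchdog_env and run_with_watchdog are made up for this example and are not part of depthai. The only things taken from this thread are the DEPTHAI_WATCHDOG variable and its values (0 to disable, otherwise a timeout in milliseconds).

```python
import os
import subprocess
import sys


def watchdog_env(timeout_ms, base=None):
    """Return a copy of the environment with DEPTHAI_WATCHDOG set.

    timeout_ms=0 disables the watchdog, as suggested in this thread.
    (watchdog_env is a hypothetical helper, not a depthai function.)
    """
    if timeout_ms < 0:
        raise ValueError("timeout_ms must be >= 0")
    env = dict(os.environ if base is None else base)
    env["DEPTHAI_WATCHDOG"] = str(timeout_ms)
    return env


def run_with_watchdog(script, timeout_ms=0):
    """Run a script in a child process with the variable set (hypothetical helper)."""
    return subprocess.run([sys.executable, script], env=watchdog_env(timeout_ms))
```

Calling run_with_watchdog("main.py", 0) from the gen2-multiple-devices directory would then correspond to the shell command above, without changing the environment of the calling process.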
Okay, I will try that within a couple of hours and share the results here. => Sorry, I haven't tried the watchdog trick yet. I will do that tomorrow. => With the watchdog disabled, I was able to run the multi-device video stream without the connection drop. For some cameras, the stream often became really laggy, around 2 frames per second or even fewer.
Hi! I've run into this same issue, but consistently only affecting one camera (using static IPs). I'm also able to run all my cameras (currently 3) with no issues if I run them in completely separate terminals. My system previously worked on older depthai/bootloader versions (2.15.4.0/0.0.15) and has been having this issue since I flashed the newest bootloader version (0.0.20) yesterday (persists even after factory reset).
The watchdog fix worked on my system, both using this script and a custom one.
On my end, even though I run three cameras on separate terminals, I still get the connection drop from the two OAK-D-POE units. I haven't tried the watchdog trick yet.
When I mention separate terminals I mean a process like:
- no Cameras
- plug in problem camera and run the multi-device main.py
- plug in the other two and then run the script again in a different terminal
This has no issues, and I've run it for more than 15 min a couple of times before manually stopping both windows.
If I have all three cameras viewable by Device.getAllAvailableDevices(), but only try to add a camera that presents the ping issue (I'm using device = dai.Device(pipeline, device_info) ), it will crash with the same error pretty quickly after Device() is called.
Currently I'm not doing anything with the cameras besides streaming RGB images on v2.17.3.1 and have not seen any issues with the watchdog env variable fix
With the watchdog disabled, I was able to run the multi-device video stream without the connection drop. Ran it with all three cameras together for about 30 mins. For some cameras, the stream often became really laggy, around 2 frames per second or even fewer. But, at least, I don't lose the connection.
What does the watchdog do in the system, and why did disabling it make a difference?
@miknai
Do you mind testing out the latest develop as well? It was likely an issue with how PoE devices had the WD timing configured.
We've tweaked it now, and it should work better without changing the watchdog manually (or disabling it, for that matter).
Also, if that doesn't work, do you mind comparing it with the env var set to DEPTHAI_WATCHDOG=60000?
@themarpe What does WD stand for in WD timing?
Okay, I can do it by tomorrow.
@themarpe When you say the latest develop branch, it's the one in this repo, correct? As I mainly use Python, I don't use this repo much. The repo where I initially opened this issue was depthai-experiments.
Oh wait, I believe the depthai-python repo is a wrapper around this depthai-core, so changes in this repo carry over to the depthai-python repo as well. Then how does the depthai-experiments repo get affected by depthai-core? I don't see anything like "depthai-core @ commit id" in the depthai-experiments repo.
How do I know that gen2-multiple-devices runs with the code changes you made in the depthai-core repo?
@miknai WD stands for watchdog.
Go to depthai-python, check out the develop branch, and run python3 examples/install_dependencies.py, then run the gen2-multiple-devices experiment (without installing the requirements specified in that experiment).
@themarpe, thank you for the instructions.
I encountered this issue with the develop branch.
When I run with DEPTHAI_WATCHDOG=60000, the program opens three windows (I have three cameras), but only one unit (oak-d-poe-pro) streams video properly. The other two units (oak-d-poe) freeze, as shown in this video.
https://user-images.githubusercontent.com/58019428/189461804-1b2ca1d9-8f9f-447a-9034-b4f94a65635d.mp4
@miknai On the same branch/version, if you run with DEPTHAI_WATCHDOG=0, are all cameras working well, with no freezing?
@alex-luxonis If I run with DEPTHAI_WATCHDOG=0, all cameras work, but 2 of them are extremely slow. In the video below, I am waving constantly in front of all three cameras, but only 1 unit (oak-d-poe) works properly.
https://user-images.githubusercontent.com/58019428/189462646-ae559591-202e-4abc-b04d-af3cbcbd5fd1.mp4
@miknai Thanks for testing, this makes sense. It looks like the bandwidth available in your setup is limited, for example due to WiFi, or 100Mbps Ethernet links, and severe congestion happens for some of the TCP streams.
In its current form, gen2-multiple-devices/main.py requires around 409Mbps of bandwidth for 3x OAK-D-PoE devices. The RGB camera of each is configured to stream uncompressed images, each consuming:
600*300 (w*h) * 3 (Bpp, RGB) * 30 (fps) * 8 (bits) = 130 Mbps
for video payload, plus some extra overhead (TCP/IP, etc.).
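The payload figure can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above; the function name is made up for illustration, and it counts raw pixel payload only, so the three-camera total comes out a little below the roughly 409 Mbps quoted, which presumably includes the overhead mentioned.

```python
def raw_stream_mbps(width, height, bytes_per_pixel, fps):
    """Payload bandwidth of an uncompressed stream in Mbps (1 Mbps = 1e6 bit/s)."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps / 1e6


per_camera = raw_stream_mbps(600, 300, 3, 30)
print(f"one camera:    {per_camera:.1f} Mbps")      # 129.6 Mbps
print(f"three cameras: {3 * per_camera:.1f} Mbps")  # 388.8 Mbps
```

The same function shows how quickly a smaller preview helps: halving both width and height cuts the payload to a quarter.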
If the communication is completely stalled for several seconds, as it appears from the above video, the device watchdog would kick in and reset it. That's likely what happens with the value 60000ms, which gets capped to 5000ms due to some hardware constraints, and it's insufficient for the above scenario. (We should revisit that logic in firmware, and auto-feed the watchdog for values larger than 5000.)
On my side, testing with 3x OAK-D-PoE devices and the same app, they all stream smoothly and with no lag, over a 1Gbps Ethernet link from the PoE switch to the test PC. You can try a similar setup, or edit the script to reduce the preview size for a test. Ideally, VideoEncoder should be used to compress to JPEG/H.264/H.265 for such limited bandwidth cases.
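A minimal sketch of the VideoEncoder suggestion, assuming the depthai 2.x Python API. The function name, the stream name "mjpeg", and the 1080p / 30 fps settings are illustrative choices and are not taken from gen2-multiple-devices/main.py.

```python
STREAM_NAME = "mjpeg"  # illustrative name, not from the original script


def build_mjpeg_pipeline(fps=30):
    """Build a pipeline that sends MJPEG-compressed frames instead of raw RGB."""
    # Imported here so that this module can be loaded on a machine without depthai.
    import depthai as dai

    pipeline = dai.Pipeline()

    cam = pipeline.create(dai.node.ColorCamera)
    cam.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
    cam.setFps(fps)

    # Compress on the device so that far less data crosses the network.
    enc = pipeline.create(dai.node.VideoEncoder)
    enc.setDefaultProfilePreset(fps, dai.VideoEncoderProperties.Profile.MJPEG)
    cam.video.link(enc.input)

    # Send the encoded bitstream to the host over XLink.
    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName(STREAM_NAME)
    enc.bitstream.link(xout.input)
    return pipeline
```

On the host, each packet read from the "mjpeg" output queue would then hold one JPEG image, which could be decoded with, for example, OpenCV's cv2.imdecode.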
But we should also fix the crash/hang in the first place, in case network congestion still happens due to various reasons.
@alex-luxonis Thank you so much for sharing the detailed info. It makes much more sense with the calculation. The WiFi speed was 78 Mbps, and only one camera works well with that setup. Now I am trying an Ethernet cable, which gives me 350 Mbps, and two cameras work well in this setup. That matches the calculation that 2x OAK-D-POE requires 260 Mbps plus overhead.
Questions:
- How did you measure that "gen2-multiple-devices/main.py requires around 409Mbps"?
- Are there still minor firmware issues when you say "crash/hang in the first place"?
- My ultimate goal is to capture high resolution frames from multiple cameras whenever needed instead of streaming all the time. Let's say I have 3 POE cameras and I would like to get a 12MP color image and a 1MP stereo image from each camera. The calculation for each camera will be 12M (pixels) * 3 (3-color channel, RGB) * 8 (bits) * 1 (number of frame) = 288 Mbps for color and 1M (pixels) * 1 (1-color channel) * 8 (bits) * 1 (number of frame) = 8Mbps for stereo image. The total is 296 Mbps and 296*3 = 888 Mbps for 3 cameras. Is this calculation correct?
Thank you!
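The per-frame arithmetic in the last question can be checked with a few lines of Python. This is only a sketch using the round pixel counts from the post (12M and 1M pixels, 8 bits per channel) and the 350 Mbps link speed mentioned above. Note that a single frame gives a size in megabits; it only becomes Mbps once a frame rate or transfer time is applied. Protocol overhead and any compression are ignored.

```python
def frame_megabits(pixels, channels, bits_per_channel=8):
    """Size of one uncompressed frame in megabits (1 Mbit = 1e6 bits)."""
    return pixels * channels * bits_per_channel / 1e6


color = frame_megabits(12_000_000, 3)  # 12 MP RGB frame
mono = frame_megabits(1_000_000, 1)    # 1 MP single-channel frame
per_camera = color + mono
total = 3 * per_camera                 # one capture from each of 3 cameras

print(color, mono, per_camera, total)  # 288.0 8.0 296.0 888.0

# Rough time to move all three captures over a 350 Mbps link
link_mbps = 350
print(f"{total / link_mbps:.2f} s")
```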
Hi @alex-luxonis,
have you solved this issue? I'm getting the same error.
Setup: 1x OAK-1-POE, 1Gbps network switch, CAT5e ethernet cable (should support up to 1Gbps for lengths up to 50m).
Monitor thread (device: 1844301001605D1200 [192.168.111.21]) - ping was missed, closing the device connection
F: [global] [ 784991] [EventRead00Thr] tcpipPlatformRead:272 Cannot find file descriptor by key: 56
F: [global] [ 784991] [Scheduler00Thr] tcpipPlatformWrite:300 Cannot find file descriptor by key: 56
F: [global] [ 784991] [EventRead00Thr] tcpipPlatformRead:272 Cannot find file descriptor by key: 56
[1844301001605D1200] [192.168.111.21] [1704452793.244] [host] [warning] Device crashed, but no crash dump could be extracted.
terminate called without an active exception
Hi @blukaz, sorry for the delay. Do you have a means to reproduce the above? CC: @jakaskerl