[BUG] OAK-D-POE intermittent failure - INTERNAL_ERROR_CORE
Problem Description
OAK-D-POE cameras intermittently disappear from the network and become unreachable via ping while running. The issue occurs unpredictably:
- Sometimes happens after about 5 minutes of operation
- Sometimes doesn't occur at all during a session
- Requires a power cycle to bring the cameras back online
System Details
- Camera Model: OAK-D-POE
- Main Application: Written in C++
- Network Configuration: 2x OAK-D-POE cameras sharing a subnet with an RPLIDAR-2E and an Intel NUC
- Power Supply: POE
- System info attached: log_system_information.json
Observed Behavior
- The pipeline starts normally from a cold boot.
- At an unpredictable point, the cameras become unreachable on the network.
- Ping attempts to the camera IP addresses fail.
- Power cycling the cameras brings them back online.
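Since recovery currently requires a manual power cycle, one mitigation worth trying is the on-device watchdog, which recent depthai-core releases expose through `dai::BoardConfig`. A minimal sketch (the `watchdogTimeoutMs` / `watchdogInitialDelayMs` field names are taken from recent v2.x headers; verify against your installed version):

```cpp
#include <depthai/depthai.hpp>

// Sketch: configure the on-device watchdog so a wedged OAK-D-POE reboots
// itself instead of waiting for a manual power cycle.
// NOTE: field names below are assumptions based on recent depthai-core
// headers; check the BoardConfig definition in your version.
dai::Pipeline buildPipelineWithWatchdog() {
    dai::Pipeline pipeline;
    dai::BoardConfig board;
    board.watchdogTimeoutMs = 2000;        // reboot ~2 s after the link goes quiet
    board.watchdogInitialDelayMs = 10000;  // grace period while the pipeline starts
    pipeline.setBoardConfig(board);
    return pipeline;
}
```

This doesn't fix the underlying crash, but it can turn a "dead until power cycle" failure into a self-recovering reboot.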
Crash Dumps
Two crash dumps have been collected, showing the following errors:
- INTERNAL_ERROR_CORE
- RTEMS_FATAL_SOURCE_EXCEPTION
Crash dump files:
- crashDump_1_1844301031A3DC0E00_c7127782f2da45aac89d5b5b816d04cc45ae40be.json
- crashDump_0_18443010C15E9F0F00_c7127782f2da45aac89d5b5b816d04cc45ae40be.json
Pipeline
The main app is written in C++, so I can't get a pipeline graph dump, but here's the basic setup:
// Create Nodes
auto yolospatialdetectionnetwork = pipeline.create<dai::node::YoloSpatialDetectionNetwork>();
auto camrgb = pipeline.create<dai::node::ColorCamera>();
auto monoleft = pipeline.create<dai::node::MonoCamera>();
auto monoright = pipeline.create<dai::node::MonoCamera>();
auto stereo = pipeline.create<dai::node::StereoDepth>();
auto objecttracker = pipeline.create<dai::node::ObjectTracker>();
auto imu = pipeline.create<dai::node::IMU>();
auto videoenc = pipeline.create<dai::node::VideoEncoder>();
auto videoenc_webui = pipeline.create<dai::node::VideoEncoder>();
auto videoenc_left = pipeline.create<dai::node::VideoEncoder>();
auto videoenc_right = pipeline.create<dai::node::VideoEncoder>();
auto manip = pipeline.create<dai::node::ImageManip>();
auto xoutdepth = pipeline.create<dai::node::XLinkOut>();
auto xouttracks = pipeline.create<dai::node::XLinkOut>();
auto xoutdetections = pipeline.create<dai::node::XLinkOut>();
auto xoutIMU = pipeline.create<dai::node::XLinkOut>();
auto xoutvidenc = pipeline.create<dai::node::XLinkOut>();
auto xoutmonoenc_left = pipeline.create<dai::node::XLinkOut>();
auto xoutmonoenc_right = pipeline.create<dai::node::XLinkOut>();
auto xoutvideostream = pipeline.create<dai::node::XLinkOut>();
// Set stream names for outputs
xouttracks->setStreamName("tracklets");
xoutdepth->setStreamName("depth");
xoutdetections->setStreamName("detections");
xoutvidenc->setStreamName("vid_enc");
xoutmonoenc_left->setStreamName("mono_enc_left");
xoutmonoenc_right->setStreamName("mono_enc_right");
xoutvideostream->setStreamName("videostream");
xoutIMU->setStreamName("imu");
// Set properties for nodes
camrgb->setPreviewSize(416, 416);
camrgb->setInterleaved(false);
camrgb->setColorOrder(dai::ColorCameraProperties::ColorOrder::RGB);
camrgb->setPreviewKeepAspectRatio(false);
// Set RGB resolution
if (camera_.isIMX378())
{
camrgb->setResolution(dai::ColorCameraProperties::SensorResolution::THE_1080_P);
camrgb->setIspScale(2, 3); // Set the rgb resolution to be 2/3 of the resolution for better alignment
}
else
{
camrgb->setResolution(dai::ColorCameraProperties::SensorResolution::THE_720_P);
}
monoleft->setResolution(dai::MonoCameraProperties::SensorResolution::THE_400_P);
monoleft->setBoardSocket(dai::CameraBoardSocket::CAM_B);
monoright->setResolution(dai::MonoCameraProperties::SensorResolution::THE_400_P);
monoright->setBoardSocket(dai::CameraBoardSocket::CAM_C);
camrgb->setFps(fps);
monoleft->setFps(fps);
monoright->setFps(fps);
videoenc->setQuality(93);
videoenc_left->setQuality(93);
videoenc_right->setQuality(93);
videoenc->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::MJPEG);
videoenc_left->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::MJPEG);
videoenc_right->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::MJPEG);
videoenc_webui->setDefaultProfilePreset(fps, dai::VideoEncoderProperties::Profile::H264_BASELINE);
videoenc_webui->setQuality(50);
videoenc_webui->setFrameRate(fps);
videoenc_webui->setRateControlMode(dai::VideoEncoderProperties::RateControlMode::CBR);
auto videoenc_webui_bitrate = 500000;
auto videoenc_webui_width = 1280;
auto videoenc_webui_height = 720;
videoenc_webui->setBitrate(videoenc_webui_bitrate);
// imu settings
imu->enableIMUSensor({dai::IMUSensor::ACCELEROMETER_RAW, dai::IMUSensor::GYROSCOPE_RAW}, 200);
imu->setBatchReportThreshold(1);
imu->setMaxBatchReports(10);
// setting node configs
stereo->setDefaultProfilePreset(dai::node::StereoDepth::PresetMode::HIGH_ACCURACY);
stereo->setSubpixel(true);
stereo->setLeftRightCheck(true);
stereo->left.setQueueSize(1);
stereo->right.setQueueSize(1);
stereo->left.setBlocking(false);
stereo->right.setBlocking(false);
stereo->setDepthAlign(dai::CameraBoardSocket::CAM_A);
stereo->setOutputSize(monoleft->getResolutionWidth(), monoleft->getResolutionHeight());
stereo->useHomographyRectification(false);
stereo->setConfidenceThreshold(confidence_threshold);
auto config = stereo->initialConfig.get();
config.postProcessing.median = dai::MedianFilter::KERNEL_5x5;
config.postProcessing.temporalFilter.enable = true;
config.postProcessing.spatialFilter.enable = true;
config.postProcessing.spatialFilter.holeFillingRadius = 2;
config.postProcessing.spatialFilter.numIterations = 1;
config.postProcessing.thresholdFilter.minRange = 300;
config.postProcessing.thresholdFilter.maxRange = 10000;
config.postProcessing.decimationFilter.decimationFactor = 3;
config.postProcessing.decimationFilter.decimationMode = dai::RawStereoDepthConfig::PostProcessing::DecimationFilter::DecimationMode::NON_ZERO_MEDIAN;
stereo->initialConfig.set(config); // apply the modified config back to the node
// Set spatial mobile net settings
yolospatialdetectionnetwork->setBlobPath(nn_path);
// Pub names of classes
auto nn_classes = getNNClasses(nn_config_path);
std::unordered_map<std::string, float> detection_confidences;
// grab default confidence vals
try {
detection_confidences = getConfigValue<std::unordered_map<std::string, float>>(config_, {"nn", "default_confidence"});
} catch (const std::exception& e) {
ROS_ERROR_STREAM("Error parsing detection confidence: " << e.what());
}
grover_msgs::StringArray nn_classes_msg;
for (const auto& nn_class : nn_classes) {
nn_classes_msg.data.push_back(nn_class);
auto it = detection_confidences.find(nn_class);
if (it != detection_confidences.end()) {
m_detection_class_conf.push_back(std::make_pair(nn_class, it->second));
} else {
m_detection_class_conf.push_back(std::make_pair(nn_class, confidence_threshold));
}
}
m_nn_classes_pub = nh_.advertise<grover_msgs::StringArray>(cam_name_ + "/nn_classes", 1, true);
m_nn_classes_pub.publish(nn_classes_msg);
fillNNSettings<dai::node::YoloSpatialDetectionNetwork>(nn_config_path, yolospatialdetectionnetwork);
yolospatialdetectionnetwork->input.setBlocking(true);
yolospatialdetectionnetwork->setBoundingBoxScaleFactor(0.5);
yolospatialdetectionnetwork->setDepthLowerThreshold(150);
yolospatialdetectionnetwork->setDepthUpperThreshold(15000);
yolospatialdetectionnetwork->setIouThreshold(0.5f);
// possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS, SHORT_TERM_IMAGELESS, SHORT_TERM_KCF
objecttracker->setTrackerType(dai::TrackerType::ZERO_TERM_IMAGELESS);
// take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
objecttracker->setTrackerIdAssignmentPolicy(dai::TrackerIdAssignmentPolicy::SMALLEST_ID);
manip->setMaxOutputFrameSize(1382400);
manip->initialConfig.setResize(1280, 720);
manip->initialConfig.setFrameType(dai::ImgFrame::Type::NV12);
monoleft->out.link(stereo->left);
monoright->out.link(stereo->right);
camrgb->video.link(manip->inputImage);
manip->out.link(videoenc_webui->input);
videoenc_webui->bitstream.link(xoutvideostream->input);
camrgb->video.link(videoenc->input);
monoright->out.link(videoenc_right->input);
monoleft->out.link(videoenc_left->input);
videoenc->bitstream.link(xoutvidenc->input);
videoenc_right->bitstream.link(xoutmonoenc_right->input);
videoenc_left->bitstream.link(xoutmonoenc_left->input);
stereo->depth.link(xoutdepth->input);
imu->out.link(xoutIMU->input);
camrgb->preview.link(yolospatialdetectionnetwork->input);
stereo->depth.link(yolospatialdetectionnetwork->inputDepth);
yolospatialdetectionnetwork->passthrough.link(objecttracker->inputTrackerFrame);
yolospatialdetectionnetwork->passthrough.link(objecttracker->inputDetectionFrame);
yolospatialdetectionnetwork->out.link(objecttracker->inputDetections);
yolospatialdetectionnetwork->out.link(xoutdetections->input);
objecttracker->out.link(xouttracks->input);
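On the host side, device construction could also be wrapped in a retry loop with backoff, so an intermittent disappearance doesn't take the whole app down while the camera reboots. A minimal, library-agnostic sketch (here `tryConnect` is a hypothetical stand-in for constructing `dai::Device` and catching XLink errors):

```cpp
#include <chrono>
#include <functional>
#include <iostream>
#include <thread>

// Generic retry-with-backoff helper. In the real app, the tryConnect callable
// would attempt to construct dai::Device(pipeline) and return false on an
// XLink/runtime error instead of letting it propagate.
bool connectWithRetry(const std::function<bool()>& tryConnect,
                      int maxAttempts,
                      std::chrono::milliseconds initialDelay) {
    auto delay = initialDelay;
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (tryConnect()) return true;
        std::cerr << "connect attempt " << attempt << " failed, retrying in "
                  << delay.count() << " ms\n";
        std::this_thread::sleep_for(delay);
        delay *= 2;  // exponential backoff to avoid hammering a rebooting device
    }
    return false;
}
```

Combined with a device-side watchdog, this at least lets the host ride out a firmware reboot without manual intervention.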
Any insights would be greatly appreciated, thanks
Thanks for the bug report @laurence-diack-pk !
Just to clarify, the disconnects happen whilst you're running the app right?
@SzabolcsGergely could you take a look at the crash dumps when you have a moment?
Crash occurred during an XLink read, in XLinkPlatformRead; reason unknown.
> Just to clarify, the disconnects happen whilst you're running the app right?
Yeah, so it seems it can happen on pipeline load or also mid-run.
The failure doesn't seem very predictable and I'm having a hard time reproducing it consistently. For example, I'm looking at an instance right now where one of the two cameras has disappeared, but I had to restart the host several times to get it into this state.
It may also be that the crash dumps don't correlate 1:1 with this failure, as I have observed cases where a camera disappears and no crash dump is retrieved.
Sorry for the vagueness; it's just sort of a black box from my end. If I can't communicate with the camera over the network, it's hard to tell exactly what's going on.
I was wondering if there's any additional logging I can pull off the device itself, or perhaps some way I could use the M8 connector to debug over UART or USB, so I can get some insight into the state of the camera when it disappears like that.
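For what it's worth, depthai-core does expose device-side log forwarding to the host. A sketch (call names are per recent v2.x releases, so verify against your installed headers):

```cpp
#include <depthai/depthai.hpp>
#include <iostream>

// Sketch: raise device-side log verbosity and forward firmware messages to
// the host, so there is a trail leading up to a disconnect.
void enableVerboseDeviceLogs(dai::Device& device) {
    device.setLogLevel(dai::LogLevel::DEBUG);        // what the device records
    device.setLogOutputLevel(dai::LogLevel::DEBUG);  // what it streams to the host
    device.addLogCallback([](dai::LogMessage msg) {
        std::cout << msg.payload << std::endl;       // redirect to a file in practice
    });
}
```

Setting `DEPTHAI_LEVEL=debug` in the host environment is another low-effort way to get more library-side logging, though neither helps once the device is fully unreachable.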
I trimmed down the pipeline slightly by conditionally removing nodes that weren't necessary (ImageManip and the mono encoders/outputs), and that seemed to help stability greatly, though I have still had a few intermittent failures.
See: crashDump_0_18443010C15E9F0F00_9ed7c9ae4c232ff93a3500a585a6b1c00650e22c.json