depthai
[BUG] Luxonis spatial coordinates are very inaccurate
Check if issue already exists Some discussion was found; however, people there claim the depth has only about 3% variance: https://discuss.luxonis.com/d/343-accuracy-of-spatial-detection-and-possible-improvements. This does not match my results.
Describe the bug The spatial coordinate output of the MobileNet SSD spatial detection network is very inaccurate, especially at close range (z < 3m) and at long range (z > 7m). Moreover, when z > 1m the reported depth is always greater than the ground truth depth.
To test this, I walked backward along the z-axis from the Luxonis camera over a range of 0m to 10m, and after every meter I moved +0.5m and -0.5m sideways (along the x-axis). See the screenshot below for the recorded output.
Is this just an inherent limitation of the inbuilt spatial detection network? Would a host-side computation perform better?
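For reference, a host-side computation usually amounts to taking a robust statistic of the depth values inside the bounding box and projecting the box center through the camera's field of view. Below is a minimal sketch of that idea in plain Python. The function name, the ROI format, and the 72 degree horizontal FOV are assumptions for illustration, not part of the DepthAI API, and square pixels are assumed.

```python
import math
from statistics import median


def host_spatial_coords(depth_mm, roi, hfov_deg=72.0, min_mm=100, max_mm=30000):
    """Estimate (x, y, z) in millimeters for a ROI of a depth frame.

    depth_mm: 2D list of depth values in mm (0 means invalid pixel).
    roi: (xmin, ymin, xmax, ymax) in pixels, max bounds exclusive.
    hfov_deg: assumed horizontal field of view of the depth frame.
    """
    height, width = len(depth_mm), len(depth_mm[0])
    xmin, ymin, xmax, ymax = roi

    # Keep only pixels whose depth lies inside the thresholds.
    valid = [
        depth_mm[r][c]
        for r in range(ymin, ymax)
        for c in range(xmin, xmax)
        if min_mm <= depth_mm[r][c] <= max_mm
    ]
    if not valid:
        return None

    # The median is less sensitive to background pixels than the mean.
    z = median(valid)

    # Offset of the ROI center from the image center, in pixels.
    dx = (xmin + xmax) / 2 - width / 2
    dy = (ymin + ymax) / 2 - height / 2

    # Pinhole model: focal length in pixels derived from the horizontal FOV.
    focal_px = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    x = z * dx / focal_px
    y = -z * dy / focal_px  # image rows grow downward, so flip the sign
    return x, y, z
```

Whether this performs better than the on-device calculation depends mostly on the quality of the depth map itself, since both approaches read the same depth values.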
To Reproduce Run Mobilenet spatial detection network. Here is the relevant part of the config code.
import blobconverter
import depthai as dai

pipeline = dai.Pipeline()
camRgb = pipeline.create(dai.node.ColorCamera)
spatialDetectionNetwork = pipeline.createMobileNetSpatialDetectionNetwork()
monoLeft = pipeline.create(dai.node.MonoCamera)
monoRight = pipeline.create(dai.node.MonoCamera)
stereo = pipeline.create(dai.node.StereoDepth)
xoutRgb = pipeline.create(dai.node.XLinkOut)
xoutNN = pipeline.create(dai.node.XLinkOut)
xoutBoundingBoxDepthMapping = pipeline.create(dai.node.XLinkOut)
xoutDepth = pipeline.create(dai.node.XLinkOut)
xoutRgb.setStreamName("rgb")
xoutNN.setStreamName("detections")
xoutBoundingBoxDepthMapping.setStreamName("boundingBoxDepthMapping")
xoutDepth.setStreamName("depth")
camRgb.setPreviewSize(300, 300)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setInterleaved(False)
camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)
stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
stereo.setOutputSize(monoLeft.getResolutionWidth(), monoLeft.getResolutionHeight())
spatialDetectionNetwork.setBlobPath(
    blobconverter.from_zoo(name="mobilenet-ssd", shaves=6)
)
spatialDetectionNetwork.setConfidenceThreshold(0.5)
spatialDetectionNetwork.input.setBlocking(False)
spatialDetectionNetwork.setBoundingBoxScaleFactor(0.5)
spatialDetectionNetwork.setDepthLowerThreshold(100)
spatialDetectionNetwork.setDepthUpperThreshold(30000)
Expected behavior I expect the coordinates to match roughly with ground truth.
Screenshots Below, the z-axis spans 0 to 20 meters and the x-axis spans -5 to +5 meters. If the output from the network were correct, the maximum depth (max z) would be 10 meters, and at each 1-meter step there would be a line between -0.5 meters and +0.5 meters at the current depth. Disregard the title.

@Cupcee for long-range you should set monoLeft and monoRight to THE_720_P (not available on OAK-D Lite!) and stereo.setSubpixel(True)
Also you should set: camRgb.initialControl.setManualFocus(135);
You can read more about what subpixel is here
We will add a short how-to about how Stereo should be used for different use-cases.
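To see why mono resolution and subpixel mode matter at long range, recall that stereo depth follows z = f * B / d (focal length in pixels, baseline, disparity), so a single disparity step covers more depth the farther away the object is. The sketch below estimates that step size. The 7.5 cm baseline, the 71.9 degree mono HFOV, and the 1/8 pixel subpixel step (3 fractional bits) are assumed values for an OAK-D and should be checked against your device.

```python
import math

BASELINE_M = 0.075  # assumed stereo baseline (OAK-D)
HFOV_DEG = 71.9     # assumed horizontal FOV of the mono cameras


def focal_px(width_px, hfov_deg=HFOV_DEG):
    """Focal length in pixels for a given image width and horizontal FOV."""
    return (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def depth_step(z_m, width_px, disparity_step=1.0):
    """Depth change (meters) when disparity drops by one step at distance z_m."""
    fb = focal_px(width_px) * BASELINE_M
    disparity = fb / z_m
    if disparity <= disparity_step:
        return math.inf  # next step would be zero disparity (infinite depth)
    return fb / (disparity - disparity_step) - z_m


if __name__ == "__main__":
    for z in (2, 5, 10):
        print(
            f"z={z:>2} m  "
            f"400P: {depth_step(z, 640):.2f} m  "
            f"720P: {depth_step(z, 1280):.2f} m  "
            f"720P+subpixel: {depth_step(z, 1280, 1 / 8):.2f} m"
        )
```

Under these assumptions the depth resolution at 10 m improves from several meters at 400P to well under a meter at 720P with subpixel enabled. This affects quantization only and would not by itself explain a consistent over-estimate.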
Alright, thanks for the response. Will try these and report back!
@szabi-luxonis I added these settings, but it seems like nothing changed (I also tried reducing the bounding box scale factor to 0.2). FYI, I measured the first 5 ground truth meters from the camera; here are the results (approximate, converted from millimeters to meters):
1m = 1.5m luxonis
2m = 2.8m luxonis
3m = 4.3m luxonis
4m = 6.1m luxonis
5m = 8.4m luxonis
What I'm doing at the moment is using a scale factor of 0.67 to convert the Luxonis distance to the real-world distance, but obviously this isn't optimal:
>>> import numpy as np
>>> a = np.array([1/1.5, 2/2.8, 3/4.3, 4/6.1, 5/8.4])
>>> a.mean()
0.665920519942632
>>> a.var()
0.001689550744590681
>>>
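The per-point ratios above range from about 0.60 to 0.71, so the error is not purely multiplicative and an averaged ratio fits some distances better than others. As a rough comparison, the sketch below contrasts the mean-ratio factor with a least-squares scale fitted through the origin, using only the five measurements listed above. The helper names are illustrative, and this is a stopgap correction, not a substitute for fixing the depth itself.

```python
truth = [1.0, 2.0, 3.0, 4.0, 5.0]     # ground truth distance, meters
reported = [1.5, 2.8, 4.3, 6.1, 8.4]  # Luxonis z output, meters


def mean_ratio_scale(truth, reported):
    """Average of the per-point ratios truth / reported."""
    return sum(t / r for t, r in zip(truth, reported)) / len(truth)


def least_squares_scale(truth, reported):
    """Scale s minimizing sum((t - s * r) ** 2), i.e. a fit through the origin."""
    return sum(t * r for t, r in zip(truth, reported)) / sum(r * r for r in reported)


def max_abs_error(scale, truth, reported):
    """Largest absolute error in meters after applying the scale."""
    return max(abs(t - scale * r) for t, r in zip(truth, reported))


if __name__ == "__main__":
    for name, fn in (("mean ratio", mean_ratio_scale),
                     ("least squares", least_squares_scale)):
        s = fn(truth, reported)
        err = max_abs_error(s, truth, reported)
        print(f"{name}: scale={s:.3f}, max error={err:.2f} m")
```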
Here is the new config:
import blobconverter
import depthai as dai

syncNN = True
# Create pipeline
pipeline = dai.Pipeline()
# Define sources and outputs
camRgb = pipeline.create(dai.node.ColorCamera)
spatialDetectionNetwork = pipeline.createMobileNetSpatialDetectionNetwork()
monoLeft = pipeline.create(dai.node.MonoCamera)
monoRight = pipeline.create(dai.node.MonoCamera)
stereo = pipeline.create(dai.node.StereoDepth)
xoutRgb = pipeline.create(dai.node.XLinkOut)
xoutNN = pipeline.create(dai.node.XLinkOut)
xoutBoundingBoxDepthMapping = pipeline.create(dai.node.XLinkOut)
xoutDepth = pipeline.create(dai.node.XLinkOut)
xoutRgb.setStreamName("rgb")
xoutNN.setStreamName("detections")
xoutBoundingBoxDepthMapping.setStreamName("boundingBoxDepthMapping")
xoutDepth.setStreamName("depth")
# Properties
# camRgb.setPreviewKeepAspectRatio(False)
camRgb.setPreviewSize(300, 300)
# camRgb.setPreviewSize(1300, 800)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setInterleaved(False)
camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)
camRgb.initialControl.setManualFocus(135)
monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_720_P)
monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_720_P)
monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)
# Setting node configs
stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
# Align depth map to the perspective of RGB camera, on which inference is done
stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
stereo.setOutputSize(monoLeft.getResolutionWidth(), monoLeft.getResolutionHeight())
stereo.setSubpixel(True)
# detection_nn.setBlobPath(blobconverter.from_zoo(name="mobilenet-ssd", shaves=5))
spatialDetectionNetwork.setBlobPath(
    blobconverter.from_zoo(name="mobilenet-ssd", shaves=6)
)
spatialDetectionNetwork.setConfidenceThreshold(0.5)
spatialDetectionNetwork.input.setBlocking(False)
spatialDetectionNetwork.setBoundingBoxScaleFactor(0.2)
spatialDetectionNetwork.setDepthLowerThreshold(100)
spatialDetectionNetwork.setDepthUpperThreshold(10000)
# Use ImageManip to resize to 300x300 with letterboxing
# manip = pipeline.create(dai.node.ImageManip)
# manip.setMaxOutputFrameSize(270000) # 300x300x3
# manip.initialConfig.setResizeThumbnail(300, 300)
# camRgb.preview.link(manip.inputImage)
# Linking
monoLeft.out.link(stereo.left)
monoRight.out.link(stereo.right)
# manip.out.link(spatialDetectionNetwork.input)
camRgb.preview.link(spatialDetectionNetwork.input)
if syncNN:
    spatialDetectionNetwork.passthrough.link(xoutRgb.input)
else:
    camRgb.preview.link(xoutRgb.input)
spatialDetectionNetwork.out.link(xoutNN.input)
spatialDetectionNetwork.boundingBoxMapping.link(xoutBoundingBoxDepthMapping.input)
stereo.depth.link(spatialDetectionNetwork.inputDepth)
spatialDetectionNetwork.passthroughDepth.link(xoutDepth.input)
Also, I'm wondering if there is any way to make the spatial detection network work properly with a wider input than 300x300? I tried it with black bars and with stretching, but both break the spatial calculations.
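One way to handle the black-bar (letterbox) case is to map each detection box from the padded square input back to the full frame before using it to sample depth. Below is a minimal sketch, assuming normalized [0, 1] box coordinates and symmetric padding. The helper name is hypothetical, and this runs on the host, independent of what the on-device spatial network does internally.

```python
def unletterbox_bbox(bbox, frame_w, frame_h):
    """Map a bbox from a letterboxed square input back to the original frame.

    bbox: (xmin, ymin, xmax, ymax), normalized to the square NN input.
    Returns the bbox normalized to the original frame_w x frame_h frame.
    """
    xmin, ymin, xmax, ymax = bbox
    if frame_w >= frame_h:
        # Bars at top and bottom: the image fills only part of the height.
        content = frame_h / frame_w
        pad = (1 - content) / 2
        ymin = (ymin - pad) / content
        ymax = (ymax - pad) / content
    else:
        # Bars at left and right: the image fills only part of the width.
        content = frame_w / frame_h
        pad = (1 - content) / 2
        xmin = (xmin - pad) / content
        xmax = (xmax - pad) / content

    def clamp(v):
        return min(1.0, max(0.0, v))

    return tuple(clamp(v) for v in (xmin, ymin, xmax, ymax))


if __name__ == "__main__":
    # A box spanning the full visible height of a 16:9 frame inside a square input.
    print(unletterbox_bbox((0.25, 0.21875, 0.75, 0.78125), 1920, 1080))
```

The remapped box can then be used as the ROI for a depth lookup on a depth map aligned to the full RGB frame.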
Hello @Cupcee, could you also share screenshots of the color, depth/disparity, and rectified mono frames of the test setup? In our test setup, we have measured better depth accuracy (see docs here). Thanks, Erik