depthai-ros [BUG] Generic ROS2 driver output for spatial yolo is incorrect

Hello,

I'm trying to use tiny YOLOv4 with spatial information via the camera.cpp ROS node (launched with camera.launch.py). The model runs and the classification is correct, but the spatial information is way off: pose.position is several meters off on every axis (x, y, z), reaching about -5 m in x and almost 10 m in z in the log below, while the camera detects a human (me) sitting directly in front of it.

Position log while I'm sitting ~50 cm in front of the camera:

[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-0.0, y=0.0, z=0.0)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-0.0, y=0.0, z=0.0)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-4.74098539352417, y=3.4557785987854004, z=8.550938606262207)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-1.1852463483810425, y=0.8710846900939941, z=2.1377346515655518)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-3.7184200286865234, y=2.7328147888183594, z=6.706618309020996)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-0.9341843128204346, y=0.6809415817260742, z=1.684913992881775)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-3.1088430881500244, y=2.2848124504089355, z=5.607172966003418)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-3.0101494789123535, y=2.2122786045074463, z=5.429166793823242)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-1.1424061059951782, y=0.8395997285842896, z=2.060467004776001)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-4.122596263885498, y=3.054694890975952, z=7.435598850250244)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-1.7084633111953735, y=1.2556174993515015, z=3.0814192295074463)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-2.873324155807495, y=2.111720561981201, z=5.182386875152588)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-0.9825876355171204, y=0.72214275598526, z=1.7722152471542358)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-0.0, y=0.0, z=0.0)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-2.370492696762085, y=1.7564494609832764, z=4.2754693031311035)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-5.3529887199401855, y=3.9821012020111084, z=9.772500991821289)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-4.4880242347717285, y=3.318418264389038, z=8.14375114440918)
[gesture_detection_inference_on_oakd-2] geometry_msgs.msg.Point(x=-2.341932535171509, y=1.7278892993927002, z=4.2754693031311035)

The resulting pose is also very noisy, so I suspect that something is wrong with it.
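To put numbers on "very noisy", the log lines above can be reduced to a range (distance from the camera) and the x/z and y/z ratios per sample. Under the usual pinhole model, x/z and y/z depend only on the pixel position of the detection's ROI, so if those ratios stay nearly constant while the range jumps, the scatter lies along a single viewing ray and points at the depth value rather than at the bounding box. This is a minimal Python sketch assuming the log format shown above; the helper names are hypothetical and not part of depthai-ros, and samples with z == 0 are treated as "no valid depth" and skipped.

```python
import math
import re

# Matches the repr printed by the node, e.g.
# "geometry_msgs.msg.Point(x=-4.74, y=3.45, z=8.55)"
POINT_RE = re.compile(
    r"Point\(x=(?P<x>[-+0-9.eE]+), y=(?P<y>[-+0-9.eE]+), z=(?P<z>[-+0-9.eE]+)\)"
)


def parse_points(log_text):
    """Return an (x, y, z) tuple for every Point(...) found in the log text."""
    return [
        (float(m["x"]), float(m["y"]), float(m["z"]))
        for m in POINT_RE.finditer(log_text)
    ]


def summarize(points):
    """Per-sample range (m) and bearing ratios x/z, y/z.

    Samples with z == 0 (such as the all-zero Points in the log) carry no
    usable depth and are skipped.
    """
    rows = []
    for x, y, z in points:
        if z == 0.0:
            continue
        rows.append({
            "range": math.sqrt(x * x + y * y + z * z),
            "x_over_z": x / z,
            "y_over_z": y / z,
        })
    return rows


if __name__ == "__main__":
    log = (
        "geometry_msgs.msg.Point(x=-0.0, y=0.0, z=0.0)\n"
        "geometry_msgs.msg.Point(x=-4.74098539352417, y=3.4557785987854004, z=8.550938606262207)\n"
        "geometry_msgs.msg.Point(x=-1.1852463483810425, y=0.8710846900939941, z=2.1377346515655518)\n"
    )
    for row in summarize(parse_points(log)):
        print(f"range={row['range']:.3f} m  x/z={row['x_over_z']:.3f}  y/z={row['y_over_z']:.3f}")
```

Pasting the whole log into `log` shows the full spread of all samples at once.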

I have a working solution with this example:

https://github.com/luxonis/depthai-python/blob/main/examples/SpatialDetection/spatial_tiny_yolo.py

Here the pipeline is created manually (rather than by the generic ROS driver), and it works rather well. The same model outputs reasonable xyz coordinates (in mm, because they are taken directly from the output) for

Minimal Reproducible Example

Start

ros2 launch depthai_ros_driver camera.launch.py camera_model:=OAK-D params_file:=$HOME/duckbrain_umbrella/ros2_ws/src/depthai-ros/depthai_ros_driver/config/rgbd.yaml

Then watch the output of ros2 topic echo /oak/nn/spatial_detections while the camera detects something.

Expected behavior

I would expect output like that of this example.

I ran the example like this:

python3 spatial_tiny_yolo.py

Position log (x, y, z) while I'm sitting ~50 cm in front of the camera:

0.1027177734375 0.04964692306518555 0.4928494567871094
0.09987322998046876 -0.17121124267578125 0.3734034118652344
-0.10212914276123047 0.017021522521972657 0.4900251159667969
0.1694300994873047 -0.2740165710449219 0.6021787719726562
-0.09702268981933594 0.015319369316101073 0.4900251159667969
0.09479540252685546 -0.16431202697753905 0.36386972045898436
-0.09282048797607421 0.028689970016479494 0.4858487548828125
0.10459590911865234 -0.16762165832519532 0.386046875
-0.04489921951293945 0.01726892852783203 0.4971475830078125
-0.07535162353515625 0.06658980560302734 0.5044801330566406
-0.043298191070556644 0.022515058517456055 0.49859698486328125
-0.04516178512573242 0.017369916915893555 0.5000548706054687
-0.04006796646118164 0.020905023574829103 0.5015213012695312
-0.02803781509399414 0.02979017448425293 0.5044801330566406
-0.03023613929748535 0.037350521087646485 0.5120322265625
-0.0316358642578125 0.03515095520019531 0.5059726867675781
-0.032600372314453126 0.03260036849975586 0.521398681640625
-0.03023613929748535 0.030236135482788085 0.5120322265625
-0.03032693862915039 0.026759061813354492 0.5135698852539062
-0.02767319107055664 0.025828311920166016 0.5311141967773437
-0.025748346328735353 0.025748346328735353 0.5294698486328125
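For comparison with the driver's output, the example's log can be reduced to a distance from the camera per sample. This short Python sketch (hypothetical helper name) treats each line as x y z in meters, which matches the ~50 cm described above; running it over the full log gives the median range and its spread.

```python
import math
import statistics


def ranges_from_triples(text):
    """Distance from the camera for each 'x y z' line, skipping blank lines."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        x, y, z = (float(v) for v in line.split())
        ranges.append(math.sqrt(x * x + y * y + z * z))
    return ranges


if __name__ == "__main__":
    # First lines of the expected-behavior log above (treated as meters).
    expected_log = """
    0.1027177734375 0.04964692306518555 0.4928494567871094
    0.09987322998046876 -0.17121124267578125 0.3734034118652344
    -0.10212914276123047 0.017021522521972657 0.4900251159667969
    """
    r = ranges_from_triples(expected_log)
    print(f"median range {statistics.median(r):.3f} m, stdev {statistics.stdev(r):.3f} m")
```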

Can someone tell me why there is such a difference in output quality? It looks like a bug to me.

I also tried setting the following parameters in the .yaml config, to make the pipeline more similar to the example:

stereo:
      i_height: 416
      i_width: 416
      i_align_depth: true

left:
      i_resolution: 400P

right:
      i_resolution: 400P

Jun 18 '24 18:06 kikass13

I tested these:

ros2 launch depthai_examples tracker_yolov4_spatial_node.launch.py
ros2 launch depthai_examples yolov4_publisher.launch.py spatial_camera:=true

and the results look fine as well.

It's only with the generic camera.cpp pipeline that the results are bad.

Jun 19 '24 08:06 kikass13

Hi, thanks for the report. Could you try testing with the following parameters:

    nn:
      i_disable_resize: false
    rgb:
      i_preview_size: 416
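Serafadam's suggestion appears to aim at making the RGB preview the same size as the network input (416x416 for this YOLO model). A hypothetical sanity check for that, sketched in Python: it assumes the parameters are loaded into a nested dict mirroring the YAML under ros__parameters, and that the model JSON follows the `nn_config.input_size` = "WxH" layout of DepthAI's YOLO JSON configs. It only compares sizes; it does not model what the driver does with i_disable_resize.

```python
import json


def parse_input_size(model_config):
    """Read 'WxH' from nn_config.input_size (assumed DepthAI YOLO JSON layout)."""
    w, h = model_config["nn_config"]["input_size"].lower().split("x")
    return int(w), int(h)


def check_preview_vs_nn(params, model_config):
    """Return human-readable warnings; an empty list means the sizes agree.

    `params` mirrors the nested YAML under ros__parameters, and the preview
    is assumed to be square (a single i_preview_size value).
    """
    warnings = []
    nn_w, nn_h = parse_input_size(model_config)
    preview = params.get("rgb", {}).get("i_preview_size")
    if preview is None:
        warnings.append("rgb.i_preview_size is not set (the driver's default applies)")
    elif (preview, preview) != (nn_w, nn_h):
        warnings.append(f"rgb.i_preview_size={preview} but the model expects {nn_w}x{nn_h}")
    return warnings


if __name__ == "__main__":
    model = json.loads('{"nn_config": {"NN_family": "YOLO", "input_size": "416x416"}}')
    params = {"rgb": {"i_preview_size": 416}, "nn": {"i_disable_resize": False}}
    print(check_preview_vs_nn(params, model) or "preview size matches the NN input")
```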

Jun 21 '24 07:06 Serafadam

@Serafadam

Thanks for your reply.

I'm using this config:

/oak:
  ros__parameters:
    ### will be added via launchfile due to me not wanting to fix the path here
    nn:
      i_nn_config_path: PLACEHOLDER_PATH_TO_CONFIG_JSON_WHICH_WILL_BE_REPLACED_BY_LAUNCH_CONFIG
      i_enable_passthrough: true
      i_enable_passthrough_depth: true
      i_disable_resize: false

    camera:
      i_nn_type: spatial
      i_pipeline_dump: true
      i_enable_ir: true
    
    rgb:
      i_fps: 10.0
      i_resolution: 720P
      i_preview_size: 416

      
    stereo:
      i_align_depth: true
      
      i_height: 320
      i_width: 320  

      # i_stereo_conf_threshold: 40
      i_stereo_conf_threshold: 200
      i_subpixel: true
      i_depth_preset: HIGH_DENSITY ### Prefers density over accuracy: fewer invalid depth values, but more outliers.
      i_lr_check: true ### Left-Right Check or LR-Check is used to remove incorrectly calculated disparity pixels due to occlusions at object borders (Left and Right camera views are slightly different).
      i_lrc_threshold: 10
      i_fps: 10.0

      ### added filter
      i_enable_decimation_filter: true
      i_decimation_filter_decimation_mode: NON_ZERO_MEDIAN ### "PIXEL_SKIPPING", "NON_ZERO_MEDIAN", "NON_ZERO_MEAN"
      i_decimation_filter_decimation_factor: 4 ### default 1, max 4
      
      i_enable_spatial_filter: true
      i_spatial_filter_hole_filling_radius: 2
      i_spatial_filter_alpha: 0.5
      i_spatial_filter_delta: 20
      i_spatial_filter_iterations: 1

      i_enable_threshold_filter: true
      i_threshold_filter_min_range: 400
      i_threshold_filter_max_range: 10000

      i_enable_speckle_filter: true
      i_speckle_filter_speckle_range: 50

    left:
      i_publish_topic: false
      i_fps: 10.0

    right:
      i_publish_topic: false
      i_fps: 10.0

Setting i_disable_resize: false and i_preview_size: 416 did not help; the resulting spatial info is still bad.

I have rewritten one of the examples (depthai_examples/yolov4_spatial_publisher.cpp) and hardcoded nearly all the YAML parameters from my config above, almost 1:1, into the C++ pipeline. The resulting node works fine (the spatial information is correct). That's why I assume that either I have configured something wrong, or something in the pipeline is not created correctly inside the camera.cpp driver.
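When a hand-written pipeline works and the driver does not, listing every parameter that differs between the two narrows the search. A small Python sketch, with hypothetical helper names: it flattens two nested parameter dicts (for example, the YAML above loaded with a YAML parser such as PyYAML, and a dict transcribed by hand from the C++ example's hardcoded settings) and reports each dotted path whose value differs. The dicts in the usage block are illustrative only.

```python
_UNSET = "<unset>"  # marks a parameter present on only one side


def flatten(params, prefix=""):
    """Flatten nested dicts into dotted paths, e.g. {'stereo.i_subpixel': True}."""
    flat = {}
    for key, value in params.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_params(a, b):
    """Return {path: (value_in_a, value_in_b)} for every path whose values differ."""
    fa, fb = flatten(a), flatten(b)
    return {
        path: (fa.get(path, _UNSET), fb.get(path, _UNSET))
        for path in sorted(fa.keys() | fb.keys())
        if fa.get(path, _UNSET) != fb.get(path, _UNSET)
    }


if __name__ == "__main__":
    # Illustrative values only, not the actual contents of either pipeline.
    driver_yaml = {"stereo": {"i_subpixel": True, "i_lrc_threshold": 10}}
    hardcoded = {"stereo": {"i_subpixel": True, "i_lrc_threshold": 5},
                 "rgb": {"i_preview_size": 416}}
    for path, (in_driver, in_hardcoded) in diff_params(driver_yaml, hardcoded).items():
        print(f"{path}: driver={in_driver!r} hardcoded={in_hardcoded!r}")
```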

Jun 21 '24 11:06 kikass13

Hi, I'm also observing this (with the latest build of this repo: commit #9132443, OAK-D Lite and Humble). The pose values sometimes look correct, but more often they are all over the place, as @kikass13 describes. I started digging a bit and noticed that geometry_msgs/Point.msg uses float64 while your Point3f uses float32. I think this might be the culprit and explain the weird behavior, but maybe I'm wrong.
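The float64/float32 hypothesis can be checked directly: widening a float32 to float64 is exact, and narrowing a float64 to float32 with round-to-nearest changes a value by at most 2**-24 of its magnitude (about 6e-8 relative), which at 10 m is well under a micrometer. A short Python sketch using only the standard `struct` module; the helper name is hypothetical.

```python
import struct


def float32_roundtrip(value):
    """Narrow a Python float (IEEE-754 double) to float32 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    # float32 has a 24-bit significand, so the relative rounding error is at most 2**-24.
    for v in (0.4928494567871094, 5.3529887199401855, 9.772500991821289, 0.1):
        err = abs(float32_roundtrip(v) - v)
        print(f"{v!r}: error {err:.1e} (bound {abs(v) * 2**-24:.1e})")
```

Errors of this size cannot turn a 0.5 m reading into several meters.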

Oct 16 '24 06:10 mirek-burkon

Hi, are the results the same when running through bare C++/Python code?

Oct 18 '24 11:10 Serafadam

@Serafadam My mistake, it definitely wasn't the float issue. The Python examples work fine (I wrote my own, much simpler Python ROS wrapper to test that properly), and I traced this down to two issues:

1. nn.i_disable_resize must be set to True; otherwise I get faulty results (see the video below).
2. When mapping results to the URDF, the Y coordinate of the 3D pose needs to be inverted, because the Y axis of oak_rgb_camera_optical_frame points downwards in your model. I did this in SpatialDetectionConverter.cpp, line 135.
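The Y-axis workaround in the second issue above can be expressed as a small conversion. This is a Python sketch of the idea, not the actual change made in SpatialDetectionConverter.cpp: the function name is hypothetical, it assumes input in millimeters (the unit DepthAI uses for spatialCoordinates) and output in meters, and it assumes, following the comment, that the reported Y has the opposite sign to the optical frame's downward-pointing Y.

```python
def depthai_to_ros_optical(spatial_mm):
    """Map DepthAI spatialCoordinates (millimeters) to meters in the optical frame.

    Assumes, following the workaround described above, that the reported Y has
    the opposite sign to oak_rgb_camera_optical_frame's Y (which points down);
    X and Z are kept as-is.
    """
    x_mm, y_mm, z_mm = spatial_mm
    return (x_mm / 1000.0, -y_mm / 1000.0, z_mm / 1000.0)


if __name__ == "__main__":
    # Hypothetical reading: x=100 mm, y=50 mm, z=500 mm.
    print(depthai_to_ros_optical((100.0, 50.0, 500.0)))
```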

Using this config with an OAK-D Lite and the default YOLOv4 model:

camera:
  i_enable_imu: True
  i_enable_sync: False
  i_nn_type: spatial
  i_pipeline_type: RGBD
rgb:
  i_synced: False
  i_low_bandwidth: True
  i_low_bandwidth_profile: 1
  i_low_bandwidth_quality: 100
  i_publish_topic: True
  i_publish_compressed: True
  i_enable_preview: True
  i_preview_size: 416
  i_preview_width: 416
  i_fps: 30.0
  i_enable_spatial_nn: True
stereo:
  i_synced: False
  i_subpixel: True
  i_publish_topic: True
  i_publish_compressed: False
  i_enable_preview: False
  i_enable_feature_tracker: False
  i_fps: 30.0
nn:
  i_disable_resize: True # MUST BE TRUE!
  i_nn_config_path: depthai_ros_driver/yolo

Launching with: ros2 launch depthai_ros_driver camera.launch.py params_file:=/ws/config/rgb-depth-yolo.yaml camera_model:=OAK-D-LITE cam_pos_z:=1.15 rectify_rgb:=False

See https://www.youtube.com/watch?v=MP_fEaKpsgI (pose flipped upside down in both cases)

Oct 29 '24 22:10 mirek-burkon