
Allow Returning Estimated XYZ Position (with padding_factor) from Key Points

Open Luxonis-Brandon opened this issue 5 years ago • 0 comments

Start with the why:

When running monocular neural inference with a model that returns Key-Points (e.g. body pose estimation like OpenPose) instead of bounding boxes (like MobileNetv2-SSD or tiny-YOLOv3), the only existing way to get the physical position of these Key-Points is to request the entire depth map on the host, look up the depth at each Key-Point location, and then reproject these depth measurements into the XYZ locations of the Key-Points.

This has several disadvantages:

  1. The whole depth map needs to be transferred every frame, even though only a tiny subset of that data is needed. This puts significant load on USB and on the host CPU, especially when a small host is used, like a Pi Zero, which also only has USB2.
  2. The re-projection math is left to the user to figure out on the host. (We could do an example, though.)
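
For reference, the re-projection mentioned in item 2 is the standard pinhole back-projection. The sketch below is only an illustration under stated assumptions: an ideal pinhole camera with no lens distortion, a depth map aligned with the image the Key-Point was detected in, and known intrinsics. The function name `keypoint_to_xyz` and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical and not part of the depthai API.

```python
def keypoint_to_xyz(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a known depth into camera-frame XYZ.

    Assumptions (not from the depthai API): ideal pinhole model, no lens
    distortion, depth measured along the optical axis. fx, fy are focal
    lengths in pixels; cx, cy is the principal point in pixels.
    The result is in the same units as `depth`.
    """
    x = (u - cx) * depth / fx  # horizontal offset from the optical axis
    y = (v - cy) * depth / fy  # vertical offset from the optical axis
    return (x, y, depth)
```

A Key-Point at the principal point maps to `(0.0, 0.0, depth)`; a Key-Point further from the image centre maps to a proportionally larger X or Y offset at the same depth.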

Move to the how:

For object detectors, we use a padding_factor, which selects a subset of the bounding box (a portion of the middle, configurable via the API), so that the estimated XYZ position of the object can be returned.

For networks that return Key-Points instead of bounding boxes, we can take a similar approach. In this case, the padding_factor can define a region of pixels (a square) around the Key-Point that is larger than the single-pixel Key-Point itself (to reduce the chance that the depth lookup lands on a hole in the disparity map).
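
A rough host-side sketch of this idea follows. It averages the depth over a square window centred on the Key-Point and skips holes. Everything here is an assumption for illustration: the function name `region_depth`, the `half_size` parameter standing in for padding_factor, the depth map as a list of rows, and holes being encoded as `0`. It is not how the device firmware implements it.

```python
def region_depth(depth_map, u, v, half_size=2, invalid=0):
    """Average the valid depth values in a square window around (u, v).

    depth_map: list of rows (depth_map[row][col]), hypothetical layout.
    half_size: window extends half_size pixels each side of the Key-Point.
    invalid:   value assumed to mark a hole in the depth/disparity map.
    Returns None if every pixel in the window is a hole.
    """
    height = len(depth_map)
    width = len(depth_map[0])

    # Clamp the window to the image borders.
    x0, x1 = max(0, u - half_size), min(width, u + half_size + 1)
    y0, y1 = max(0, v - half_size), min(height, v + half_size + 1)

    # Collect only pixels that carry a real depth measurement.
    valid = [
        depth_map[y][x]
        for y in range(y0, y1)
        for x in range(x0, x1)
        if depth_map[y][x] != invalid
    ]
    if not valid:
        return None
    return sum(valid) / len(valid)
```

With a window of one pixel (`half_size=0`) this degenerates to the single-pixel lookup, which returns `None` whenever the Key-Point sits on a hole; a larger window trades spatial precision for robustness.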

(Later, we could also support returning the depth from this padding_factor, just as planned in https://github.com/luxonis/depthai/issues/125 for object detectors, so that more sophisticated techniques of depth-filtering over this area could be applied on the host.)

Move to the what:

Implement the Key-Point counterpart of what we have now for bounding boxes: return the XYZ of Key-Points, including the programmable padding_factor for averaging the z-dimension over some programmable number of pixels.

Luxonis-Brandon, Sep 28 '20 17:09