
Replace yolov5 as palm detection network

Cursky opened this issue 1 year ago · 1 comment

Hello, the hand landmark detection network is perfect, so fast and so accurate, but it relies on datasets of bare hands without gloves. When the hands it detects wear gloves, the results are not satisfactory. According to the MediaPipe keypoint detection pipeline, the first step is to detect the palm position, so I want to replace the palm detection model with one that works on gloved palms. I chose yolov5 as the detection model. Your project is perfect, but it would be even better if you could explain how to plug in one's own retrained model. So I hope you can give me some guidance, and I will show my ideas and the replacement process here.

First:

I retrained a gloved-palm detection model with yolov5. I used yolov5-6.0 [https://github.com/ultralytics/yolov5/releases] and then exported the ONNX model following this OAK document: https://www.oakchina.cn/2022/01/22/yolov5-blob/. This is the conversion command (I use the yolov5s model):

python export.py --simplify --opset 12 --include onnx --batch-size 1 --imgsz 640 --weights yolov5s.pt

According to the explanation in that document, we are only interested in the last three convolution layers, so we add a Sigmoid node after each of them:

import onnx

onnx_model = onnx.load("plam.onnx")

# Find the indices of all Conv nodes; the last three are the detection heads
conv_indices = []
for i, n in enumerate(onnx_model.graph.node):
    if "Conv" in n.name:
        conv_indices.append(i)

input1, input2, input3 = conv_indices[-3:]

# Append a Sigmoid node after each of the three detection heads
sigmoid1 = onnx.helper.make_node(
    'Sigmoid',
    inputs=[onnx_model.graph.node[input1].output[0]],
    outputs=['output1_yolov5'],
)

sigmoid2 = onnx.helper.make_node(
    'Sigmoid',
    inputs=[onnx_model.graph.node[input2].output[0]],
    outputs=['output2_yolov5'],
)

sigmoid3 = onnx.helper.make_node(
    'Sigmoid',
    inputs=[onnx_model.graph.node[input3].output[0]],
    outputs=['output3_yolov5'],
)

onnx_model.graph.node.append(sigmoid1)
onnx_model.graph.node.append(sigmoid2)
onnx_model.graph.node.append(sigmoid3)

onnx.save(onnx_model, "plams.onnx")
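
As a quick sanity check (a sketch, assuming the file names above), the modified graph can be reloaded to confirm that the three Sigmoid nodes were appended at the end:

import onnx

m = onnx.load("plams.onnx")
# The three appended Sigmoid nodes should be the last entries of the graph
for n in m.graph.node[-3:]:
    print(n.op_type, list(n.input), list(n.output))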

And I use the online model converter at http://blobconverter.luxonis.com/. This is my command when converting (see the attached screenshot).

Now I have plams.blob.
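
For reference, the same conversion can also be scripted with the blobconverter Python package. This is only a sketch: the mean/scale values, shave count, and output names below are assumptions based on the usual yolov5 recipe, not the exact settings shown in the screenshot above.

import blobconverter

blob_path = blobconverter.from_onnx(
    model="plams.onnx",
    data_type="FP16",
    shaves=6,  # assumed value; match it to your pipeline
    optimizer_params=[
        "--mean_values=[0,0,0]",
        "--scale_values=[255,255,255]",
        "--output=output1_yolov5,output2_yolov5,output3_yolov5",
    ],
)
print(blob_path)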

Cursky, Aug 24 '22 06:08

Please note that the mediapipe palm detection model does more than finding a bounding box around the palm. It also finds keypoints in the palm that are used to compute the rotated rectangle around the whole hand. This rotated rectangle (actually a square) is bigger than the initial bounding box but, most importantly, oriented so that the wrist keypoint is always on the lower side. This is what the landmark model expects as input. You can visualize these keypoints by running ./demo.py --no_lm and then pressing the key "2".

If the hands presented to the camera always have the same orientation (for instance, when you raise your hands vertically), you can hardcode the rotation that brings the hand into the orientation expected by the landmark model (in the case of a vertically raised hand, a rotation is not even needed since the hand is already correctly oriented).
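
For illustration, here is a minimal sketch (not code from this repo) of how a detector bounding box could be turned into the square, rotated ROI that the landmark model expects, with a hardcoded rotation as described above. The scale factor and the function itself are assumptions for illustration, not the values used by mediapipe.

import numpy as np

def bbox_to_hand_rect(xmin, ymin, xmax, ymax, rotation_deg=0.0, scale=2.6):
    # Center of the palm bounding box
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    # Enlarge the palm box into a square big enough to contain the whole hand
    side = max(xmax - xmin, ymax - ymin) * scale
    # Hardcoded rotation (0 deg = hand raised vertically, wrist at the bottom)
    theta = np.radians(rotation_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    corners = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1]]) * side / 2.0
    # Rotate the square around its center and translate it back to the image
    return corners @ rot.T + np.array([cx, cy])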

Another point I want to mention: currently this repo is not using the latest version of the palm detection model. There are two more recent versions (lite and full) of the palm detection model that I am not using because, during my tests, I haven't found any noticeable accuracy improvement and the new models were slower (https://github.com/PINTO0309/tflite2tensorflow/issues/19#issuecomment-981671651). I suspect Google has used the same dataset to train these models, so I wouldn't expect much improvement for the detection of hands with gloves, but maybe it is worth a try.

From your tests, when wearing gloves, once the palm is detected, is the landmark model doing a good job?

geaxgx, Aug 24 '22 11:08