deep-high-resolution-net.pytorch

How to get person detection?

Open FrancescoPiemontese opened this issue 5 years ago • 15 comments

First of all, thank you for your excellent work. I have a question regarding person detection. In your paper it is mentioned that you use a person detector before feeding its output to HRNet. Am I supposed to download this separately and then feed its output to HRNet? If so, what do the dataloaders in train.py and test.py do? Could you tell me which person detector was used?

FrancescoPiemontese avatar Apr 15 '19 11:04 FrancescoPiemontese

I think the author used the detection information from the dataset (the 'center' and 'scale' fields in the MPII dataset JSON file).

njustczr avatar Apr 16 '19 03:04 njustczr
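
For readers wondering how a detector's box gets consumed by a top-down model like this: the box is typically converted to the center/scale pair mentioned above before cropping. A minimal sketch of that conversion, modeled on what the repo's dataset code does (the function name, the pixel_std of 200, and the 1.25 padding factor are assumptions here; check lib/dataset for the exact version):

```python
import numpy as np

def box_to_center_scale(box, aspect_ratio=192.0 / 256.0, pixel_std=200.0):
    """Convert an (x, y, w, h) person box into the center/scale pair
    that top-down pose estimators such as HRNet expect.

    aspect_ratio is input_width / input_height (e.g. 192x256);
    pixel_std = 200 follows the MPII/COCO convention used in this repo.
    """
    x, y, w, h = box
    center = np.array([x + w * 0.5, y + h * 0.5], dtype=np.float32)

    # Pad the box so it matches the network's input aspect ratio.
    if w > aspect_ratio * h:
        h = w / aspect_ratio
    elif w < aspect_ratio * h:
        w = h * aspect_ratio

    scale = np.array([w / pixel_std, h / pixel_std], dtype=np.float32)
    scale *= 1.25  # enlarge slightly so the whole person stays in the crop
    return center, scale
```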

@FrancescoPiemontese maybe you can refer to my hrnet repo, where I integrated YOLO human detection.

lxy5513 avatar Apr 18 '19 02:04 lxy5513

Thank you! I will try

FrancescoPiemontese avatar Apr 18 '19 10:04 FrancescoPiemontese

@lxy5513, would you consider making a PR to this repo?

leoxiaobin avatar Apr 19 '19 06:04 leoxiaobin

@leoxiaobin yes, soon. I will add several human detectors, such as R-FCN and RetinaNet, then open a PR along with a speed comparison.

lxy5513 avatar Apr 19 '19 06:04 lxy5513

@leoxiaobin I plan to implement this tracking following the description in your Simple Baselines paper:

For processing frames in videos, the boxes from a human detector and boxes generated by propagating joints from previous frames using optical flow are unified using a bounding box Non-Maximum Suppression (NMS) operation.

I have two groups of boxes, but I don't know how to do the NMS, because the boxes generated by FlowNet2-S have no confidence scores. Can I simply reuse the scores of the corresponding boxes from the previous frame? Could you advise me on this problem? Thank you in advance.

lxy5513 avatar Apr 19 '19 06:04 lxy5513

We actually use the OKS score for NMS.

leoxiaobin avatar Apr 19 '19 15:04 leoxiaobin
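
For readers trying to reproduce this, a rough sketch of greedy NMS driven by Object Keypoint Similarity (OKS) is shown below. This is not the repository's exact code (it ships its own implementation), and the sigmas, threshold, and helper names here are illustrative assumptions:

```python
import numpy as np

def compute_oks(g, d, area, sigmas):
    """OKS between a reference pose g and a candidate pose d,
    each shaped (K, 3) as (x, y, visibility/score)."""
    vars_ = (sigmas * 2) ** 2
    dx = d[:, 0] - g[:, 0]
    dy = d[:, 1] - g[:, 1]
    e = (dx ** 2 + dy ** 2) / vars_ / (area + np.spacing(1)) / 2.0
    mask = g[:, 2] > 0
    return np.exp(-e)[mask].mean() if mask.any() else 0.0

def oks_nms(poses, scores, areas, sigmas, thresh=0.9):
    """Greedy NMS: keep the highest-scoring pose, drop poses whose OKS
    with it exceeds `thresh`, repeat on the remainder."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        oks = np.array([compute_oks(poses[i], poses[j], areas[i], sigmas)
                        for j in order[1:]])
        order = order[1:][oks <= thresh]
    return keep
```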

Thanks

lxy5513 avatar Apr 23 '19 01:04 lxy5513

@leoxiaobin Hi, I made a PR for YOLOv3 + HRNet, however something is weird. I tested it in two ways.


ONE: I get dt_boxes from YOLO, then run python tools/test.py TEST.USE_GT_BBOX False TEST.FLIP_TEST False with OKS NMS removed, and get the following result:

| Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|
| pose_hrnet | 0.702 | 0.859 | 0.770 | 0.653 | 0.779 | 0.736 | 0.878 | 0.794 | 0.683 | 0.813 |

TWO: I run the two models end-to-end (the same models as in ONE), get the keypoints, save them to a JSON file, and finally evaluate with the official cocoEval.evaluate(), with the following result:

| Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|
| hrnet | 0.594 | 0.811 | 0.656 | 0.564 | 0.651 | 0.647 | 0.834 | 0.704 | 0.601 | 0.713 |

Could you please tell me why the two results are so different? Thank you in advance.

lxy5513 avatar Apr 25 '19 11:04 lxy5513
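
For comparison, evaluating a saved keypoints JSON with the official COCO API looks roughly like this (the file paths are placeholders; the annotation file must match the validation split that was run):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground-truth annotations and the detections JSON produced by the pose model.
coco_gt = COCO('annotations/person_keypoints_val2017.json')  # placeholder path
coco_dt = coco_gt.loadRes('hrnet_keypoint_results.json')     # placeholder path

coco_eval = COCOeval(coco_gt, coco_dt, iouType='keypoints')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints an AP/AR table like the ones above
```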

This is my script that generates the keypoints JSON file: https://github.com/lxy5513/hrnet/blob/master/tools/eval.py. By the way, my YOLOv3 threshold is 0.1.

lxy5513 avatar Apr 25 '19 11:04 lxy5513

I had a very quick look through your code and have two questions.

  1. It seems that you do not convert the image channels to RGB. OpenCV reads images as BGR, while our models are trained on RGB, so you first need to convert your image data to RGB, as in line 131 of https://github.com/leoxiaobin/deep-high-resolution-net.pytorch/blob/master/lib/dataset/JointsDataset.py#L131.

  2. Are the thresholds the same for both methods?

leoxiaobin avatar Apr 25 '19 16:04 leoxiaobin
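
The channel conversion being described is a one-liner with OpenCV; where exactly it belongs depends on the preprocessing pipeline, and the file name below is a placeholder:

```python
import cv2

# OpenCV loads images as BGR; the HRNet models are trained on RGB input,
# so convert before normalization and the forward pass.
image_bgr = cv2.imread('person_crop.jpg')  # placeholder path
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
```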

I greatly appreciate your attention. This is my channel-conversion code: https://github.com/lxy5513/hrnet/blob/master/tools/eval.py#L142.

This is the relevant threshold code; it is the same for both methods: https://github.com/lxy5513/hrnet/blob/master/tools/eval.py#L159

lxy5513 avatar Apr 26 '19 01:04 lxy5513

By the way, when I use YOLOv3 with the Simple Baselines pose model to test the PR, the result looks normal:

| Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|
| Simple-baseline | 0.648 | 0.856 | 0.708 | 0.617 | 0.706 | 0.697 | 0.880 | 0.750 | 0.652 | 0.763 |

lxy5513 avatar Apr 26 '19 09:04 lxy5513

I would say this issue can be closed now that #161 has been merged.

alex9311 avatar Feb 03 '20 21:02 alex9311

> (quoting the YOLOv3 + HRNet results from lxy5513's comment above: pose_hrnet 0.702 AP with detected boxes)

I also get 0.702, but the reported result for w32_256x192 is 0.744. Why? I just ran the provided code with the trained model pose_hrnet_w32_256x192.pth. Can you help me?

zhanghao5201 avatar Mar 24 '21 14:03 zhanghao5201