
human-pose-estimation-0004 architecture openpose does not run

jpapon opened this issue 4 years ago · 7 comments

The new human-pose-estimation-000(2-4) models do not run when the architecture is set to openpose. They do run when the architecture is set to associative embedding, but the accuracy is very poor - much worse than the human-pose-estimation-0001 model. Given that they are reported to have a higher mAP, I think there must be a bug somewhere.

When I run the demo with python human_pose_estimation.py -i video.avi -m human-pose-estimation-0004.xml -d CPU -at openpose, I get the following error:

Traceback (most recent call last):
  File "human_pose_estimation.py", line 285, in <module>
    sys.exit(main() or 0)
  File "human_pose_estimation.py", line 185, in main
    poses, scores = hpes[mode].postprocess(raw_outputs, frame_meta)
  File "/home/walt/workspace/libs/open_model_zoo/demos/python_demos/human_pose_estimation_demo/human_pose_estimation_demo/model.py", line 209, in postprocess
    poses, scores = self.decoder(heatmaps, nms_heatmaps, pafs)
  File "/home/walt/workspace/libs/open_model_zoo/demos/python_demos/human_pose_estimation_demo/human_pose_estimation_demo/decoder_openpose.py", line 49, in __call__
    pafs = np.transpose(pafs, (0, 2, 3, 1))
  File "<__array_function__ internals>", line 5, in transpose
  File "/home/walt/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 650, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "/home/walt/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
    return bound(*args, **kwds)
ValueError: axes don't match array
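
For reference, a minimal sketch of how this ValueError can arise (the shapes below are illustrative, not taken from the actual model outputs):

import numpy as np

# The OpenPose decoder expects a 4-D NCHW part-affinity-field tensor.
pafs = np.zeros((1, 38, 92, 164), dtype=np.float32)
nhwc = np.transpose(pafs, (0, 2, 3, 1))  # OK: NCHW -> NHWC

# An AE-based model emits outputs of a different rank, so the same 4-axis
# permutation no longer matches the array's dimensions:
embeddings = np.zeros((17, 92, 164), dtype=np.float32)
np.transpose(embeddings, (0, 2, 3, 1))  # ValueError: axes don't match array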

If I manually set the input height to 368: python human_pose_estimation.py -i video.avi -m human-pose-estimation-0004.xml -d CPU -at openpose --tsize 368, I get the following error:

Traceback (most recent call last):
  File "human_pose_estimation.py", line 285, in <module>
    sys.exit(main() or 0)
  File "human_pose_estimation.py", line 259, in main
    hpes[mode](frame, next_frame_id, {'frame': frame, 'start_time': start_time})
  File "/home/walt/workspace/libs/open_model_zoo/demos/python_demos/human_pose_estimation_demo/human_pose_estimation_demo/model.py", line 134, in __call__
    self.reshape_net(inputs)
  File "/home/walt/workspace/libs/open_model_zoo/demos/python_demos/human_pose_estimation_demo/human_pose_estimation_demo/model.py", line 71, in reshape_net
    self.net.reshape(input_shapes)
  File "ie_api.pyx", line 1437, in openvino.inference_engine.ie_api.IENetwork.reshape
RuntimeError: Check 'PartialShape::broadcast_merge_into( pshape, node->get_input_partial_shape(i), autob)' failed at ngraph/core/src/op/util/elementwise_args.cpp:49:
While validating node 'v1::Add 2056 (2023[0]:f32{1,21,92,164}, 2055/Interpolate[0]:f32{1,21,96,168}) -> (f32{?,21,92,164})' with friendly_name '2056':
Argument shapes are inconsistent.
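
I suspect (an assumption on my part, not something confirmed in the docs) that this HRNet-style model requires the input height and width to be divisible by the maximal network stride, commonly 32, so that the multi-resolution branches line up for the elementwise Add. A quick sketch of rounding a requested size up to a valid one:

def round_to_stride(size, stride=32):
    # Round a requested input size up to the nearest multiple of the stride.
    return ((size + stride - 1) // stride) * stride

print(round_to_stride(368))  # 384 -- consistent with the 92-vs-96 mismatch above (x4 stride)
print(round_to_stride(656))  # 672 -- consistent with the 164-vs-168 mismatch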

jpapon · Dec 22 '20

@jpapon the Python human_pose_estimation_demo was updated in OpenVINO 2021.2 to support both the old human-pose-estimation-0001 model (which, according to its model description, is based on the OpenPose approach) and the new human-pose-estimation models added in the OpenVINO 2021.2 release. The new models use a different approach based on EfficientHRNet (which follows the Associative Embedding framework), as stated in the model descriptions; see human-pose-estimation-0002 for example. Because of that difference, the demo needs a command line option to distinguish how to process the model outputs. So it is expected and normal that the demo reports an error when you use the wrong '-at' value, such as specifying the 'openpose' architecture for a model based on associative embeddings.
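
In other words, for the new models the demo should be invoked with '-at ae', e.g.:

python human_pose_estimation.py -i video.avi -m human-pose-estimation-0004.xml -d CPU -at ae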

When the '-at ae' option is specified for the human-pose-estimation-0002 model, the demo output looks like the screenshot below, which seems acceptable from a pose estimation accuracy point of view. [screenshot]

vladimir-dudnik · Dec 22 '20

Hi @vladimir-dudnik, I kinda agree with the point raised by @jpapon: although the documentation states that the EfficientHRNet models have a higher mAP than the OpenPose model, it seems like OpenPose still performs better in some initial testing that I have conducted.

OpenPose: [screenshot]

EfficientHRNet (4th model): [screenshot]

renziver · Jan 15 '21

@renziver Thanks for bringing this up.

A few notes on this:

  • The Lightweight OpenPose model (human-pose-estimation-0001) is still available; if you find it more accurate for a particular use case, you may keep using it.
  • Cross-domain transferability of models is a tricky topic. The docs refer to the metrics obtained on MS COCO val (AP) for models trained on the training subset of the same dataset. Code for metrics computation and model testing in general is available as part of Accuracy Checker. Let us know if you find any inconsistencies there.
  • AP (a.k.a. MS COCO AP, or average precision) accounts for errors of many different types. Hence, improvement or degradation in different aspects of a model's predictions may contribute differently to the overall AP value, so a higher AP does not guarantee that a model is better in every aspect.
  • Don't forget to try different confidence threshold values for the different algorithms.
  • In general, Associative Embedding style networks have fewer restrictions on merging far-away joints into a single pose, which might be the main cause of the errors observable in the images you've shared. I believe this can be fixed to some extent by adding some heuristics to the post-processing code; see the sketch after this list.
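
For example, here is a hypothetical post-processing filter along those lines. It assumes poses in the demo's per-pose (K, 3) keypoint layout of (x, y, score); the limb index pairs (COCO-style) and the threshold are illustrative, not part of the actual demo code:

import numpy as np

# Illustrative limb pairs (shoulder-elbow, elbow-wrist, hip-knee, knee-ankle).
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10), (11, 13), (13, 15), (12, 14), (14, 16)]

def filter_stretched_poses(poses, max_ratio=0.8):
    # Drop poses containing a limb longer than max_ratio times the pose's
    # own bounding-box diagonal: a crude guard against far-away joints
    # being merged into a single person.
    kept = []
    for pose in poses:
        visible = pose[pose[:, 2] > 0]
        if len(visible) < 2:
            continue
        diag = np.linalg.norm(visible[:, :2].max(axis=0) - visible[:, :2].min(axis=0))
        limbs_ok = all(
            np.linalg.norm(pose[a, :2] - pose[b, :2]) <= max_ratio * diag
            for a, b in LIMBS
            if pose[a, 2] > 0 and pose[b, 2] > 0
        )
        if limbs_ok:
            kept.append(pose)
    return kept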

druzhkov-paul · Jan 15 '21

Thanks @druzhkov-paul, just another question: why does human-pose-estimation-0001 have lower latency (68 ms) despite having 15.435 GFLOPs, which is higher than human-pose-estimation-0004 (385.2 ms, 14.3707 GFLOPs)?

renziver · Jan 19 '21

@renziver

First, the FLOPs are given for networks run via OpenVINO, and you can estimate the pure inference time using the benchmark app. The demo application, in contrast, does not only do the forward pass through the network but also some post-processing and visualization, which affect the overall latency/throughput. Post-processing for the two types of networks has a different cost: both output intermediate information that has to be further processed to obtain the final pose predictions, but they use different means for that.
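
For instance (a sketch only; the model paths are placeholders for wherever you keep the IR files), you could compare the pure inference latency of the two models like this:

python benchmark_app.py -m human-pose-estimation-0001.xml -d CPU -api sync
python benchmark_app.py -m human-pose-estimation-0004.xml -d CPU -api sync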

Second, these networks are fully convolutional, meaning that they can be naturally adapted to any input resolution. The standard evaluation protocol resizes images to some base scale while keeping the original aspect ratio unchanged, hence the input resolution depends on the image rather than just the network. Also, the FLOPs are given for some reference resolution, which might differ from the one you actually run the network at. The good thing is that one has the flexibility to adjust the input resolution (trading off accuracy for latency) for a particular use case. You may try tweaking the input resolution using the --tsize argument, for example as shown below.
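
For example (the value 288 is just an arbitrary illustration, not a recommendation):

python human_pose_estimation.py -i video.avi -m human-pose-estimation-0004.xml -d CPU -at ae --tsize 288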

And one more thing to mention: if you are targeting lower latency, I'd recommend switching to the MIN LATENCY mode instead of the USER_SPECIFIED one.

druzhkov-paul · Jan 19 '21

Thanks @druzhkov-paul, I think it really comes down to the 2nd point you mentioned above about cross-domain transferability. The model seems to perform poorly on high-angle views, e.g. CCTV footage. Do you have any plans to release the training code for your EfficientHRNet implementation? The OpenVINO training extensions only support Lightweight OpenPose at the moment.

renziver · Jan 21 '21

@renziver, yes, we have plans to release the training code for this kind of model.

druzhkov-paul · Jan 22 '21