HybridPose
Some problems about this network model
1. How can results under severe occlusion be improved? Do you have any suggestions? 2. Is the network effective for non-rigid objects? 3. How well does it perform on multi-target detection in complex scenes? Thank you very much.
Edit: After reconsideration, I would like to update my answer below to the third question.
HybridPose can be extended to handle multiple objects within an image. One approach is to predict instance-level rather than semantic-level segmentation masks. Instance segmentation is a well-studied problem that can be solved by methods such as Mask R-CNN. We can then extract the intermediate representations for each instance and feed them to the pose regression module described in the paper.
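As a rough illustration of the per-instance extraction step, here is a minimal NumPy sketch. It is not code from the HybridPose repository: the function name `split_by_instance`, the `(C, H, W)` layout of the dense predictions, and the `(N, H, W)` boolean instance masks are all assumptions made for this example. In practice the masks would come from an instance segmentation model such as Mask R-CNN.

```python
import numpy as np

def split_by_instance(feature_map, instance_masks):
    """Return one masked copy of feature_map per instance.

    feature_map:    (C, H, W) dense predictions (hypothetical layout)
    instance_masks: (N, H, W) boolean masks, one per detected instance
    """
    feature_map = np.asarray(feature_map)
    instance_masks = np.asarray(instance_masks, dtype=bool)
    # Broadcast each (H, W) mask over the C channels;
    # pixels outside the instance are set to 0.
    return [np.where(mask[None, :, :], feature_map, 0) for mask in instance_masks]

# Toy example: 2 channels on a 2x3 image with two non-overlapping instances.
features = np.arange(12, dtype=float).reshape(2, 2, 3)
masks = np.array([
    [[1, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 1]],
], dtype=bool)
per_instance = split_by_instance(features, masks)
```

Each element of `per_instance` could then be passed to the pose regression stage separately, so that votes from one instance do not contaminate another.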
Original Reply: Thank you for your interest in our research!
- Our experiments show that a hybrid of multiple intermediate representations works robustly, even under severe occlusion. Examples include (e) and (g) in Figure 3 of our paper. If you use a dataset other than Linemod and Occlusion Linemod and encounter difficulties with severely occluded examples, please elaborate on the details of your setup. We can try to solve the problem together.
- We have only tested HybridPose on rigid objects. However, we expect that a hybrid of different intermediate representations will help in non-rigid pose estimation too. Many non-rigid objects are not symmetric, so symmetry correspondences may be hard to use. Yet it should be natural to extend any keypoint-based non-rigid pose estimation pipeline by including edge vectors.
- We have not tested HybridPose on concurrent multi-target detection problems. Currently, a separate feature prediction network is trained for each object category. There are two types of multi-target detection problems. In the first type, an image contains multiple categories of objects, but each category occurs at most once. It is natural for HybridPose to handle this type of problem. You can change the network structure so that it outputs a multi-class segmentation mask instead of a binary segmentation mask. You will then need to extract the intermediate predictions on a class-by-class basis. If you would like to try this, we welcome pull requests and/or discussion of experimental results. The second type involves multiple instances of the same object in one image. This type is more challenging, and you may try separating different instances by analyzing the distribution of keypoint votes.
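The class-by-class extraction for the first type of problem could look roughly like the sketch below. This is only an illustration, not part of HybridPose: the function `masks_per_class`, the `(K+1, H, W)` logit layout with channel 0 as background, and the `min_pixels` threshold are assumptions introduced here.

```python
import numpy as np

def masks_per_class(seg_logits, min_pixels=1):
    """Turn multi-class segmentation logits into one binary mask per class.

    seg_logits: (K+1, H, W) array; channel 0 is assumed to be background.
    Returns a dict {class_id: (H, W) boolean mask} for classes that are present.
    """
    seg_logits = np.asarray(seg_logits)
    labels = seg_logits.argmax(axis=0)  # (H, W) per-pixel class index
    masks = {}
    for class_id in range(1, seg_logits.shape[0]):
        mask = labels == class_id
        # Skip classes with too few pixels (treated as absent from the image).
        if mask.sum() >= min_pixels:
            masks[class_id] = mask
    return masks

# Toy example: background plus two classes on a 2x2 image; class 2 is absent.
logits = np.full((3, 2, 2), -1.0)
logits[0] = [[2.0, 0.0], [0.0, 2.0]]
logits[1] = [[0.0, 3.0], [3.0, 0.0]]
class_masks = masks_per_class(logits)
```

Each mask in `class_masks` would then select the intermediate predictions (keypoints, edge vectors, symmetry correspondences) for that class before pose regression.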
I hope this helps.