How does this work compare to ATEN?
Hi, I'm looking at both your work and ATEN. The conclusions of the two papers are very similar:
In this paper, we presented a novel detection-free Part Grouping Network to investigate instance-level human parsing, which is a more pioneering and challenging work in analyzing human in the wild. To push the research boundary of human parsing to match real-world scenarios much better, we further introduce a new large-scale (...) Experimental results on PASCAL-Person-Part [6] and our CIHP dataset demonstrate the superiority of our proposed approach, which surpasses previous methods for both semantic part segmentation and edge detection tasks, and achieves state-of-the-art performance for instance-level human parsing.
In this work, we investigate video instance-level human parsing that is a more pioneering and realistic task in analyzing human in the wild. To fill the blank of video human parsing data resources, we further introduce a large-scale (...) Experimental results on DAVIS [36] and our VIP dataset demonstrate the superiority of our proposed approach, which achieves state-of-the-art performance on both video instance-level human parsing and video segmentation tasks.
I'm wondering - which produces better accuracy, this work or ATEN? Considering that both claim "more pioneering", "demonstrate the superiority of our proposed approach", and "achieve state-of-the-art", can you help explain the differences? I'm not clear which I should use. Thanks!
The main difference is that PGN is purely a framework for parsing still images, while ATEN uses temporal information for video parsing. For frame-level accuracy, PGN is better; for sequence input, ATEN is state-of-the-art.
Okay, thanks! Do you know which performs inference faster?