About TopDownAffineTransform and padding
Hello,
I'm trying to understand the data processing you're doing, and I'm a bit confused about the padding in the top-down approach. You mention in #1686 that you pad the bounding box by a factor of 1.25. Where does this number come from?
If I understand correctly (as you hinted in #617), it comes from AlphaPose and its Part-Guided Proposal Generator (PGPG), where the bounding box is offset RANDOMLY from a uniform distribution. Is your implementation simply replacing the random padding with a fixed one? Why so?
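To make sure we are talking about the same thing, here is a minimal sketch of how I currently picture the two variants. This is not the actual mmpose or AlphaPose code; the function names and the (1.0, 1.5) range are placeholders I made up for illustration:

```python
import numpy as np

def pad_bbox_fixed(bbox, padding=1.25):
    """Enlarge the box around its centre by a constant factor (my reading of mmpose)."""
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2
    w, h = w * padding, h * padding
    return cx - w / 2, cy - h / 2, w, h

def pad_bbox_random(bbox, low=1.0, high=1.5, rng=np.random):
    """Enlarge the box by a factor drawn from a uniform distribution (my reading of PGPG)."""
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2
    factor = rng.uniform(low, high)
    w, h = w * factor, h * factor
    return cx - w / 2, cy - h / 2, w, h
```

Is the fixed-factor version roughly what the TopDown pipeline does?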
Second, the padding in PGPG means enlarging the bounding box (BB) and taking a slightly larger part of the image. The zero-padding happens only when the BB is so big that you cannot take a bigger part of the image (e.g. when the whole picture is already inside the BB), right?
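In other words, I picture the crop roughly like this toy example (again just my assumption, using plain OpenCV rather than your actual pipeline), where zeros only appear for the part of the enlarged box that sticks out past the image border:

```python
import cv2
import numpy as np

img = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
x, y, w, h = 500, 300, 200, 200  # enlarged box, partly outside the 640x480 image

# Shift the box's top-left corner to the origin and crop to (w, h);
# pixels that fall outside the source image are filled with the border value 0.
M = np.float32([[1, 0, -x], [0, 1, -y]])
crop = cv2.warpAffine(img, M, (w, h), borderValue=0)
```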
Third, you pad images during both training and testing. According to the AlphaPose paper, PGPG is employed only in the training phase, to bridge the domain gap between detected BBs and ground-truth BBs. Can you please comment on that?
Any link or pointer to further literature would also be helpful.
Thank you for your help and this great repo.