3DCrowdNet_RELEASE: a question about how 2D datasets are processed

I found that for 3D datasets you crop the closest person when an image contains multiple people, but for 2D datasets you seem to crop every person in the image. Why do you process these datasets differently? In addition, the cropped image may contain another person. Won't this introduce ambiguity for the network? Thank you very much!

Jul 05 '22 07:07 yuchen-ji

I found that for 3D datasets you crop the closest person when an image contains multiple people, but for 2D datasets you seem to crop every person in the image. Why do you process these datasets differently?

The cropping process is the same regardless of the dataset. Even if you crop the closest person for 3D datasets, there can still be other people in the cropped image. Also, MuCo, which I assume you are referring to, does not originally contain multiple people in one image. It composites multiple real single-person images into one image using depth.
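To make the point about crops concrete, here is a minimal sketch (not the repository's actual cropping code) showing that a bounding-box crop around one person can still include part of a neighbouring person. The function names `crop_person` and `overlap_fraction`, the `(x_min, y_min, x_max, y_max)` box format, and the example boxes are all assumptions made for illustration.

```python
import numpy as np


def crop_person(image, bbox):
    """Crop an (x_min, y_min, x_max, y_max) box from an H x W x C image."""
    x0, y0, x1, y1 = bbox
    return image[y0:y1, x0:x1]


def overlap_fraction(target_bbox, other_bbox):
    """Fraction of other_bbox's area that falls inside target_bbox."""
    tx0, ty0, tx1, ty1 = target_bbox
    ox0, oy0, ox1, oy1 = other_bbox
    inter_w = max(0, min(tx1, ox1) - max(tx0, ox0))
    inter_h = max(0, min(ty1, oy1) - max(ty0, oy0))
    other_area = (ox1 - ox0) * (oy1 - oy0)
    return (inter_w * inter_h) / other_area if other_area > 0 else 0.0


# Hypothetical scene: two people standing close to each other.
image = np.zeros((480, 640, 3), dtype=np.uint8)
target_bbox = (100, 50, 300, 450)  # the person we want to reconstruct
other_bbox = (250, 80, 420, 460)   # a neighbouring person

crop = crop_person(image, target_bbox)
frac = overlap_fraction(target_bbox, other_bbox)
print(crop.shape, round(frac, 3))
```

A non-zero `frac` means the crop for the target person also shows part of someone else, whichever dataset the image came from.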

In addition, the cropped image may contain another person. Won't this introduce ambiguity for the network?

That is the challenge of crowded scenes, which 3DCrowdNet resolves. Please see the paper.

Jul 05 '22 14:07 hongsukchoi

Thanks for your reply! I found that many top-down methods train on single-person datasets, where the cropped image does not contain another person, which avoids ambiguity during training. At inference time, the cropped image often contains other people, yet these methods often still regress the SMPL parameters of the right person. Does this mean that if other people are included in the cropped image during training, it will introduce ambiguity and make training difficult? In 3DCrowdNet, the cropped image contains other people even during training, but a robust 2D pose heatmap is added to resolve the ambiguity. Is my understanding correct?
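As a rough illustration of what a 2D pose heatmap for the target person could look like, the sketch below renders one Gaussian per keypoint. It is an assumption-laden example, not 3DCrowdNet's implementation: the function name `keypoints_to_heatmaps`, the `sigma` value, the 64 x 64 resolution, and the keypoint coordinates are all hypothetical, and how the network actually builds and consumes its heatmaps is described in the paper.

```python
import numpy as np


def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """Render one Gaussian heatmap per (x, y) keypoint.

    Returns an array of shape (num_keypoints, height, width).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmaps = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for j, (x, y) in enumerate(keypoints):
        # Peak value 1.0 at the keypoint, decaying with squared distance.
        heatmaps[j] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return heatmaps


# Hypothetical 2D joints of the target person inside the crop, as (x, y).
target_keypoints = np.array([[20.0, 10.0], [32.0, 40.0]])
heatmaps = keypoints_to_heatmaps(target_keypoints, height=64, width=64)
print(heatmaps.shape)
```

Because the heatmaps peak only at the target person's joints, they indicate which person in a crowded crop the prediction should follow.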

Jul 05 '22 14:07 yuchen-ji

Does this mean that if other people are included in the cropped image during training, it will introduce ambiguity and make training difficult?

No. I think you are confused about how deep learning works. Given accurate ground truth, a neural network becomes robust to this ambiguity during training. As a result, it performs better on such ambiguous inputs at test time.

Jul 06 '22 14:07 hongsukchoi
