affordance_diffusion
predict hand poses
Thank you for your wonderful work! I have a question about this sentence: "Our method outperforms generic image generation baselines, and the extracted hand poses from our HOI synthesis are favored in user studies against baselines that are trained to directly predict hand poses." What does this mean? It seems to me that using your method to predict hand poses is a reasonable thing to do. What do you see as the difference between the two tasks, and how did you reach this conclusion? Alternatively, what makes it difficult to use your method to predict hand poses?