Moore-AnimateAnyone

About the future works

Open huyduong7101 opened this issue 8 months ago • 0 comments

Within the scope of human-related video generation, there are two main and emerging problems, namely Talking Face Generation (TFG) and Human Animation Generation (HAG). The key difference between them is the inputs we feed into the models (I assume the models here are diffusion-based):

  • For TFG, the input is audio + image/video.
  • For HAG, the input is pose + image/video.

Hence, I wonder whether there are any studies that adopt an approach to merge the two problems into one. If not, what are the current obstacles (data, modeling, ...)?
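
To make the question concrete, here is a rough, minimal PyTorch-style sketch of what "merging" could look like on the modeling side: a single denoiser whose cross-attention context is the concatenation of audio tokens and pose tokens, so the same network can run in TFG-only, HAG-only, or joint mode. All module names here (AudioEncoder, PoseEncoder, UnifiedDenoiser) and the shapes are hypothetical illustrations, not part of this repo's code.

```python
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    """Projects per-frame audio features (e.g. wav2vec-style vectors) to conditioning tokens."""
    def __init__(self, audio_dim: int, cond_dim: int):
        super().__init__()
        self.proj = nn.Linear(audio_dim, cond_dim)

    def forward(self, audio_feat):            # (B, T, audio_dim)
        return self.proj(audio_feat)          # (B, T, cond_dim)

class PoseEncoder(nn.Module):
    """Projects per-frame pose maps (e.g. rendered skeleton images) to conditioning tokens."""
    def __init__(self, cond_dim: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=4), nn.SiLU(),
            nn.Conv2d(64, cond_dim, 4, stride=4), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, pose_img):              # (B, T, 3, H, W)
        b, t = pose_img.shape[:2]
        x = self.conv(pose_img.flatten(0, 1)) # (B*T, cond_dim, 1, 1)
        return x.flatten(1).view(b, t, -1)    # (B, T, cond_dim)

class UnifiedDenoiser(nn.Module):
    """Toy denoiser: either conditioning stream can be dropped (classifier-free style),
    so one model covers audio-driven, pose-driven, or jointly-driven generation."""
    def __init__(self, latent_dim=320, cond_dim=256, audio_dim=768):
        super().__init__()
        self.audio_enc = AudioEncoder(audio_dim, cond_dim)
        self.pose_enc = PoseEncoder(cond_dim)
        self.cross_attn = nn.MultiheadAttention(latent_dim, 8, kdim=cond_dim,
                                                vdim=cond_dim, batch_first=True)
        self.out = nn.Linear(latent_dim, latent_dim)

    def forward(self, noisy_latent, audio_feat=None, pose_img=None):
        # noisy_latent: (B, N, latent_dim) flattened video latents
        ctx = []
        if audio_feat is not None:
            ctx.append(self.audio_enc(audio_feat))
        if pose_img is not None:
            ctx.append(self.pose_enc(pose_img))
        context = torch.cat(ctx, dim=1)        # (B, T_audio + T_pose, cond_dim)
        attended, _ = self.cross_attn(noisy_latent, context, context)
        return self.out(attended)              # predicted noise

# Joint conditioning with both modalities present:
denoiser = UnifiedDenoiser()
latents = torch.randn(2, 16, 320)
audio = torch.randn(2, 24, 768)
poses = torch.randn(2, 24, 3, 64, 64)
print(denoiser(latents, audio_feat=audio, pose_img=poses).shape)  # torch.Size([2, 16, 320])
```

This is only meant to frame the question: the hard part seems less the fusion mechanism itself and more getting paired data where audio, pose, and identity are all available and aligned.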

huyduong7101 · Jun 14 '24 04:06