Moore-AnimateAnyone
About future work
Within the scope of human-related video generation, there are two main emerging problems: Talking Face Generation (TFG) and Human Animation Generation (HAG). The difference between the two lies in the inputs we feed into the models (I assume the models here are diffusion-based):
- For TFG, the inputs are audio + image/video;
- For HAG, the inputs are pose + image/video.
Hence, I wonder: are there any studies that adopt an approach merging the two problems into one? If not, what are the current obstacles (data, modeling, ...)?
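To make the "merge at the input level" idea concrete, here is a minimal sketch of what a shared conditioning interface could look like: a single encoder that projects whichever modalities are present (audio, pose) into one token stream for a diffusion denoiser's cross-attention, with learned null tokens standing in for a missing modality so TFG-style and HAG-style datasets could be mixed in training. All names, dimensions, and the null-token trick are my own assumptions for illustration, not part of Moore-AnimateAnyone or any specific paper.

```python
from typing import Optional

import torch
import torch.nn as nn


class UnifiedConditionEncoder(nn.Module):
    """Hypothetical sketch: maps optional audio and pose features into one
    sequence of conditioning tokens. Dimensions are illustrative only."""

    def __init__(self, audio_dim: int = 768, pose_dim: int = 134,
                 cond_dim: int = 1024):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, cond_dim)  # e.g. wav2vec-style features
        self.pose_proj = nn.Linear(pose_dim, cond_dim)    # e.g. flattened 2D keypoints
        # Learned "null" tokens replace a missing modality, so the same
        # network can train on audio-only (TFG) and pose-only (HAG) batches.
        self.null_audio = nn.Parameter(torch.zeros(1, 1, cond_dim))
        self.null_pose = nn.Parameter(torch.zeros(1, 1, cond_dim))

    def forward(self, audio: Optional[torch.Tensor] = None,
                pose: Optional[torch.Tensor] = None) -> torch.Tensor:
        assert audio is not None or pose is not None, "need at least one modality"
        batch = audio.shape[0] if audio is not None else pose.shape[0]
        a = (self.audio_proj(audio) if audio is not None
             else self.null_audio.expand(batch, 1, -1))
        p = (self.pose_proj(pose) if pose is not None
             else self.null_pose.expand(batch, 1, -1))
        # Concatenate along the sequence axis; the denoiser's cross-attention
        # would then treat both modalities as a single token stream.
        return torch.cat([a, p], dim=1)


if __name__ == "__main__":
    enc = UnifiedConditionEncoder()
    audio = torch.randn(2, 50, 768)  # 2 clips, 50 audio frames
    pose = torch.randn(2, 24, 134)   # 2 clips, 24 pose frames
    print(enc(audio, pose).shape)    # torch.Size([2, 74, 1024])
    print(enc(audio=audio).shape)    # torch.Size([2, 51, 1024])  TFG-style batch
    print(enc(pose=pose).shape)      # torch.Size([2, 25, 1024])  HAG-style batch
```

Even with such an interface, the harder obstacles would presumably remain on the data side (few datasets annotate both synchronized audio and full-body pose) and in balancing the two conditioning signals during training.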