echomimic
Pose driven inference
Hi, how can I run inference driven by pose alone, with no audio at all (as in a face reenactment task)? I'd like to use all the landmarks, including the mouth.
same problem
Same here. Has this been solved?
same problem
same problem