How can we generate human movements over time?
Thank you for your great work!
After reading the paper, my impression is that the current PoseGPT works only on a single-frame basis and cannot generate a sequence of human motion. For instance, given the text "This person could be proposing marriage, traditionally done by kneeling on one knee to signify commitment and respect.", PoseGPT can currently output only a single SMPL pose for this motion, not a series of SMPL poses depicting the whole proposal. The paper also mentions this limitation.
If so, how could we extend PoseGPT to generate human movements over time? For instance, given a textual description (possibly with some images), it would generate a series of SMPL poses. Any suggestions and discussion are welcome!
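One naive baseline, which is not described in the paper and is not a real temporal model, is to treat a motion as a stack of per-frame SMPL poses and fill in the frames between two single-frame poses (e.g. a "standing" and a "kneeling" pose) by spherically interpolating (slerp) each joint's rotation. The sketch below assumes the usual SMPL pose layout of 24 joints (global orientation + 23 body joints), each an axis-angle 3-vector, i.e. 72 numbers per frame. The function names are hypothetical and the poses are random stand-ins, not actual PoseGPT outputs:

```python
import numpy as np

NUM_JOINTS = 24  # assumed SMPL layout: global orientation + 23 body joints, axis-angle each


def axis_angle_to_quat(aa):
    """Convert (..., 3) axis-angle vectors to (..., 4) unit quaternions (w, x, y, z)."""
    angle = np.linalg.norm(aa, axis=-1, keepdims=True)
    half = 0.5 * angle
    # sin(angle/2)/angle, using the small-angle limit 0.5 to avoid dividing by zero
    safe = np.where(angle < 1e-8, 1.0, angle)
    scale = np.where(angle < 1e-8, 0.5, np.sin(half) / safe)
    return np.concatenate([np.cos(half), aa * scale], axis=-1)


def quat_to_axis_angle(q):
    """Convert (..., 4) unit quaternions (w, x, y, z) back to (..., 3) axis-angle vectors."""
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    q = np.where(q[..., :1] < 0, -q, q)  # keep w >= 0 so the angle lies in [0, pi]
    w = np.clip(q[..., :1], -1.0, 1.0)
    xyz = q[..., 1:]
    sin_half = np.linalg.norm(xyz, axis=-1, keepdims=True)
    angle = 2.0 * np.arctan2(sin_half, w)
    safe = np.where(sin_half < 1e-8, 1.0, sin_half)
    scale = np.where(sin_half < 1e-8, 2.0, angle / safe)
    return xyz * scale


def slerp(q0, q1, t):
    """Spherical linear interpolation between per-joint unit quaternions at fraction t."""
    dot = np.sum(q0 * q1, axis=-1, keepdims=True)
    q1 = np.where(dot < 0, -q1, q1)  # take the shorter arc
    dot = np.clip(np.abs(dot), 0.0, 1.0)
    theta = np.arccos(dot)
    sin_theta = np.sin(theta)
    # Fall back to normalized linear interpolation when the rotations nearly coincide
    near = sin_theta < 1e-6
    safe = np.where(near, 1.0, sin_theta)
    w0 = np.where(near, 1.0 - t, np.sin((1.0 - t) * theta) / safe)
    w1 = np.where(near, t, np.sin(t * theta) / safe)
    out = w0 * q0 + w1 * q1
    return out / np.linalg.norm(out, axis=-1, keepdims=True)


def interpolate_poses(pose_a, pose_b, num_frames):
    """Return a (num_frames, 24, 3) axis-angle sequence going from pose_a to pose_b."""
    qa = axis_angle_to_quat(pose_a.reshape(NUM_JOINTS, 3))
    qb = axis_angle_to_quat(pose_b.reshape(NUM_JOINTS, 3))
    frames = [quat_to_axis_angle(slerp(qa, qb, t)) for t in np.linspace(0.0, 1.0, num_frames)]
    return np.stack(frames)


# Usage with stand-in keyframes (hypothetical, not real model outputs)
rng = np.random.default_rng(0)
standing = np.zeros(NUM_JOINTS * 3)                            # rest pose
kneeling = rng.uniform(-0.5, 0.5, size=NUM_JOINTS * 3)         # placeholder "kneeling" pose
motion = interpolate_poses(standing, kneeling, num_frames=30)
print(motion.shape)  # (30, 24, 3)
```

This only yields a smooth transition between two keyframes. It has no notion of timing, ground contact, or whether the in-between frames are physically plausible, which is exactly what a learned sequence model would need to add.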
I’ve also been very interested in taking in sequences of pose data as input.
It’s my understanding that PoseGPT’s architecture sort of goes against this approach. If anyone else has figured this out, PLEASE shoot me an email (the address is on my GH profile)!!