Transition between generated gestures
The segments we train on are all 4 s long, and it is difficult to generalize to arbitrary-length gestures with positional encoding alone. MDM-based models used for arbitrarily long inference therefore need a smooth transition between consecutively generated sequences. The following practices can serve as references:
- Our approach is to add seed poses for smooth transitions (see the sketch after this list).
- Its follow-up work PriorMDM uses DoubleTake for long motion generation.
- EDGE enforces temporal consistency between multiple sequences.
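As a rough sketch of the seed-pose idea (the `sample_fn` call below is a placeholder, not this repository's actual API): each new segment is conditioned on the last `n_seed` poses of the previous one, and the overlapping frames can then be blended.

```python
import numpy as np

def generate_long_sequence(sample_fn, audio_segments, n_seed=30):
    """Chain fixed-length segments into one long gesture sequence.

    `sample_fn(audio, seed_poses)` stands in for a single diffusion sampling
    call that returns a (1, njoints, 1, n_frames) clip.
    """
    clips = []
    seed_poses = None                          # the first segment needs no seed
    for audio in audio_segments:
        sample = sample_fn(audio, seed_poses)  # generate one 4 s segment
        # blending the first n_seed frames of `sample` with `seed_poses`
        # (see the discussion below) smooths the transition
        clips.append(sample)
        seed_poses = sample[..., -n_seed:]     # condition the next segment
    return np.concatenate(clips, axis=-1)
```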
Hi, thanks for the great work!
Regarding your approach:
I am wondering if this is a bug in `sample.py` when smoothing the transitions here:
As you have commented yourself, the size of the variable `last_poses` is `(1, model.njoints, 1, args.n_seed)`, so `len(last_poses)` is always 1. I think `len(last_poses)` should be replaced with `np.size(last_poses, axis=-1)`, which is `args.n_seed` (30 frames by default). This way, it combines the first frames of the new prediction with the last frames of the previous prediction, something like this:
```python
n = np.size(last_poses, axis=-1)      # args.n_seed frames to blend over
for j in range(n):
    prev = last_poses[..., j]         # tail of the previous prediction
    nxt = sample[..., j]              # head of the new prediction
    # linear cross-fade: weight shifts from the previous to the new prediction
    sample[..., j] = prev * (n - j) / (n + 1) + nxt * (j + 1) / (n + 1)
```
Am I right? Would appreciate your feedback. Thanks a lot
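For reference, a vectorized restatement of the loop above (assuming `last_poses` and `sample` are NumPy arrays with the seed frames on the last axis; this is only a sketch, not the repository's code):

```python
import numpy as np

n = np.size(last_poses, axis=-1)          # args.n_seed, 30 by default
w_prev = np.arange(n, 0, -1) / (n + 1)    # weights n/(n+1), ..., 1/(n+1)
w_next = np.arange(1, n + 1) / (n + 1)    # weights 1/(n+1), ..., n/(n+1)
# broadcast over the trailing frame axis: cross-fade the previous tail into the new head
sample[..., :n] = last_poses * w_prev + sample[..., :n] * w_next
```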
Yes, when I later reproduced it I recalled that there was a minor problem in this part of the code, but it did not seem to have much effect on the results. Also:
- the length of `last_poses` is not 1 but `n_seed`: in the shape `(1, model.njoints, 1, args.n_seed)`, the first 1 indicates the batch size and the second 1 is an expanded dimension with no real meaning.
- the follow-up DiffuseStyleGesture+ definitely fixed this, see: here.
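A tiny illustration of that shape (the numeric values below are placeholders, not the model's real dimensions):

```python
import numpy as np

njoints, n_seed = 256, 30                 # placeholder values for illustration
last_poses = np.zeros((1, njoints, 1, n_seed))

print(len(last_poses))                    # 1  -> only counts the batch dimension
print(np.size(last_poses, axis=-1))       # 30 -> the n_seed frames to blend over
```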