Transition between generated gestures
The segments we train on are all 4 s long, and it is difficult to generalize to arbitrary-length gestures with positional encoding alone. MDM-based models used for arbitrarily long inference therefore need a smooth transition between consecutively generated sequences. The following practices can serve as references:
- Our approach is to add seed poses for smooth transitions (see the sketch after this list).
- Its follow-up work PriorMDM uses DoubleTake for long motion generation.
- EDGE enforces temporal consistency between multiple sequences.
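As a rough sketch of the seed-pose idea (the `sample_fn` call below is a placeholder, not this repository's actual API): each new segment is conditioned on the last `n_seed` poses of the previous one, and the overlapping frames can then be blended.

```python
import numpy as np

def generate_long_sequence(sample_fn, audio_segments, n_seed=30):
    """Chain fixed-length segments into one long gesture sequence.

    `sample_fn(audio, seed_poses)` stands in for a single diffusion sampling
    call that returns a (1, njoints, 1, n_frames) clip.
    """
    clips = []
    seed_poses = None                          # the first segment needs no seed
    for audio in audio_segments:
        sample = sample_fn(audio, seed_poses)  # generate one 4 s segment
        # blending the first n_seed frames of `sample` with `seed_poses`
        # (see the discussion below) smooths the transition
        clips.append(sample)
        seed_poses = sample[..., -n_seed:]     # condition the next segment
    return np.concatenate(clips, axis=-1)
```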
Hi, thanks for the great work!
Regarding your approach:
I am wondering if this is a bug in `sample.py` when smoothing the transitions here:
As you have commented yourself, the size of the variable `last_poses` is `(1, model.njoints, 1, args.n_seed)`, so `len(last_poses)` is always 1. I think `len(last_poses)` should be replaced with `np.size(last_poses, axis=-1)`, which is `args.n_seed` (30 frames by default). This way, it combines the first frames of the new prediction with the last frames of the previous prediction, something like this:
```python
n = np.size(last_poses, axis=-1)      # args.n_seed frames to blend over
for j in range(n):
    prev = last_poses[..., j]         # tail of the previous prediction
    nxt = sample[..., j]              # head of the new prediction
    # linear cross-fade: weight shifts from the previous to the new prediction
    sample[..., j] = prev * (n - j) / (n + 1) + nxt * (j + 1) / (n + 1)
```
Am I right? Would appreciate your feedback. Thanks a lot
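For reference, a vectorized restatement of the loop above (assuming `last_poses` and `sample` are NumPy arrays with the seed frames on the last axis; this is only a sketch, not the repository's code):

```python
import numpy as np

n = np.size(last_poses, axis=-1)          # args.n_seed, 30 by default
w_prev = np.arange(n, 0, -1) / (n + 1)    # weights n/(n+1), ..., 1/(n+1)
w_next = np.arange(1, n + 1) / (n + 1)    # weights 1/(n+1), ..., n/(n+1)
# broadcast over the trailing frame axis: cross-fade the previous tail into the new head
sample[..., :n] = last_poses * w_prev + sample[..., :n] * w_next
```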
Yes, when I later reproduced it I recalled that there was a minor problem in this part of the code, but it did not seem to have much effect on the results. Also:
- the length of `last_poses` is not 1 but `n_seed`: in the shape `(1, model.njoints, 1, args.n_seed)`, the first 1 indicates the batch size and the second 1 is an expanded dimension with no real meaning.
- the follow-up DiffuseStyleGesture+ definitely fixed this, see: here.
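A tiny illustration of that shape (the numeric values below are placeholders, not the model's real dimensions):

```python
import numpy as np

njoints, n_seed = 256, 30                 # placeholder values for illustration
last_poses = np.zeros((1, njoints, 1, n_seed))

print(len(last_poses))                    # 1  -> only counts the batch dimension
print(np.size(last_poses, axis=-1))       # 30 -> the n_seed frames to blend over
```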