motion-diffusion-model
We don't have to call encode_text at every denoising timestep (we could compute the text embedding once at the beginning), do we?
https://github.com/GuyTevet/motion-diffusion-model/blob/8139dda55d90a58aa5a257ebf159b2ecfb78c632/model/mdm.py#L151C8-L151C8
```python
class MDM(nn.Module):
    ...

    def forward(self, x, timesteps, y=None):
        """
        x: [batch_size, njoints, nfeats, max_frames], denoted x_t in the paper
        timesteps: [batch_size] (int)
        """
        bs, njoints, nfeats, nframes = x.shape
        emb = self.embed_timestep(timesteps)  # [1, bs, d]

        force_mask = y.get('uncond', False)
        if 'text' in self.cond_mode:
            enc_text = self.encode_text(y['text'])
            emb += self.embed_text(self.mask_cond(enc_text, force_mask=force_mask))
```
Yes, that's possible at inference (but not in training) and can accelerate performance. If you are interested, you can send us a pull request.
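For reference, the caching idea can be sketched as a simple memoizing wrapper around the text encoder (all names here are hypothetical, not from the repo): since the prompt does not change across the denoising loop at inference, the embedding can be computed once and reused at every timestep.

```python
# Hypothetical sketch: memoize the text encoder so repeated calls with the
# same prompt (one per denoising step) hit a cache instead of re-encoding.

def make_cached_encoder(encode_text):
    cache = {}
    calls = {"n": 0}  # count how many real encodes actually happen

    def cached(texts):
        key = tuple(texts)  # prompts are hashable strings
        if key not in cache:
            calls["n"] += 1
            cache[key] = encode_text(texts)
        return cache[key]

    cached.calls = calls
    return cached

# Dummy stand-in for the real (e.g. CLIP-based) text encoder.
def dummy_encode(texts):
    return [hash(t) % 997 for t in texts]

encode = make_cached_encoder(dummy_encode)
prompts = ["a person walks forward"]
embs = [encode(prompts) for _ in range(50)]  # simulate 50 denoising steps
assert encode.calls["n"] == 1  # encoded once, reused for the other 49 steps
```

This only changes how often the encoder runs, not the result, so sampling output is unaffected as long as the encoder is deterministic at inference.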
@GuyTevet I sent a PR here: https://github.com/GuyTevet/motion-diffusion-model/pull/152/
Why is it not possible during training?