
Cannot reproduce the DiffuseStyleGesture+ inference for BEAT

Open ylhua opened this issue 1 year ago • 2 comments

Hi, when I used my own audio and text to test the DiffuseStyleGesture+ model, I got this error:

```
0 3
  0%|          | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "sample.py", line 342, in <module>
    main(config, config.save_dir, config.model_path, tst_path=config.tst_path, max_len=config.max_len,
  File "sample.py", line 267, in main
    inference(args, save_dir, filename, textaudio, sample_fn, model, n_frames=max_len, smoothing=True,
  File "sample.py", line 142, in inference
    sample = sample_fn(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 643, in p_sample_loop
    for i, sample in enumerate(self.p_sample_loop_progressive(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 722, in p_sample_loop_progressive
    out = sample_fn(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 527, in p_sample
    out = self.p_mean_variance(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/respace.py", line 92, in p_mean_variance
    return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 308, in p_mean_variance
    model_output = model(x, self._scale_timesteps(t), **model_kwargs)
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/respace.py", line 129, in __call__
    return self.model(x, new_ts, **kwargs)
  File "/home/heyeyuanshan76/.conda/envs/DiffuseStyleGesture/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../model/mdm.py", line 145, in forward
    embed_style = self.mask_cond(self.embed_style(y['style']), force_mask=force_mask)  # (bs, 64)
  File "/home/heyeyuanshan76/.conda/envs/DiffuseStyleGesture/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/heyeyuanshan76/.conda/envs/DiffuseStyleGesture/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x17 and 2x384)
```

ylhua avatar Jun 27 '24 06:06 ylhua

Just solved it. It seems the size of the style vector was hard-coded for a different dataset.
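For anyone hitting the same error: the shape mismatch `(1x17 and 2x384)` means the style one-hot being fed in has 17 entries while the `embed_style` linear layer was built with `in_features=2`. A minimal sketch of the symptom and one possible fix (the layer sizes and variable names here are illustrative assumptions, not the repo's actual config plumbing):

```python
import torch
import torch.nn as nn

# Symptom: in_features hard-coded for one dataset (here 2), but the
# current dataset's style one-hot has a different size (here 17).
embed_style = nn.Linear(2, 384)          # hard-coded for another dataset
style = torch.zeros(1, 17)               # style vector of the current dataset
try:
    embed_style(style)
except RuntimeError as e:
    print(e)                             # mat1 and mat2 shapes cannot be multiplied (1x17 and 2x384)

# One fix: derive in_features from the style vector (or the dataset config)
# instead of hard-coding it.
n_styles = style.shape[-1]
embed_style = nn.Linear(n_styles, 384)
out = embed_style(style)
print(out.shape)                         # torch.Size([1, 384])
```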

ylhua avatar Jun 27 '24 08:06 ylhua

But I have one question: the results of DiffuseStyleGesture+ look plainer than DiffuseStyleGesture's. Almost only the hands move, and the body stays within a small region. Do you have any idea why?

ylhua avatar Jun 27 '24 08:06 ylhua

Thanks for your feedback! Yes, that's expected — the original DiffuseStyleGesture was trained on the ZEGGS dataset, which features more exaggerated and expressive body movements. In contrast, DiffuseStyleGesture+ was trained on BEAT and TWH datasets, where the motion is naturally more subtle and restrained. Let me know if you have any other questions!

YoungSeng avatar Jun 25 '25 14:06 YoungSeng