Hi, when I tested the DiffuseStyleGesture+ model with my own audio and text, I got this error:
0 3
0%| | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "sample.py", line 342, in <module>
    main(config, config.save_dir, config.model_path, tst_path=config.tst_path, max_len=config.max_len,
  File "sample.py", line 267, in main
    inference(args, save_dir, filename, textaudio, sample_fn, model, n_frames=max_len, smoothing=True,
  File "sample.py", line 142, in inference
    sample = sample_fn(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 643, in p_sample_loop
    for i, sample in enumerate(self.p_sample_loop_progressive(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 722, in p_sample_loop_progressive
    out = sample_fn(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 527, in p_sample
    out = self.p_mean_variance(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/respace.py", line 92, in p_mean_variance
    return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 308, in p_mean_variance
    model_output = model(x, self._scale_timesteps(t), **model_kwargs)
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/respace.py", line 129, in __call__
    return self.model(x, new_ts, **kwargs)
  File "/home/heyeyuanshan76/.conda/envs/DiffuseStyleGesture/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../model/mdm.py", line 145, in forward
    embed_style = self.mask_cond(self.embed_style(y['style']), force_mask=force_mask)  # (bs, 64)
  File "/home/heyeyuanshan76/.conda/envs/DiffuseStyleGesture/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/heyeyuanshan76/.conda/envs/DiffuseStyleGesture/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x17 and 2x384)
Just solved it. It turns out the size of the style vector is hard-coded per dataset, so my style input didn't match the dimension the model was built with.
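In case it helps anyone hitting the same `RuntimeError`: the message `(1x17 and 2x384)` means the style one-hot vector had 17 entries while `embed_style` was constructed with a different `in_features`. A minimal sketch of the idea, deriving the layer size from the dataset instead of hard-coding it (the names `n_styles` and the output size 64 are assumptions based on the error message and the `(bs, 64)` comment in `mdm.py`, not the repo's actual code):

```python
import torch
import torch.nn as nn

# Number of speakers/styles in YOUR dataset (17 here, matching the
# one-hot length in the error); derive this from the data, don't hard-code it.
n_styles = 17

# Illustrative style-embedding layer; the real output size lives in mdm.py,
# where the comment suggests the embedding is (bs, 64).
embed_style = nn.Linear(n_styles, 64)

style = torch.zeros(1, n_styles)  # one-hot style vector, batch size 1
style[0, 3] = 1.0                 # e.g. select speaker/style index 3
out = embed_style(style)
print(tuple(out.shape))           # (1, 64)
```

If the layer had been built as `nn.Linear(2, 384)` (as the original mat2 shape implies) the same call would raise exactly the shape-mismatch error from the traceback.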
But I have one question: the results from DiffuseStyleGesture+ look plainer than those from DiffuseStyleGesture. Almost only the hands move, and the body stays within a small region. Do you have any idea why?
Thanks for your feedback! Yes, that's expected: the original DiffuseStyleGesture was trained on the ZEGGS dataset, which features more exaggerated and expressive body movements. In contrast, DiffuseStyleGesture+ was trained on the BEAT and TWH datasets, where the motion is naturally more subtle and restrained. Let me know if you have any other questions!