Hi, when I tested the DiffuseStyleGesture+ model with my own audio and text, I got this error:
0 3
0%| | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "sample.py", line 342, in <module>
    main(config, config.save_dir, config.model_path, tst_path=config.tst_path, max_len=config.max_len,
  File "sample.py", line 267, in main
    inference(args, save_dir, filename, textaudio, sample_fn, model, n_frames=max_len, smoothing=True,
  File "sample.py", line 142, in inference
    sample = sample_fn(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 643, in p_sample_loop
    for i, sample in enumerate(self.p_sample_loop_progressive(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 722, in p_sample_loop_progressive
    out = sample_fn(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 527, in p_sample
    out = self.p_mean_variance(
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/respace.py", line 92, in p_mean_variance
    return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/gaussian_diffusion.py", line 308, in p_mean_variance
    model_output = model(x, self._scale_timesteps(t), **model_kwargs)
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../diffusion/respace.py", line 129, in __call__
    return self.model(x, new_ts, **kwargs)
  File "/home/heyeyuanshan76/.conda/envs/DiffuseStyleGesture/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/heyeyuanshan76/ling/DiffuseStyleGesture/BEAT-TWH-main/mydiffusion_beat_twh/../model/mdm.py", line 145, in forward
    embed_style = self.mask_cond(self.embed_style(y['style']), force_mask=force_mask)  # (bs, 64)
  File "/home/heyeyuanshan76/.conda/envs/DiffuseStyleGesture/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/heyeyuanshan76/.conda/envs/DiffuseStyleGesture/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x17 and 2x384)
Just solved it. It turns out the size of the style vector is hard-coded per dataset, so my style input didn't match the dimension the model was built with.
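In case it helps anyone hitting the same `RuntimeError`: the message `(1x17 and 2x384)` means the style one-hot vector had 17 entries while `embed_style` was constructed with a different `in_features`. A minimal sketch of the idea, deriving the layer size from the dataset instead of hard-coding it (the names `n_styles` and the output size 64 are assumptions based on the error message and the `(bs, 64)` comment in `mdm.py`, not the repo's actual code):

```python
import torch
import torch.nn as nn

# Number of speakers/styles in YOUR dataset (17 here, matching the
# one-hot length in the error); derive this from the data, don't hard-code it.
n_styles = 17

# Illustrative style-embedding layer; the real output size lives in mdm.py,
# where the comment suggests the embedding is (bs, 64).
embed_style = nn.Linear(n_styles, 64)

style = torch.zeros(1, n_styles)  # one-hot style vector, batch size 1
style[0, 3] = 1.0                 # e.g. select speaker/style index 3
out = embed_style(style)
print(tuple(out.shape))           # (1, 64)
```

If the layer had been built as `nn.Linear(2, 384)` (as the original mat2 shape implies) the same call would raise exactly the shape-mismatch error from the traceback.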
But I have one question: the results from DiffuseStyleGesture+ look plainer than those from DiffuseStyleGesture. Almost only the hands move, and the body stays within a small region. Do you have any idea why?
Thanks for your feedback! Yes, that's expected: the original DiffuseStyleGesture was trained on the ZEGGS dataset, which features more exaggerated and expressive body movements. In contrast, DiffuseStyleGesture+ was trained on the BEAT and TWH datasets, where the motion is naturally more subtle and restrained. Let me know if you have any other questions!