Open-Sora icon indicating copy to clipboard operation
Open-Sora copied to clipboard

Shape mismatch error occurs in multiprocessing

Open cmh1027 opened this issue 1 year ago • 2 comments

If I use 2~ gpus on inference, following error occurs.

Traceback (most recent call last):
  File "/hub_data1/minhyuk/diffusion/opensora/scripts/inference.py", line 114, in <module>
    main()
  File "/hub_data1/minhyuk/diffusion/opensora/scripts/inference.py", line 95, in main
    samples = scheduler.sample(
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/schedulers/iddpm/__init__.py", line 72, in sample
    samples = self.p_sample_loop(
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/schedulers/iddpm/gaussian_diffusion.py", line 434, in p_sample_loop
    for sample in self.p_sample_loop_progressive(
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/schedulers/iddpm/gaussian_diffusion.py", line 485, in p_sample_loop_p
rogressive
    out = self.p_sample(
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/schedulers/iddpm/gaussian_diffusion.py", line 388, in p_sample
    out = self.p_mean_variance(
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/schedulers/iddpm/respace.py", line 94, in p_mean_variance
    return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/schedulers/iddpm/gaussian_diffusion.py", line 267, in p_mean_variance
    model_output = model(x, t, **model_kwargs)
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/schedulers/iddpm/respace.py", line 127, in __call__
    return self.model(x, new_ts, **kwargs)
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/schedulers/iddpm/__init__.py", line 89, in forward_with_cfg
    model_out = model.forward(combined, timestep, y, **kwargs)
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/models/stdit/stdit.py", line 267, in forward
    x = auto_grad_checkpoint(block, x, y, t0, y_lens, tpe)
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/acceleration/checkpoint.py", line 24, in auto_grad_checkpoint
    return module(*args, **kwargs)
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/models/stdit/stdit.py", line 111, in forward
    x = x + self.cross_attn(x, y, mask)
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/minhyuk/.conda/envs/opensora/lib/python3.10/site-packages/opensora/models/layers/blocks.py", line 313, in forward
    kv = self.kv_linear(cond).view(B, -1, 2, self.num_heads, self.head_dim)
RuntimeError: shape '[4, -1, 2, 16, 72]' is invalid for input of size 105523

I tested on 2/3/4 gpus, and all give the same error.

cmh1027 avatar Mar 18 '24 07:03 cmh1027

我有相同的问题

Jianrong-Lu avatar Mar 19 '24 11:03 Jianrong-Lu

"My solution is to change the 'batch_size' value on line 32 of 'configs/opensora/inference/16x512x512.py' to 1, and that resolves the error." 16x512x512.py

xiexiaoshinick avatar Mar 20 '24 11:03 xiexiaoshinick