Textdiffuser

Open fff518 opened this issue 1 year ago • 1 comments

    noisy_residual = unet(input, t, encoder_hidden_states[:args.vis_num], masked_feature=masked_features[:16], feature_mask=feature_masks[:16], segmentation_mask=segmentation_masks[:16]).sample
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 659, in forward
    return model_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py", line 647, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/opt/conda/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/data/pylib/diffusers/models/unet_2d_condition.py", line 595, in forward
    sample = torch.cat([sample, feature_mask, masked_feature], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 1 for tensor number 1 in the list.

In your train.py, the call noisy_residual = unet(input, t, encoder_hidden_states[:args.vis_num], masked_feature=masked_features[:16], feature_mask=feature_masks[:16], segmentation_mask=segmentation_masks[:16]).sample fails with a dimension mismatch. I do not understand why this is related to args.vis_num. All my training parameters are the same as in the example you provided. I hope you can explain this to me. Thank you very much!

fff518 • Dec 27 '23 14:12

Thanks for your interest in TextDiffuser. Could you print the size of sample, feature_mask, masked_feature?

JingyeChen • Dec 27 '23 14:12
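
To answer the request above, the three shapes can be compared directly before the failing torch.cat call. The following is a minimal sketch, not part of TextDiffuser: find_cat_mismatches is a hypothetical helper, and the shapes in the example are made-up placeholders rather than values from this run. It applies the same rule the error message describes: every axis except the concatenation axis must match.

```python
def find_cat_mismatches(shapes, dim=1):
    """Hypothetical helper: list the sizes that would make concatenation fail.

    Uses the first shape as the reference and returns a list of
    (tensor_index, axis, expected, got) tuples. If a shape has a different
    number of axes, axis is None and expected/got are the axis counts.
    """
    reference = tuple(shapes[0])
    problems = []
    for index, shape in enumerate(shapes[1:], start=1):
        shape = tuple(shape)
        if len(shape) != len(reference):
            problems.append((index, None, len(reference), len(shape)))
            continue
        for axis, (expected, got) in enumerate(zip(reference, shape)):
            # Sizes may differ only along the concatenation axis.
            if axis != dim and expected != got:
                problems.append((index, axis, expected, got))
    return problems


# Placeholder shapes for illustration only (assumed, not taken from the issue).
shapes = [(8, 4, 64, 64), (1, 1, 64, 64), (8, 4, 64, 64)]
for index, axis, expected, got in find_cat_mismatches(shapes, dim=1):
    print(f"tensor {index}: axis {axis} expected {expected}, got {got}")
```

Because torch.Size is a tuple subclass, the helper can be called with [sample.shape, feature_mask.shape, masked_feature.shape] just before line 595 of unet_2d_condition.py. The reported axis shows whether the batch dimension or a spatial dimension is the one that differs.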