Inference code for V3 without Hugging Face.
Has anyone tried inference with a v3-trained model via the git repository? I am getting `RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3` for the image I passed to the model. @wolfshow
It is hard to locate the cause of an error and debug it without a stack trace. It would be helpful if you could provide more information: your run command, your inference task, the exact error line, and the full error stack trace.
```
    batchd_cls = process_in_batches(data[start_pos:end_pos], image)
  File "form_test_new.py", line 129, in process_in_batches
    outputs = model(input_ids=batch_input_ids, bbox=bbox, attention_mask=batch_attention_mask, labels=None, images=image)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 1050, in forward
    images=images,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 949, in forward
    Wp=Wp,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 643, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 442, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 403, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 331, in forward
    attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3
Processing - 8 / 1363
^CTraceback (most recent call last):
  File "form_test_new.py", line 161, in
```
According to the message at `attention_scores = attention_scores + attention_mask`, namely `RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3`, it seems that either `attention_scores` or `attention_mask` does not take the image tokens into account: the difference 397 − 200 = 197 = 14 × 14 + 1 is exactly the number of visual tokens (14 × 14 image patches plus one extra token). For example, the error can occur if you input the image and text together into the model, so that `attention_scores` has a size of 397 at dimension 3, but you do not extend `attention_mask` to also cover the image tokens.
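
If that is the cause, the fix is to extend the text-only attention mask with ones for the visual positions before calling the model. Below is a minimal sketch of that idea, assuming the default LayoutLMv3 geometry (a 224 × 224 image split into 16 × 16 patches, giving 14 × 14 = 196 patches plus one extra visual token); `extend_attention_mask` is a hypothetical helper for illustration, not a function from this repository:

```python
import torch

# Assumed geometry (default LayoutLMv3 config): a 224x224 image split into
# 16x16 patches gives 14 x 14 = 196 patches, plus 1 extra visual token,
# so 197 visual positions; 200 text tokens + 197 image tokens = 397.
NUM_VISUAL_TOKENS = 14 * 14 + 1


def extend_attention_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Append an all-ones block so the mask also covers the visual tokens.

    attention_mask: (batch_size, text_len), with 1 = attend and 0 = padding.
    Returns a (batch_size, text_len + NUM_VISUAL_TOKENS) mask; the visual
    tokens are never padding, so they are always set to 1.
    """
    batch_size = attention_mask.size(0)
    visual_mask = torch.ones(
        batch_size, NUM_VISUAL_TOKENS,
        dtype=attention_mask.dtype, device=attention_mask.device,
    )
    return torch.cat([attention_mask, visual_mask], dim=1)


# Example: a text-only mask of length 200 becomes length 397.
mask = torch.ones(1, 200, dtype=torch.long)
print(extend_attention_mask(mask).shape)  # torch.Size([1, 397])
```

With such a helper, the failing call in the traceback would pass `attention_mask=extend_attention_mask(batch_attention_mask)` instead of the text-only mask, so both tensors have size 397 at the attention dimension.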