
Inference code for V3 without Hugging Face.

Open lalitr994 opened this issue 2 years ago • 3 comments

Has anyone tried inference with a v3-trained model using the repo code? I am getting RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3 for the image I passed to the model. @wolfshow

lalitr994 · Jun 13 '22 13:06

It is hard to locate the cause of the error and debug without a stack trace. It would be helpful to provide more information: your run command, your inference task, the exact line that raises the error, and the full stack trace.

HYPJUDY · Jun 13 '22 13:06

batchd_cls = process_in_batches(data[start_pos: end_pos], image)
  File "form_test_new.py", line 129, in process_in_batches
    outputs = model(input_ids=batch_input_ids, bbox=bbox, attention_mask=batch_attention_mask, labels=None, images=image)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 1050, in forward
    images=images,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 949, in forward
    Wp=Wp,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 643, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 442, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 403, in forward
    rel_2d_pos=rel_2d_pos,
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/models/layoutlmv3/modeling_layoutlmv3.py", line 331, in forward
    attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3

Processing - 8 / 1363
^CTraceback (most recent call last):
  File "form_test_new.py", line 161, in <module>
    image, size = load_image(img_path)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/layoutlmft/data/image_utils.py", line 26, in load_image
    image = torch.tensor(img_trans.apply_image(image).copy()).permute(2, 0, 1)  # copy to make it writeable
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/fvcore/transforms/transform.py", line 297, in <lambda>
    return lambda x: self._apply(x, name)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/fvcore/transforms/transform.py", line 291, in _apply
    x = getattr(t, meth)(x)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/detectron2/data/transforms/transform.py", line 121, in apply_image
    pil_image = Image.fromarray(img)
  File "/anaconda/envs/layoutlmv3/lib/python3.7/site-packages/PIL/Image.py", line 2945, in fromarray
    obj = obj.tobytes()
KeyboardInterrupt

@HYPJUDY

lalitr994 · Jun 14 '22 05:06

According to the error message

attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3

it seems that one of attention_scores or attention_mask does not take the image into account: the difference 397 - 200 = 197 is exactly the number of visual tokens, i.e. 14 x 14 patches + 1 CLS token. For example, the error can occur if you feed the image and text into the model together, so attention_scores has size 397 at dimension 3, but you do not extend attention_mask to cover the image tokens.
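As a minimal sketch of extending the mask before the forward call (assuming the default 224 x 224 input giving a 14 x 14 patch grid plus one CLS token, and a text sequence padded to length 200 as in the error above; the variable names here are illustrative, not from the repo):

```python
import torch

batch_size, text_len = 2, 200  # text tokens, matching the size-200 mask in the error
visual_len = 14 * 14 + 1       # 196 patch tokens + 1 CLS token = 197

# Mask over the text tokens only (1 = attend, 0 = padding), as a
# tokenizer would normally produce it.
text_attention_mask = torch.ones(batch_size, text_len, dtype=torch.long)

# The visual tokens are always valid, so append a block of ones so the
# mask covers the full 200 + 197 = 397 joint sequence.
visual_attention_mask = torch.ones(batch_size, visual_len, dtype=torch.long)
attention_mask = torch.cat([text_attention_mask, visual_attention_mask], dim=1)

print(attention_mask.shape)  # torch.Size([2, 397])
```

With a 397-long mask, the broadcast in attention_scores + attention_mask lines up with the joint text-plus-image sequence.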

HYPJUDY · Jun 17 '22 06:06