Yupan Huang
It is hard to locate the cause of an error and debug it without the error stack trace. It would be helpful to provide more information about the command you ran, your inference task,...
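When reporting an error, the full traceback is usually far more useful than its last line alone. Here is a minimal sketch using Python's standard `traceback` module. `run_and_capture` is a hypothetical helper for illustration, not part of the repository:

```python
import traceback


def run_and_capture(fn, *args, **kwargs):
    """Call fn; on failure, return the full stack trace as a string.

    Returns (result, None) on success or (None, trace_text) on failure,
    so the complete traceback can be pasted into an issue report.
    """
    try:
        return fn(*args, **kwargs), None
    except Exception:
        # format_exc() includes every frame, not just the final error line
        return None, traceback.format_exc()


if __name__ == "__main__":
    def failing_step():
        # deliberately raise a KeyError to demonstrate the captured trace
        return {}["missing_key"]

    _, trace = run_and_capture(failing_step)
    print(trace)
```

Running the failing command directly in a terminal and copying the whole output works just as well. The helper only matters when errors are caught and silenced somewhere in a script.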
According to the message:
```
attention_scores = attention_scores + attention_mask
RuntimeError: The size of tensor a (397) must match the size of tensor b (200) at non-singleton dimension 3
```
...
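The message says that the two operands disagree in their last dimension. One covers 397 positions and the other covers 200. Below is a NumPy sketch that reproduces the same class of broadcasting failure. It assumes the common `(batch, heads, query_len, key_len)` layout for the scores and `(batch, 1, 1, key_len)` for the mask. Both shapes and the `add_mask` helper are illustrative assumptions, not taken from the model code:

```python
import numpy as np


def add_mask(attention_scores, attention_mask):
    """Add an additive attention mask to raw attention scores.

    Assumed shapes (a common convention, not confirmed for this model):
      attention_scores: (batch, heads, query_len, key_len)
      attention_mask:   (batch, 1, 1, key_len)
    Broadcasting only succeeds when the last dimensions agree.
    """
    return attention_scores + attention_mask


scores = np.zeros((1, 1, 397, 397))  # scores over 397 input positions
mask = np.zeros((1, 1, 1, 200))      # mask built for only 200 positions

try:
    add_mask(scores, mask)
except ValueError as err:
    # NumPy raises ValueError where PyTorch raises RuntimeError
    print("shape mismatch:", err)
```

In a case like this, a natural first check is whether the input sequence and the attention mask are built with the same length.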
Please follow [the instructions](https://github.com/microsoft/unilm/blob/master/layoutreader/README.md#run). Specifically, you can download the data as described in step 1 (`wget https://layoutlm.blob.core.windows.net/readingbank/dataset/ReadingBank.zip`), extract the data (`unzip ReadingBank.zip`, after which you should have the test data in `ReadingBank/test`), and...
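For environments without `wget` or `unzip` (e.g. Windows), the same two steps can be sketched with the Python standard library. The helper names are hypothetical. The URL and the `ReadingBank/test` layout are taken from the instructions above:

```python
import urllib.request
import zipfile
from pathlib import Path

READINGBANK_URL = "https://layoutlm.blob.core.windows.net/readingbank/dataset/ReadingBank.zip"


def download(url, dest):
    """Download url to dest (the equivalent of `wget`)."""
    dest = Path(dest)
    urllib.request.urlretrieve(url, dest)
    return dest


def extract(zip_path, out_dir="."):
    """Extract the archive (the equivalent of `unzip`) and return the test split path."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    return Path(out_dir) / "ReadingBank" / "test"


# Usage (downloads the full archive, so it is not executed here):
#   archive = download(READINGBANK_URL, "ReadingBank.zip")
#   test_dir = extract(archive)
```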
1. `ModuleNotFoundError: No module named 'layoutlmft'`: please try `pip install -e .` following the [installation instructions](https://github.com/microsoft/unilm/tree/master/layoutlmv3#installation).
2. `Not Found for url...`: please manually download `model_final.pth` from `https://huggingface.co/HYPJUDY/layoutlmv3-base-finetuned-publaynet/` to your local...
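If the browser download is inconvenient, the file can also be fetched from a script. This sketch builds a direct link using the Hugging Face Hub `resolve/<revision>/<filename>` URL pattern. The `main` revision and the helper names are assumptions:

```python
import urllib.request
from pathlib import Path

REPO_ID = "HYPJUDY/layoutlmv3-base-finetuned-publaynet"
FILENAME = "model_final.pth"


def resolve_url(repo_id, filename, revision="main"):
    """Build a direct-download URL using the Hub's `resolve` pattern.

    The `main` revision is an assumption; a commit hash pins a fixed version.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def fetch_weights(dest_dir="."):
    """Download the checkpoint into dest_dir and return its local path."""
    dest = Path(dest_dir) / FILENAME
    dest.parent.mkdir(parents=True, exist_ok=True)  # urlretrieve won't create dirs
    urllib.request.urlretrieve(resolve_url(REPO_ID, FILENAME), dest)
    return dest


# Usage (downloads the checkpoint, so it is not executed here):
#   local_path = fetch_weights("checkpoints")
```

If `huggingface_hub` is installed, `hf_hub_download(repo_id=REPO_ID, filename=FILENAME)` does the same and caches the file locally.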
I am not sure how to "get directly the text representation besides the tensor information related to bounding boxes", but it is easy to get text segments/lines with OCR engines...
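As one illustration of turning OCR output into text lines, this sketch groups word-level results into lines with bounding boxes. It assumes the dict layout returned by `pytesseract.image_to_data(..., output_type=Output.DICT)`, with parallel lists under `text`, `left`, `top`, `width`, `height`, `block_num`, `par_num` and `line_num`. Other OCR engines would need their own adapter, and `group_words_into_lines` is hypothetical:

```python
def group_words_into_lines(ocr):
    """Group word-level OCR output into text lines with bounding boxes.

    `ocr` is assumed to follow pytesseract's Output.DICT layout.
    Returns a list of (line_text, (x0, y0, x1, y1)) in the order the
    lines first appear in the OCR output.
    """
    lines = {}  # dicts preserve insertion order (Python 3.7+)
    for i, word in enumerate(ocr["text"]):
        if not word.strip():
            continue  # skip empty entries for page/block/paragraph/line levels
        key = (ocr["block_num"][i], ocr["par_num"][i], ocr["line_num"][i])
        x0, y0 = ocr["left"][i], ocr["top"][i]
        x1, y1 = x0 + ocr["width"][i], y0 + ocr["height"][i]
        entry = lines.setdefault(key, {"words": [], "box": [x0, y0, x1, y1]})
        entry["words"].append(word)
        box = entry["box"]
        # expand the line box so it covers the new word
        entry["box"] = [min(box[0], x0), min(box[1], y0),
                        max(box[2], x1), max(box[3], y1)]
    return [(" ".join(e["words"]), tuple(e["box"])) for e in lines.values()]
```

With Tesseract installed, the input could come from `pytesseract.image_to_data(Image.open(path), output_type=pytesseract.Output.DICT)`.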
Hi, has your problem been solved? Have you run the example code to check whether training the model on PubLayNet with GPUs works? I have not tried training on a CPU...
This problem seems to be caused by some position_ids being larger than the embedding size. I suggest you find the exact sample causing this problem and analyze its minimum and...
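To find the offending sample, this plain-Python sketch scans each sample's ids and reports the minimum and maximum of any sample that falls outside the table. The sample format (a sequence of integer ids per sample), the table size passed in, and `find_out_of_range` are assumptions for illustration. In practice, the ids would come from the dataset preprocessing and the size from the model config:

```python
def find_out_of_range(samples, num_embeddings):
    """Report samples whose position ids fall outside an embedding table.

    `samples` is an iterable of sequences of integer ids; `num_embeddings`
    is the number of rows in the embedding table. Valid ids satisfy
    0 <= id < num_embeddings, so an id equal to the size is already invalid.
    Returns a list of (sample_index, min_id, max_id) for offending samples.
    """
    bad = []
    for idx, ids in enumerate(samples):
        ids = list(ids)
        if not ids:
            continue  # nothing to check in an empty sample
        lo, hi = min(ids), max(ids)
        if lo < 0 or hi >= num_embeddings:
            bad.append((idx, lo, hi))
    return bad
```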
Currently, LayoutLMv3 in Transformers does not support object detection ([see @NielsRogge's reply below](https://github.com/huggingface/transformers/pull/17060#issuecomment-1132626756)).
> unfortunately I'm (for now) not planning to add the object detection part, because the framework being...
`MODEL.IMAGE_ONLY: True` means only image (but not text) information is used. See also: https://github.com/microsoft/unilm/issues/813#issuecomment-1210045982
1. I am not aware of such models. I think that with proper design, the inclusion of both inputs might improve the results. You can try it if interested.
2. ...