Why not use the OCR result in document layout training?

Open: xulangping opened this issue 3 years ago • 1 comment

I am using LayoutLMv3 for document layout training. I found that you do not use the OCR result, only the image. Could you please tell me why?

Aug 03 '22 07:08 xulangping

We have some explanations in the paper's Section 3.4:

To demonstrate the generality of LayoutLMv3 from the multimodal domain to the visual domain, we transfer LayoutLMv3 to a document layout analysis task. We model this task as an object detection problem without text embedding, which is effective in existing works [14, 30, 59].

Aug 10 '22 01:08 HYPJUDY
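
To make "object detection without text embedding" concrete, here is a minimal sketch of what an image-only input could look like: the page is split into patch tokens, and the text slot is left empty instead of being filled from OCR. This is not the repository's code. The names `DetectorInput`, `patchify`, and `build_detection_input` are hypothetical, and the 224x224 image size and 16x16 patch size are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectorInput:
    """Hypothetical container for what the detection backbone receives."""
    patch_tokens: List[List[float]]   # flattened image patches
    text_tokens: Optional[List[str]]  # None means no OCR text is used


def patchify(image: List[List[float]], patch_size: int) -> List[List[float]]:
    """Split an H x W grayscale image (a list of rows) into flattened patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block in row-major order.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


def build_detection_input(image: List[List[float]],
                          patch_size: int = 16) -> DetectorInput:
    """Build an image-only input: patch tokens, and no text tokens at all."""
    return DetectorInput(patch_tokens=patchify(image, patch_size),
                         text_tokens=None)


# Illustrative blank 224 x 224 page (assumed size, not taken from the repo).
image = [[0.0] * 224 for _ in range(224)]
sample = build_detection_input(image)
print(len(sample.patch_tokens), len(sample.patch_tokens[0]), sample.text_tokens)
# prints: 196 256 None
```

In this sketch the OCR step is skipped entirely, so the detector depends only on pixels. That matches the quoted description of the task as object detection without text embedding, though the actual pipeline may differ in its details.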