unilm
Why are OCR results not used in document layout training?
I am using LayoutLMv3 for document layout training, and I noticed that you use only the image, not the OCR results. Could you please tell me why?
We explain this in Section 3.4 of the paper:
To demonstrate the generality of LayoutLMv3 from the multimodal domain to the visual domain, we transfer LayoutLMv3 to a document layout analysis task. We model this task as an object detection problem without text embedding, which is effective in existing works [14, 30, 59].
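To make "without text embedding" concrete, here is a minimal, library-free sketch of an image-only input: the page is cut into fixed-size patches, and those patches alone form the token sequence, with no OCR words or word boxes. The helper name `image_to_patches`, the 16-pixel patch size, and the toy page are assumptions for illustration, not code from the unilm repository. In the setup quoted above, features computed from such visual tokens would feed an object detector that predicts layout regions.

```python
from typing import List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def image_to_patches(image: Image, patch_size: int = 16) -> List[List[int]]:
    """Split an image into non-overlapping, flattened square patches.

    Hypothetical helper: it only illustrates that the input is built
    from pixels alone, with no OCR text or word bounding boxes.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 32x48 "page": 2 rows x 3 columns of 16x16 patches.
page = [[(r + c) % 256 for c in range(48)] for r in range(32)]
visual_tokens = image_to_patches(page)
print(len(visual_tokens), len(visual_tokens[0]))
```

Nothing in this input depends on an OCR engine, which is why the detection setup can run on raw page images.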