
Extracting Outlines by LayoutLMv3

Open tomanick opened this issue 4 months ago • 0 comments

Model I am using: LayoutLMv3

I am working on a project that involves extracting outlines from PDF files. Some of the PDF files I am handling do not contain a structured "Contents" or "Outline" chapter at the beginning, making it challenging to extract the document’s structure through traditional methods.

I am considering using LayoutLMv3 to extract the document's outline by analyzing each page.
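To make the idea concrete, here is a minimal sketch of the post-processing step I have in mind. It assumes a fine-tuned model has already assigned a label to each text block on each page; the model inference itself is not shown. The `Block` type, the label names (`title`, `section`, `subsection`) and the `HEADING_LEVELS` mapping are hypothetical and would depend on the dataset used for fine-tuning.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    page: int    # 1-based page number
    text: str    # text content of the layout block
    label: str   # hypothetical predicted label for the block
    top: float   # y-coordinate of the block's top edge (smaller = higher on the page)


# Hypothetical mapping from predicted label to outline depth.
HEADING_LEVELS = {"title": 1, "section": 2, "subsection": 3}


def build_outline(blocks: List[Block]) -> List[Tuple[int, str, int]]:
    """Keep heading-like blocks and return (level, text, page) in reading order."""
    headings = [b for b in blocks if b.label in HEADING_LEVELS]
    # Reading order is approximated by page number, then vertical position.
    headings.sort(key=lambda b: (b.page, b.top))
    return [(HEADING_LEVELS[b.label], b.text, b.page) for b in headings]


def format_outline(outline: List[Tuple[int, str, int]]) -> str:
    """Render the outline as indented lines, two spaces per level."""
    return "\n".join(
        f"{'  ' * (level - 1)}{text} (p. {page})" for level, text, page in outline
    )
```

Sorting by page and top edge only approximates reading order, so multi-column pages would likely need a more careful ordering.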

Could you please confirm whether LayoutLMv3 is capable of performing such a task? I would also appreciate any guidance or example code on how to implement this. Finally, should the input data be PDF files or images?

Many thanks!

tomanick, Sep 26 '24 13:09