Transformers-Tutorials

Clarification on the Nougat Transformers

Open Dipankar1997161 opened this issue 1 year ago • 4 comments

@NielsRogge, thanks for the tutorials, I am particularly interested in the Nougat one and have a question.

Nougat can extract text from PDFs. Can it also extract tables (structured data) and images? Have you tried this? By images, I mean the images embedded within the PDFs, not PDF pages rendered as images.

My end goal is to store the extracted data separately in three sections: 1. Text, 2. Images, 3. Tables. I would love to hear your thoughts on this.

Dipankar1997161, Oct 17 '23 08:10

Yes, you can train a Nougat/Donut model that takes in images of tables and generates the corresponding content as key-value pairs. You just need a high-quality dataset of (table image, table content) pairs.
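As a rough illustration of what the "table content" half of such a pair could look like as a training target, here is a minimal sketch that flattens nested key-value content into a single tagged string. The `<s_key>...</s_key>` and `<sep/>` tags are loosely modeled on the convention used in Donut-style fine-tuning; the function name `to_target_sequence`, the sample file path, and the table fields are all hypothetical.

```python
from typing import Any


def to_target_sequence(obj: Any) -> str:
    """Flatten nested key-value content into one tagged string.

    Simplified sketch: dict keys become <s_key>...</s_key> tags and
    list items are joined with <sep/>. Not an official implementation.
    """
    if isinstance(obj, dict):
        return "".join(
            f"<s_{key}>{to_target_sequence(value)}</s_{key}>"
            for key, value in obj.items()
        )
    if isinstance(obj, list):
        return "<sep/>".join(to_target_sequence(item) for item in obj)
    return str(obj)


# Hypothetical (table image, table content) pair.
sample = {
    "image_path": "tables/0001.png",
    "content": {
        "rows": [
            {"model": "A", "score": "0.91"},
            {"model": "B", "score": "0.87"},
        ]
    },
}

target = to_target_sequence(sample["content"])
print(target)
```

The resulting string would be the text the decoder learns to generate for the paired image. In an actual fine-tuning setup the tags would typically also need to be added to the tokenizer as special tokens.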

NielsRogge, Oct 17 '23 11:10

I mean, Nougat was specifically designed for academic papers, right? So it should already have been trained to extract "structured data", since any research paper will contain tables and text together?

Dipankar1997161, Oct 17 '23 15:10

Yes, you could fine-tune Nougat on additional data; that way you would benefit from Nougat's pre-training.
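Before fine-tuning, it may be worth checking what the pre-trained model already produces. The sketch below assumes that tables appear in the generated markup as LaTeX `tabular` environments; that is an assumption to verify against your own output. It splits a generated string into plain text and a list of tables. The function name `split_text_and_tables` and the sample markup are hypothetical, and embedded images are not covered here.

```python
import re
from typing import List, Tuple

# Assumption: tables show up as LaTeX tabular environments in the output.
TABLE_PATTERN = re.compile(r"\\begin\{tabular\}.*?\\end\{tabular\}", re.DOTALL)


def split_text_and_tables(markup: str) -> Tuple[str, List[str]]:
    """Return (text without tables, list of table blocks)."""
    tables = TABLE_PATTERN.findall(markup)
    text = TABLE_PATTERN.sub("", markup)
    # Collapse the blank lines left behind where a table was removed.
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    return text, tables


# Hypothetical model output for one page.
sample_markup = r"""# Results

We compare two models.

\begin{tabular}{ll}
Model & Score \\
A & 0.91 \\
\end{tabular}

Model A performs best."""

text, tables = split_text_and_tables(sample_markup)
print(text)
print(len(tables), "table(s) found")
```

If the tables come out in a usable form this way, the text and table sections can be stored separately without any additional training.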

NielsRogge, Oct 18 '23 07:10

@NielsRogge, does it work for the DocVQA task?

AbdulDD, Mar 28 '24 10:03