papermage

How to improve detection of sections?

Open · ldt opened this issue 6 months ago • 1 comment

Hi,

Congrats on your great work and beautiful API!

I'm especially interested in using it to build a hierarchical document from the original PDF. My issue is that some sections are not identified correctly.

For example, in your papermage.pdf file, section 2 is merged with section 2.1: [image]

And the title of section 3.3 is only partially identified: [image]

I have similar issues on some of my documents.
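To make the goal concrete, here is a minimal sketch of the kind of hierarchy meant above. It assumes the section titles have already been extracted as plain strings with a leading number such as "2.1", and it does not use the papermage API at all. The function name `build_outline`, the dictionary layout, and the sample titles are hypothetical and only for illustration.

```python
import re

# Matches headings such as "2 Method", "2.1 Details" or "3. Results".
# Assumption: every heading starts with a dotted section number.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.+)$")


def build_outline(headings):
    """Nest numbered headings into a tree based on their numbering depth."""
    root = {"number": "", "title": "ROOT", "children": []}
    # Stack of (depth, node); the root sits at depth 0 and is never popped.
    stack = [(0, root)]
    for text in headings:
        match = HEADING_RE.match(text.strip())
        if match is None:
            # Skip lines without a recognizable section number.
            continue
        number, title = match.group(1), match.group(2)
        depth = number.count(".") + 1  # "2" -> 1, "2.1" -> 2
        node = {"number": number, "title": title, "children": []}
        # Pop back up to the nearest ancestor that is shallower than this node.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root


# Hypothetical section titles, not taken from any real document.
outline = build_outline(["1 Introduction", "2 Method", "2.1 Details", "3 Results"])
```

With correctly detected titles, "2.1 Details" ends up as a child of "2 Method". When two headings are merged into one string, as in the first example above, the subsection is lost from the tree, which is why the detection quality matters for this use case.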

I would like to know how this could be improved. Could the model be trained further if there were a training set of documents with correctly labeled sections?

Let me know how I could help; the topic is really interesting!

ldt · Dec 26 '23 16:12