papermage How to improve detection of sections?

Open ldt opened this issue 1 year ago • 1 comment

Hi,

Congrats on your great work and beautiful API!

I'm especially interested in using it to create a hierarchical document based on the original PDF. My issue is that some sections are not correctly identified.

For example, in your papermage.pdf file, section 2 is merged with section 2.1:

And the title of section 3.3 is only partially identified:

I have similar issues on some of my documents.

I would like to know how this could be improved. Could the model be trained further if there were a training set of documents with the correct sections pre-identified?

Let me know how I could help, the topic is really interesting!
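
Until the detection itself improves, one possible stopgap is a post-processing pass over the extracted heading strings. The sketch below is only a heuristic under stated assumptions: it takes a plain list of heading strings (how you pull them out of the parsed document is up to you), splits a string wherever a second section number appears, and nests the results by numbering depth. `Node`, `split_merged` and `build_tree` are hypothetical helper names, not part of papermage, and the sample headings are made up for illustration.

```python
import re
from dataclasses import dataclass, field

# A section number ("2", "2.1", "3.3.1") at the start of a word,
# followed by whitespace and a capitalised title word.
NUMBER = re.compile(r"(?<!\S)(\d+(?:\.\d+)*)\.?\s+(?=[A-Z])")


@dataclass
class Node:
    number: str  # "" for unnumbered headings
    title: str
    children: list["Node"] = field(default_factory=list)


def split_merged(heading: str) -> list[tuple[str, str]]:
    """Split one extracted string into (number, title) pairs.

    Text before the first section number is dropped; a string with no
    section number is returned unchanged with an empty number.
    """
    matches = list(NUMBER.finditer(heading))
    if not matches:
        return [("", heading.strip())]
    parts = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(heading)
        parts.append((m.group(1), heading[m.end():end].strip()))
    return parts


def build_tree(headings: list[str]) -> list[Node]:
    """Nest headings by numbering depth ("2.1" becomes a child of "2")."""
    roots: list[Node] = []
    stack: list[Node] = []  # ancestors of the heading being placed
    for heading in headings:
        for number, title in split_merged(heading):
            depth = number.count(".") + 1
            node = Node(number, title)
            del stack[depth - 1:]  # keep only the ancestors above this depth
            (stack[-1].children if stack else roots).append(node)
            stack.append(node)
    return roots


if __name__ == "__main__":
    # Illustrative input: the second string simulates a merged 2 / 2.1 heading.
    tree = build_tree(["1 Introduction", "2 Design 2.1 Data Layer", "2.2 Predictors"])
    for root in tree:
        print(root.number, root.title, [c.number for c in root.children])
```

This only helps when headings are numbered, and it will misfire on titles that legitimately contain a number followed by a capitalised word, so it is not a substitute for better section prediction.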

Dec 26 '23 16:12 ldt

I have the same problem! Hope the authors can solve this!

Mar 24 '24 12:03 MpLebron
