papermage
How to improve detection of sections?
Hi,
Congrats on your great work and beautiful API!
I'm especially interested in using it to build a hierarchical document from the original PDF. My issue is that some sections are not identified correctly.
For example, in your papermage.pdf file, section 2 is merged with section 2.1:
And the title of section 3.3 is only partially identified:
I have similar issues on some of my documents.
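For context, here is a rough sketch of the kind of hierarchy I am trying to build. It is plain Python and does not call papermage at all: it assumes the section headings have already been extracted as strings that start with a number such as "2" or "2.1". The `Node` class, the `build_tree` function, and the sample headings are all hypothetical names made up for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section in the hierarchy (hypothetical helper, not a papermage type)."""
    number: str
    title: str
    children: list["Node"] = field(default_factory=list)


# Matches headings such as "2 Design", "2. Design" or "2.1 Data Classes".
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.*)$")


def build_tree(headings):
    """Nest numbered headings by the depth of their section number."""
    root = Node(number="", title="ROOT")
    stack = [(0, root)]  # (depth, node), deepest open section last
    for text in headings:
        match = HEADING_RE.match(text.strip())
        if match is None:
            continue  # unnumbered headings are skipped in this sketch
        number, title = match.group(1), match.group(2)
        depth = number.count(".") + 1
        # Close every open section at the same or a deeper level.
        while stack[-1][0] >= depth:
            stack.pop()
        node = Node(number, title)
        stack[-1][1].children.append(node)
        stack.append((depth, node))
    return root


# Made-up headings, only to show the expected nesting.
sample = ["1 Intro", "2 Design", "2.1 Part A", "2.2 Part B", "3 Usage", "3.3 Part C"]
tree = build_tree(sample)
```

This only works when each heading is detected as its own complete string, which is why a section 2 merged with section 2.1, or a truncated 3.3 title, breaks the result.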
I would like to know how this could be improved. Could the model be trained further if there were a training set of documents with the correct sections already annotated?
Let me know how I could help. The topic is really interesting!
I have the same problem! Hope the authors can solve this!