Introducing grobidmonkey: A Python Package for Parsing Grobid Output
Last year, I reached out to the community seeking a Python solution for extracting and parsing content from Grobid's TEI-XML output. Under the original issue, I noticed other users expressing the same need. In response, I've taken the initiative to develop a Python package named grobidmonkey to address this need.
While it's still in its early versions, I believe grobidmonkey can be a valuable tool for the community. I'm eager to hear your thoughts and feedback to make it better.
GitHub Repository: grobidmonkey
The package is currently available only through pip and can be installed with:
pip install grobidmonkey
To use it, you can run:
from grobidmonkey import reader
monkeyReader = reader.MonkeyReader('monkey') # or 'lxml' or 'x2d'
# read paper outline
outline = monkeyReader.readOutline('/path/to/your/paper.pdf.tei.xml')
# read paper content
essay = monkeyReader.readEssay('/path/to/your/paper.pdf.tei.xml')
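For anyone unfamiliar with what this kind of parsing involves, here is a minimal standard-library sketch of pulling section headings and paragraphs out of a TEI-XML string. It does not use grobidmonkey and is not a description of how the package works internally. `SAMPLE_TEI` and `read_sections` are hypothetical names made up for illustration, and the sample document is a hand-written fragment that only assumes the usual TEI namespace and a `body/div/head/p` layout.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by TEI-XML documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written fragment for illustration only (not real Grobid output).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head n="1">Introduction</head><p>First paragraph.</p></div>
      <div><head n="2">Methods</head><p>Second paragraph.</p><p>Third paragraph.</p></div>
    </body>
  </text>
</TEI>"""


def read_sections(tei_xml):
    """Return a list of (heading, [paragraph, ...]) tuples, one per top-level div."""
    root = ET.fromstring(tei_xml)
    sections = []
    for div in root.iterfind(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        # A div may have no heading; fall back to an empty title.
        title = "".join(head.itertext()).strip() if head is not None else ""
        paragraphs = [
            "".join(p.itertext()).strip()
            for p in div.iterfind("tei:p", TEI_NS)
        ]
        sections.append((title, paragraphs))
    return sections


if __name__ == "__main__":
    for title, paragraphs in read_sections(SAMPLE_TEI):
        print(title, len(paragraphs))
```

Real Grobid output has more structure than this (nested elements, references, figures), which is the kind of detail a dedicated parser is meant to handle.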
@com3dian thanks for your contribution. I have not yet had the opportunity to test it. As soon as I do, I will be sure to send you my feedback.