Introducing grobidmonkey: A Python Package for Parsing Grobid Output

Open com3dian opened this issue 10 months ago • 1 comments

Last year, I asked the community for a Python solution for extracting and parsing content from Grobid's TEI-XML output. Under the original issue, I noticed other users expressing the same need, so I have developed a Python package named grobidmonkey to address it.

While it's still in its early versions, I believe grobidmonkey can be a valuable tool for the community. I'm eager to hear your thoughts and feedback to make it better.

GitHub Repository: grobidmonkey

The package is currently available only through pip and can be installed with:

pip install grobidmonkey

To use it, you can run:

from grobidmonkey import reader
monkeyReader = reader.MonkeyReader('monkey') # or 'lxml' or 'x2d'

# read paper outline
outline = monkeyReader.readOutline('/path/to/your/paper.pdf.tei.xml')

# read paper content
essay = monkeyReader.readEssay('/path/to/your/paper.pdf.tei.xml')
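
For readers who have not looked inside a Grobid TEI-XML file, here is a minimal sketch of the kind of extraction involved, using only Python's standard library. It is not how grobidmonkey is implemented, and it says nothing about what `readOutline` or `readEssay` return. The function name `read_section_heads` and the tiny sample document are made up for illustration; the only things assumed about Grobid are the TEI namespace and the usual `body` / `div` / `head` layout of its output.

```python
import xml.etree.ElementTree as ET
from typing import List

# TEI namespace used in Grobid's TEI-XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def read_section_heads(tei_xml: str) -> List[str]:
    """Collect the text of each <head> that is a direct child of a <div> in <body>.

    Hypothetical helper for illustration only; not part of grobidmonkey.
    """
    root = ET.fromstring(tei_xml)
    heads = []
    for head in root.iterfind(".//tei:body//tei:div/tei:head", TEI_NS):
        # itertext() also picks up text nested in inline child elements.
        text = "".join(head.itertext()).strip()
        if text:
            heads.append(text)
    return heads


# Hand-written sample (an assumption); real Grobid output is far larger.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div><head n="1">Introduction</head><p>First paragraph.</p></div>
      <div><head n="2">Methods</head><p>Second paragraph.</p></div>
    </body>
  </text>
</TEI>"""

print(read_section_heads(SAMPLE_TEI))  # ['Introduction', 'Methods']
```

To try this on a real file, read it as text first and pass the string in; the section numbers live in the `n` attribute of each `head` if you need them.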

com3dian, Apr 14 '24 22:04

@com3dian thanks for your contribution. I have not yet had the opportunity to test it. As soon as I do, I will send you my feedback.

lfoppiano, May 14 '24 03:05