Alternatives to GROBID (PDF parsing)
Are there any alternatives to GROBID and would there be any major advantages in using them?
Alternatives (feel free to add new entries)
- https://github.com/pdfminer/pdfminer.six
- https://github.com/mstamy2/PyPDF2
- https://github.com/pymupdf/PyMuPDF
Other links
Comments
If we go for a pure Python solution, there might be no need for an intermediary format (i.e. the TEI XML that GROBID produces).
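To illustrate the point about skipping the intermediary format, here is a minimal sketch using pdfminer.six (the first entry in the list above), which returns plain Python strings directly. The function names `split_pages` and `pdf_to_pages` are made up for this example, and the per-page split assumes that pdfminer.six separates pages with a form feed character in its text output; treat it as a starting point rather than a drop-in replacement for GROBID, since it extracts raw text only and no document structure.

```python
from typing import List


def split_pages(raw_text: str) -> List[str]:
    """Split extracted text into per-page strings.

    Assumption: pages are separated by a form feed ("\f") in the
    extracted text. Empty pages are dropped.
    """
    return [page.strip() for page in raw_text.split("\f") if page.strip()]


def pdf_to_pages(pdf_path: str) -> List[str]:
    """Extract text from a PDF and return it as a list of pages (hypothetical helper)."""
    # Imported lazily so split_pages can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return split_pages(extract_text(pdf_path))
```

Usage would look like `pages = pdf_to_pages("paper.pdf")`, where `paper.pdf` is a placeholder path. The result is ordinary Python data with no XML parsing step in between, at the cost of losing the structured metadata (authors, references, sections) that GROBID provides.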