PDF composed of images of scans causes grobid to fail (add error handling)
This looks like a grobid issue:
MultiXml::ParseError (1:1: FATAL: Start tag expected, '<' not found):
This is caused by https://github.com/kermitt2/grobid/issues/132: the PDF file is composed of images of scans and has no text to parse, so grobid fails.
For all PDFs that are images of scans, we could show an error message saying that we do not yet support parsing text from images of scans.
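A minimal sketch of that error handling, assuming the grobid response body is available as a string before it is handed to MultiXml. `UnsupportedScannedPdfError` and `ensure_grobid_xml!` are hypothetical names, not existing code in this project, and the check assumes that a body not starting with `<` means a scanned PDF, which may be too broad:

```ruby
# Hypothetical error for PDFs that grobid cannot extract text from,
# such as files composed only of images of scans.
class UnsupportedScannedPdfError < StandardError
  DEFAULT_MESSAGE = "We do not yet support parsing text from images of scans."

  def initialize(msg = DEFAULT_MESSAGE)
    super
  end
end

# Returns the body unchanged when it looks like XML, raises otherwise.
# `body` is assumed to be the raw string returned by the grobid request.
# The ParseError in this issue ("'<' not found" at 1:1) suggests the body
# does not start with '<' for scanned PDFs, so that is what is checked here.
def ensure_grobid_xml!(body)
  text = body.to_s.lstrip
  raise UnsupportedScannedPdfError unless text.start_with?("<")

  body
end
```

The caller could run `ensure_grobid_xml!` right before the MultiXml parse step and rescue `UnsupportedScannedPdfError` to show `e.message` to the user instead of the raw `MultiXml::ParseError`.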
https://linear.app/issue/JEL-36/grobid-fails-when-pdf-is-composed-of-images