uwazi PDF to text is being executed on non-pdf files

Open txau opened this issue 2 years ago • 0 comments

Describe the bug: We are seeing a few messages about attempts to extract text from non-PDF files:

pdftotext /tmp/1663821835115nhezigjjt9m.jpeg - failed with code 1
stderr output:
Syntax Warning: May not be a PDF file (continuing anyway)

To Reproduce: It is unclear how to reproduce. Likely via CSV import with attached files?

Expected behavior: Non-PDF files should not be allowed as the main file for the moment.
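One possible guard, sketched here as an assumption rather than as how uwazi handles it: check the first bytes of the file for the PDF signature `%PDF-` before handing it to pdftotext. The name `looksLikePdf` is hypothetical and does not exist in the codebase. The function takes the leading bytes of the file as a `Uint8Array`, so the caller decides how to read them.

```typescript
// Hypothetical helper (not part of uwazi): reports whether a buffer
// begins with the PDF signature "%PDF-".
const PDF_MAGIC: number[] = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

function looksLikePdf(header: Uint8Array): boolean {
  // Too short to contain the signature at all.
  if (header.length < PDF_MAGIC.length) {
    return false;
  }
  // Compare the signature byte by byte against the start of the buffer.
  return PDF_MAGIC.every((byte, i) => header[i] === byte);
}

// Usage: a JPEG such as the one in the log starts with FF D8 FF,
// so it would be rejected before text extraction is attempted.
const jpegHeader = new Uint8Array([0xff, 0xd8, 0xff, 0xe0]);
const pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]);
console.log(looksLikePdf(jpegHeader), looksLikePdf(pdfHeader));
```

This check is strict: it only accepts files whose very first bytes are the signature. Some PDF readers tolerate leading junk before `%PDF-`, so a real fix might need a looser match or a check on the upload's MIME type instead.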

Sep 22 '22 14:09 txau