uwazi
PDF to text is being executed on non-PDF files
Describe the bug
We are seeing a few messages about attempts to extract text from non-PDF files:
pdftotext /tmp/1663821835115nhezigjjt9m.jpeg - failed with code 1
stderr output:
Syntax Warning: May not be a PDF file (continuing anyway)
To Reproduce
It is unclear how to reproduce this. It likely happens via CSV import with attached files.
Expected behavior
Non-PDF files should not be accepted as the main file for the moment, so pdftotext should never be run on them.
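One possible guard, as a rough sketch only: check the file's leading bytes for the "%PDF-" signature before handing it to pdftotext, and reject it as a main file otherwise. The helper name `looksLikePdf` and the 1024-byte search window are assumptions made for illustration and are not taken from the uwazi codebase. A strict check would require the signature at offset 0.

```typescript
// Hypothetical helper, not part of uwazi: detects the "%PDF-" signature
// in the leading bytes of a file.
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

function looksLikePdf(head: Uint8Array, searchWindow = 1024): boolean {
  // Only look at the first `searchWindow` bytes (assumed leniency; pass
  // PDF_SIGNATURE.length to require the signature at the very start).
  const limit = Math.min(head.length, searchWindow) - PDF_SIGNATURE.length;
  for (let start = 0; start <= limit; start++) {
    if (PDF_SIGNATURE.every((byte, i) => head[start + i] === byte)) {
      return true;
    }
  }
  return false;
}

// A JPEG starts with FF D8 FF, so it would be rejected before extraction.
const jpegHead = new Uint8Array([0xff, 0xd8, 0xff, 0xe0]);
// "%PDF-1.4" as raw bytes.
const pdfHead = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34]);

console.log(looksLikePdf(jpegHead), looksLikePdf(pdfHead));
```

Checking content rather than the file extension or the uploaded MIME type would also cover files that arrive through an import with a misleading name, if that turns out to be the source.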