Docx file created by LibreOffice returns Zip mime type
I think it's because LibreOffice (OpenOffice) puts the files into the docx archive in a different order than MS Word does. As a result the file has a different signature and is detected as a plain zip archive.
Example file is attached: DocxByLibreOffice.docx
That's correct, because it is a zip file. To detect it as something else, I would have to open it up and process the file contents, which I'm less interested in doing. I'm specifically worried about processing a large zip file just to see whether it is a doc file. But maybe I can read in the first X bytes of the zip file and look for key files...
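A rough sketch of that "first X bytes" idea, using only the JDK. The class name `DocxSniffer`, the method `sniff`, and the `maxBytes` parameter are all hypothetical and not part of any library discussed here. It also assumes that an entry under `word/` is a good enough marker for a docx file, which is a heuristic rather than a full check of the package.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxSniffer {
    public static final String DOCX =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    public static final String ZIP = "application/zip";

    /**
     * Hypothetical helper: looks at no more than maxBytes of the stream.
     * Returns DOCX if an entry under "word/" shows up in that prefix,
     * ZIP if the data starts like a zip but no such entry was seen,
     * and null if it does not start with a zip local file header.
     */
    public static String sniff(InputStream in, int maxBytes) throws IOException {
        byte[] prefix = in.readNBytes(maxBytes); // requires Java 11+
        boolean zipHeader = prefix.length >= 4
            && prefix[0] == 'P' && prefix[1] == 'K'
            && prefix[2] == 3 && prefix[3] == 4;
        if (!zipHeader) {
            return null;
        }
        // Walk the entries sequentially, so the order in which the
        // writer stored them inside the archive does not matter.
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(prefix))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().startsWith("word/")) {
                    return DOCX;
                }
            }
        } catch (IOException truncated) {
            // The prefix most likely ended in the middle of an entry;
            // fall back to the plain zip answer.
        }
        return ZIP;
    }
}
```

Calling `DocxSniffer.sniff(stream, 64 * 1024)` would cap the work at 64 KB no matter how large the archive is. The trade-off is that a docx whose `word/` entries start beyond that prefix would still be reported as a plain zip.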
@j256 Did you add something to detect this case as a DOCX file? I ran into a similar problem and tried several ways to detect the file as DOCX, but only the "Tika core" library detected this case correctly.
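import org.apache.tika.Tika;

Tika tika = new Tika();
// Note: if filePath is a String, detect(String) goes by the file name only; pass a java.io.File to have the content inspected as well.
String mimeType = tika.detect(filePath);
// output mime type: application/vnd.openxmlformats-officedocument.wordprocessingml.document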