
Docx file created by LibreOffice returns Zip mime type

Open RangerMak opened this issue 8 years ago • 2 comments

I think this is because LibreOffice (OpenOffice) puts the files into the docx archive in a different order than MS Word does. As a result, the file has a different signature and is detected as a plain zip archive.

Example file is attached. DocxByLibreOffice.docx

RangerMak, Nov 23 '17 07:11

That's correct because it is a zip file. For me to detect it as something else means that I have to open it up and process the file contents which I'm less interested in doing. I'm specifically worried about processing a large zip file just to see if it is a doc file. But maybe I can read in the first X bytes of the zip file and look for key files....
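The "read the first X bytes and look for key files" idea could be sketched roughly as below. This is only an illustration, not how simplemagic works: the class name `DocxSniffer`, the method `looksLikeDocx`, and the 4096-byte limit are all hypothetical. It relies on the fact that zip local file headers store entry names uncompressed, so a plain text search over the prefix can find a `word/` entry without unpacking anything.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class DocxSniffer {
    // Hypothetical limit on how many leading bytes are inspected.
    static final int MAX_BYTES = 4096;

    /**
     * Returns true if the stream starts with a zip local-file-header
     * signature and the inspected prefix mentions a "word/" entry name.
     */
    static boolean looksLikeDocx(InputStream in) throws IOException {
        byte[] buf = in.readNBytes(MAX_BYTES); // requires Java 11+
        // Zip local file header signature: 'P' 'K' 0x03 0x04
        if (buf.length < 4 || buf[0] != 'P' || buf[1] != 'K'
                || buf[2] != 3 || buf[3] != 4) {
            return false;
        }
        // ISO-8859-1 maps every byte to exactly one char, so no bytes are lost.
        String prefix = new String(buf, StandardCharsets.ISO_8859_1);
        return prefix.contains("word/");
    }
}
```

A caller would pass something like `Files.newInputStream(path)`. The check is a heuristic: it misses files whose `word/` entries start beyond the inspected prefix, and compressed data could in principle contain the same bytes by chance.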

j256, Dec 06 '17 17:12

@j256 Did you add something to detect this case as a DOCX file? I ran into a similar problem and tried several ways to detect the file as DOCX, but only the "Tika core" library detected this case correctly.

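import org.apache.tika.Tika;

Tika tika = new Tika();
// Note: detect(String) looks only at the file name; pass a File, Path, or InputStream to inspect the content.
String mimeType = tika.detect(filePath);
// output mime type: application/vnd.openxmlformats-officedocument.wordprocessingml.document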

fjtorres, Sep 22 '22 08:09