Microsoft word form identified as XML

Open elbre opened this issue 4 years ago • 6 comments

Good day. While testing our files, we came across an interesting case. Microsoft Office lets you create a form that is meant to be filled in later. Such a document is saved as an ordinary .doc file. BUT! When such a file goes through Droid, it is identified as XML. So an LTP system could eventually try to open the file as XML, which would be a fairly serious problem. I am not sure whether this is fixable, or even a bug, but I felt it was important to report it.

elbre avatar Mar 25 '20 09:03 elbre

Hi @elbre, if you have the sample file that you used, and it does not contain any sensitive information, would you be kind enough to attach it to the issue as well? It will really help when someone looks at the issue in future. Many thanks.

sparkhi avatar Mar 25 '20 09:03 sparkhi

Unable to replicate with form templates on Office 2016. There is a Word XML format (via Save As) that outputs a raw XML file, but it also gives the file an XML extension. That file re-opens as a form in Word but not in any other application (LibreOffice just shows the XML, for instance).

This Word XML format looks like it would be pretty easy to assign a signature to (has a very clear tag '' immediately after the XML header) so we'll probably do that anyway, but I'm not sure if this is what's happened here without seeing a sample, or understanding the steps to reproduce...
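A rough sketch of the kind of check described above, in Python purely for illustration. It is not how DROID or PRONOM signatures are actually expressed, and the function name `starts_with_xml_then_marker` is hypothetical. The identifying tag is passed in as a parameter rather than hard-coded, so nothing here assumes what that tag is:

```python
UTF8_BOM = b"\xef\xbb\xbf"
XML_DECLARATION = b"<?xml"


def starts_with_xml_then_marker(data: bytes, marker: bytes) -> bool:
    """Return True if `data` opens with an XML declaration and `marker`
    is the first non-whitespace content after that declaration.

    `marker` is supplied by the caller; this sketch does not know or
    assume which tag identifies the Word XML format.
    """
    # Skip an optional UTF-8 byte order mark.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]

    # The file must begin with an XML declaration.
    if not data.startswith(XML_DECLARATION):
        return False

    # Find the end of the declaration ("?>").
    end = data.find(b"?>")
    if end == -1:
        return False

    # "Immediately after the XML header": allow only whitespace in between.
    rest = data[end + 2:].lstrip()
    return rest.startswith(marker)
```

With a made-up marker, `starts_with_xml_then_marker(b'<?xml version="1.0"?>\n<exampleTag>', b"<exampleTag")` returns `True`, while a file that does not begin with an XML declaration returns `False`.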

Dclipsham avatar Mar 25 '20 14:03 Dclipsham

Hi again, I am sorry it took me a while to get back with a sample. I needed to clear out the sensitive data and made sure that it still appears as an XML file to Droid. text-doc-xml.zip

elbre avatar Mar 26 '20 08:03 elbre

Thank you. Can you describe the process for creating this file? Do you know which version of MS Office was used?

Dclipsham avatar Mar 26 '20 15:03 Dclipsham

I am not aware of the details of how it was created. It is a document from 2010 that should be archived. It was created in Microsoft Word 97-2003.

elbre avatar Mar 27 '20 06:03 elbre

> This Word XML format looks like it would be pretty easy to assign a signature to (has a very clear tag '' immediately after the XML header) so we'll probably do that anyway, but I'm not sure if this is what's happened here without seeing a sample, or understanding the steps to reproduce...

So is this something to handle in Pronom rather than in Droid, @Dclipsham?

jcharlet avatar Apr 07 '20 09:04 jcharlet