office2john - assertion error in find_table on very old word document
I have found some very old password-protected Microsoft Word documents (old letters). They date from 1998-2000, so I expect them to be in Microsoft Office 97 format.
When I try:
python3 office2john.py letter.doc
the result is
Traceback (most recent call last):
File "/tmp/john-bleeding-jumbo/run/office2john.py", line 3151, in <module>
ret = process_file(sys.argv[i])
File "/tmp/john-bleeding-jumbo/run/office2john.py", line 3081, in process_file
stream = find_table(filename, sdoc)
File "/tmp/john-bleeding-jumbo/run/office2john.py", line 2461, in find_table
assert(w_ident == b"\xec\xa5")
AssertionError
It looks like the assertion at https://github.com/openwall/john/blob/342364206fa9bfc129c6ef78fd2770f91e137b74/run/office2john.py#L2461 checks for a header identifier that is not present in my file. Unfortunately, the letters are personal, so I cannot simply post the whole file here. These are its first 208 bytes:
xxd -l 208 letter.doc
00000000: d0cf 11e0 a1b1 1ae1 0000 0000 0000 0000 ................
00000010: 0000 0000 0000 0000 3e00 0300 feff 0900 ........>.......
00000020: 0600 0000 0000 0000 0000 0000 0100 0000 ................
00000030: 1f00 0000 0000 0000 0010 0000 2000 0000 ............ ...
00000040: 0100 0000 feff ffff 0000 0000 1e00 0000 ................
00000050: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000060: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000070: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000080: ffff ffff ffff ffff ffff ffff ffff ffff ................
00000090: ffff ffff ffff ffff ffff ffff ffff ffff ................
000000a0: ffff ffff ffff ffff ffff ffff ffff ffff ................
000000b0: ffff ffff ffff ffff ffff ffff ffff ffff ................
000000c0: ffff ffff ffff ffff ffff ffff ffff ffff ................
I checked whether something is missing (some library) or whether office2john simply does not support such old Office files, but found nothing.
The assertion was added in https://github.com/openwall/john/commit/8e26589a0e9610dcdc3cf429d279bbaaefb378b0 and edited in https://github.com/openwall/john/commit/2aadec2872dd88c9e5b71af87bafd8356e40a464
Have you tried commenting out the assertion to see if it works? Could you also try the office2john from an old commit, e.g., https://github.com/openwall/john/commit/22dfdbc0e6c3fa8a9f0d35ec4ce670b972c45414, to see what happens?
With the assertion in the current code (line 2461) commented out:
python3 office2john.py letter.doc
Traceback (most recent call last):
File "/tmp/john-bleeding-jumbo/run/office2john.py", line 3092, in process_file
workbookStream = ole.openstream(stream)
File "/tmp/john-bleeding-jumbo/run/office2john.py", line 1907, in openstream
sid = self._find(filename)
File "/tmp/john-bleeding-jumbo/run/office2john.py", line 1887, in _find
raise IOError("file not found")
OSError: file not found
letter.doc : stream 0Table not found!
Using oldoffice2john.py from https://github.com/openwall/john/commit/22dfdbc0e6c3fa8a9f0d35ec4ce670b972c45414:
python oldoffice2john.py letter.doc
Traceback (most recent call last):
File "oldoffice2john.py", line 1559, in <module>
process_file(sys.argv[i])
File "oldoffice2john.py", line 1524, in process_file
workbookStream = ole.openstream(stream)
File "oldoffice2john.py", line 1313, in openstream
sid = self._find(filename)
File "oldoffice2john.py", line 1296, in _find
raise IOError, "file not found"
IOError: file not found
It looks like both raise the error at the same place:
- new: https://github.com/openwall/john/blob/342364206fa9bfc129c6ef78fd2770f91e137b74/run/office2john.py#L1887
- old: https://github.com/openwall/john/blob/22dfdbc0e6c3fa8a9f0d35ec4ce670b972c45414/run/oldoffice2john.py#L1296
This looks very strange to me.
- I am sure those files are Word files (I am the author).
- The password is gone (I cannot remember it).
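One possible explanation for both scripts failing at the same lookup, sketched under assumptions: in the Word 97 and later layout described by the MS-DOC specification, the fWhichTblStm bit of the flag word at offset 0x0A of the WordDocument stream selects between the 0Table and 1Table streams. If a file does not follow that layout, whatever bits happen to sit at that offset still yield one of the two names, and opening that stream can then fail with "file not found". The helper name table_stream_name is hypothetical, and this is not the code from office2john.py.

```python
import struct

# Bit masks within the 16-bit flag field at offset 0x0A of the FIB,
# following the FibBase layout in the MS-DOC specification.
F_ENCRYPTED = 0x0100
F_WHICH_TBL_STM = 0x0200
F_OBFUSCATED = 0x8000


def table_stream_name(fib_start):
    """Pick the table stream name from the first 12 bytes of WordDocument.

    Hypothetical helper: it only illustrates the flag test and performs
    no validation of the identifier in the first two bytes.
    """
    if len(fib_start) < 12:
        raise ValueError("need at least 12 bytes of the WordDocument stream")
    (flags,) = struct.unpack_from("<H", fib_start, 0x0A)
    return "1Table" if flags & F_WHICH_TBL_STM else "0Table"
```

Because the function always returns one of the two names, it cannot by itself tell a supported file from an unsupported one; that is the job of the identifier check that precedes it.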
My first guess would be that it's even older than Office 97, but I just tested with a sample file that uses the weak XOR scheme, and office2john reports that rather than failing:
$ python3 ../run/office2john.py Test-weak-XOR_myhovercraftisfullofeels_.doc
Test-weak-XOR_myhovercraftisfullofeels_.doc : XOR obfuscation detected, Password Verifier : b'05f7beca'
The above file is "Composite Document File V2 Document" according to file(1). Perhaps your files are even older than that?
What's the output of file letter.doc?
file letter.doc
letter.doc: CDFV2 Microsoft Word
Their dates are between 1998 and 2000. But yes, the format could be older than Office 97: maybe MS Word 95, which probably uses the old MS Word 6 format, incompatible with MS Word 97.
I have some unprotected files from that period too.
file letter2.doc
letter2.doc: Composite Document File V2 Document, Little Endian, Os: Windows, Version 4.0, Code page: 1250, Template: Normal, Revision Number: 4, Name of Creating Application: Microsoft Word for Windows 95, Last Printed: Sun Oct 17 21:08:00 1999, Create Time/Date: Mon Nov 1 19:55:00 1999, Last Saved Time/Date: Mon Nov 1 19:55:00 1999, Number of Pages: 2, Number of Words: 18, Number of Characters: 106, Security: 0
According to https://www.loc.gov/preservation/digital/formats/fdd/fdd000509.shtml (see the xxd output at https://github.com/openwall/john/issues/5229#issue-1558373004):
- The DOC file looks like a valid CFB file, major version 3 with minor version 0x3E.
- The internal identifier for a Word binary file is missing.
Perhaps @javorekm can find more by looking at the full output of xxd, but it seems the file (version) is simply not supported by JtR.
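For anyone who wants to re-check the CFB reading of the dump, here is a minimal sketch that decodes the fixed header fields from the first 80 bytes posted in this thread. The field offsets follow the published Compound File Binary header layout; the function name parse_cfb_header and the dictionary keys are made up for illustration and are not part of office2john.py.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")

# First 0x50 bytes of letter.doc, copied from the xxd dump in this thread.
HEADER = bytes.fromhex(
    "d0cf 11e0 a1b1 1ae1 0000 0000 0000 0000 "
    "0000 0000 0000 0000 3e00 0300 feff 0900 "
    "0600 0000 0000 0000 0000 0000 0100 0000 "
    "1f00 0000 0000 0000 0010 0000 2000 0000 "
    "0100 0000 feff ffff 0000 0000 1e00 0000"
)


def parse_cfb_header(data):
    """Decode the leading fixed fields of a Compound File Binary header."""
    if len(data) < 0x50:
        raise ValueError("need at least 80 bytes of header")
    # Signature (8), CLSID (16), then five little-endian 16-bit fields.
    sig, _clsid, minor, major, byte_order, sector_shift, mini_shift = \
        struct.unpack_from("<8s16sHHHHH", data, 0)
    if sig != CFB_SIGNATURE:
        raise ValueError("not a CFB file: bad signature")
    # Number of FAT sectors and first directory sector, at 0x2C and 0x30.
    num_fat, first_dir = struct.unpack_from("<II", data, 0x2C)
    return {
        "major": major,
        "minor": minor,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "num_fat_sectors": num_fat,
        "first_dir_sector": first_dir,
    }


if __name__ == "__main__":
    for key, value in parse_cfb_header(HEADER).items():
        print(key, hex(value))
```

This only confirms that the container is a well-formed CFB header (major version 3, minor version 0x3E, 512-byte sectors); it says nothing about which Word version wrote the streams inside it.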
assert(w_ident == b"\xec\xa5")
BTW, I think we shouldn't use assert for anything other than statements that are expected to always be true regardless of program input. That is, any assertion error should indicate a bug of ours, and nothing else. For unsupported or wrong input, we should use e.g. if and print user-friendly messages. Can we at least fix that? OTOH, I guess these assert statements in this script came from third-party code, so maybe we should also look into syncing with the upstream project this came from, and maybe contributing such changes there.
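A rough sketch of what that could look like for this particular check, assuming the script keeps comparing the first two bytes of the stream against b"\xec\xa5" as the current assert does. The helper name word_ident_error and the message wording are hypothetical, not an existing function in office2john.py.

```python
import sys

# The value the existing assertion compares against.
WORD_IDENT = b"\xec\xa5"


def word_ident_error(filename, w_ident):
    """Return None if w_ident is the expected identifier, else a message.

    Hypothetical helper: input problems become a readable message for
    the user instead of an AssertionError traceback.
    """
    if w_ident == WORD_IDENT:
        return None
    return "%s : unsupported or invalid file, stream starts with %r instead of %r" % (
        filename, w_ident, WORD_IDENT)


if __name__ == "__main__":
    # b"\x00\x00" is a placeholder value, not taken from letter.doc.
    message = word_ident_error("letter.doc", b"\x00\x00")
    if message is not None:
        sys.stderr.write(message + "\n")
```

Returning the message instead of printing it inside the helper leaves it to the caller to decide whether to skip the file, continue with the next one, or exit with a non-zero status.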