Could not identify `.msg`

Open SimeonStoykovQC opened this issue 2 months ago • 1 comments

In addition to #118, supporting .msg would be great too.

libmagic recognizes it as application/vnd.ms-outlook.

Downloading an email from Outlook produces a .msg.

SimeonStoykovQC avatar Nov 04 '25 08:11 SimeonStoykovQC

I had initially put the wrong content type output from libmagic; it should be application/vnd.ms-outlook.

SimeonStoykovQC avatar Nov 05 '25 11:11 SimeonStoykovQC

From a quick look, these appear to be based on the Compound Binary Format: https://www.loc.gov/preservation/digital/formats/fdd/fdd000379.shtml https://www.loc.gov/preservation/digital/formats/fdd/fdd000380.shtml

Much like the older Word, Excel, PowerPoint and Works formats, the outer CBF is a wrapper with the Outlook .msg part residing within. I looked briefly at this when working on #99 (see the Works entry). I think we could make a magic bytes match, but really this whole CBF thing deserves its own scanner.
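
For illustration, a magic bytes match on the outer container could look roughly like the sketch below. The function name `looks_like_cfbf` is hypothetical and not part of puremagic. The 8-byte signature is the one every Compound File starts with, so this identifies only the wrapper, not the Outlook content inside it.

```python
# Signature shared by all Compound File containers (.doc, .xls, .ppt, .msg, ...).
CFBF_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")


def looks_like_cfbf(data: bytes) -> bool:
    """Return True if the data starts with the Compound File signature.

    Hypothetical helper for illustration only: a positive result says the
    file is *some* Compound File, not which application wrote it.
    """
    return data[:8] == CFBF_SIGNATURE
```

Since Word, Excel and Outlook files would all pass this check, it cannot separate .msg from its siblings on its own.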

NebularNerd avatar Dec 15 '25 18:12 NebularNerd

I've added a rudimentary magic_json entry for .msg. As long as the extension is correct it will win on confidence; if not, it will be lost in the noise of the other CBF matches (all of these files start with d0cf11e0a1b11ae1). Its name will appear as Outlook 97-2003 Item File, as it can contain mail, appointments and more.

Without a dedicated scanner for these CBF files, a 99%+ match will not be possible, as the CLSIDs are not in a fixed location.
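
To show why the CLSID cannot be matched at a fixed offset, here is a minimal sketch of how a scanner might locate it. It assumes the standard header layout from the MS-CFB specification: the sector shift at byte 30, the first directory sector number at byte 48, and the root entry's CLSID at byte 80 of the first 128-byte directory entry. `root_clsid` is a hypothetical name and not the scanner being written for puremagic.

```python
import struct
import uuid
from typing import Optional

CFBF_SIGNATURE = bytes.fromhex("d0cf11e0a1b11ae1")


def root_clsid(data: bytes) -> Optional[uuid.UUID]:
    """Return the CLSID stored in the root directory entry, or None.

    Hypothetical sketch: it reads only the first directory sector and does
    not walk the FAT, so it is not a full Compound File parser.
    """
    if len(data) < 512 or data[:8] != CFBF_SIGNATURE:
        return None
    # Sector size is 2**shift: 512 bytes for v3 files, 4096 bytes for v4.
    (sector_shift,) = struct.unpack_from("<H", data, 30)
    if sector_shift not in (9, 12):
        return None
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    # Sector 0 begins right after the header, which fills the first sector.
    entry = (first_dir_sector + 1) << sector_shift
    if len(data) < entry + 128:
        return None
    # The root entry is the first directory entry; its CLSID sits at offset 80.
    return uuid.UUID(bytes_le=data[entry + 80 : entry + 96])
```

Because the directory sector number differs from file to file, the absolute position of the CLSID moves with it, which is why a plain byte-offset rule cannot reach it.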

EDIT 17/12: CFBF (Compound File Binary Format) scanner in progress 😁

'M:\Downloads\example.msg' : .msg
Total Possible Matches: 1

        Deepscan Match
        Name: Outlook Item File [Type:Email CFBF:v3]
        Confidence: 100%
        Extension: .msg
        Mime Type: application/vnd.ms-outlook
        Byte Match: b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
        Offset: 0

NebularNerd avatar Dec 15 '25 20:12 NebularNerd