analyzeMFT

Filename bug

Open Pballen opened this issue 10 years ago • 12 comments

The mft has problems dealing w/ filenames with periods in them. For example, "adobe/reader 9.0" is reported as "adobe/reader~1.0" and "9.3.0" becomes "932E79D~1.0". I tried looking at the unicode hack, but couldn't come up w/ any obvious solutions.

Pballen avatar Aug 05 '13 15:08 Pballen

I updated the filename processing last week or the week before. Are you running the latest code?

dkovar avatar Aug 05 '13 15:08 dkovar

I got the newer version and ran it. I'm getting some strange results. I think the periods are screwing things up?

"Documents and Settings\Admin\Application Data\Sun\Java\jre1.6.0_05" becomes "path\JRE16~1.0_0"

"WINDOWS\assembly\NativeImages_v2.0.50727_32\System.ServiceModel" becomes "path\System~4.SER"

"above path\System.ServiceModel#" becomes "path\Sy1587~1.SER"

Pballen avatar Aug 05 '13 21:08 Pballen

Curiouser and curiouser

"Documents and Settings\All Users\Documents\My Music" becomes "path\MYMUSI~1" (which happens to everything in the Music folder). However, "Documents and Settings\All Users\Documents\My Pictures" comes out fine.

Pballen avatar Aug 05 '13 21:08 Pballen

That looks like 8.3 naming:

http://support.microsoft.com/kb/142982

Have you looked at the raw MFT record?

dkovar avatar Aug 05 '13 21:08 dkovar

Something to consider: an MFT record may have multiple $FN attributes, and it looks like analyzeMFT always picks the first encountered as the filename (https://github.com/dkovar/analyzeMFT/blob/master/analyzemft/mft.py#L331). I've found that the ordering of filename attributes is not consistent, and probably shouldn't be relied upon.

The namespace field at offset 0x41 describes which type of filename data a $FN attribute contains. http://lxr.linux.no/linux+v3.8.6/fs/ntfs/layout.h#L1012 lists the possible values, and personally I prioritize 0x1 (FILE_NAME_WIN32) and 0x3 (FILE_NAME_WIN32_AND_DOS) since they're "full" filenames. Perhaps considering this field, and ordering the attributes in the record structure accordingly, would ensure the most appropriate filename gets printed.
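To make the idea concrete, here is a minimal sketch of that prioritization. It is not analyzeMFT's actual code: the `(namespace, name)` tuple shape and the `pick_filename` helper are hypothetical, and ranking POSIX names ahead of DOS-only names is my own assumption. Only the preference for 0x1 and 0x3 comes from the paragraph above.

```python
# Namespace values as listed in the Linux NTFS layout.h linked above.
FILE_NAME_POSIX = 0x00
FILE_NAME_WIN32 = 0x01
FILE_NAME_DOS = 0x02
FILE_NAME_WIN32_AND_DOS = 0x03

# Lower rank means "preferred". WIN32 and WIN32_AND_DOS are the "full" names.
# Placing POSIX before DOS is an assumption made for this sketch.
_RANK = {
    FILE_NAME_WIN32: 0,
    FILE_NAME_WIN32_AND_DOS: 0,
    FILE_NAME_POSIX: 1,
    FILE_NAME_DOS: 2,
}


def pick_filename(fn_attributes):
    """Return the best name from (namespace, name) tuples, or None.

    The tuple layout is hypothetical; adapt it to however the parsed
    $FN attributes are actually stored in the record structure.
    """
    candidates = list(fn_attributes)
    if not candidates:
        return None
    # min() keeps the first of equally ranked entries, so on-disk order
    # only matters as a tie-breaker. Unknown namespaces sort last.
    best = min(candidates, key=lambda attr: _RANK.get(attr[0], len(_RANK)))
    return best[1]
```

With this, a record holding both a DOS name and a Win32 name yields the Win32 one regardless of which $FN attribute comes first.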

williballenthin avatar Aug 06 '13 12:08 williballenthin

Willi,

Superb information, thanks. That should be an easy fix. Shall get this done this week, hopefully.

-David

dkovar avatar Aug 06 '13 13:08 dkovar

Fix added. Please test it and let me know. (And thank you for finding and reporting these bugs!)

dkovar avatar Aug 07 '13 16:08 dkovar

We've gone from lots of bad filenames to only a handful, all limited to a single misread(?) character.

"Documents and Settings/User/My Documents/My Pictures/Guatemala/IMG_0715.jpeg" becomes "path/IMP_0175.JE-G" (the E has an accent). Similarly, "path/IMG_0196.jpeg" becomes "IMP_0196.JIuG" (the I has an accent, and the u is a micro sign). Strangely, all of the other jpegs in the folder seem to come out fine.

"System Vol Info/restore{...}/RP256/40029775.old" becomes "path/40029775.03d".

Pballen avatar Aug 08 '13 14:08 Pballen

If you look at the raw record, is one of the other FN records more accurate? I just grabbed the first "full" name. It may be that I need to prioritize one over the other.

dkovar avatar Aug 08 '13 14:08 dkovar

The raw record only has a single FN record, but it's reading the name as "IMG_0175.J\xc8\x96G". Let me try getting the raw bits. My MFT might be bad?

Pballen avatar Aug 08 '13 15:08 Pballen

The $MFT might be good and the actual filename is the culprit.

dkovar avatar Aug 08 '13 15:08 dkovar

I opened the MFT in a hex viewer. The relevant bits (which do convert to IMG_0175.jpg) are: 49 00 4D 00 47 00 5F 00 30 00 31 00 37 00 36 00 2E 00 4A 00 50 00 47 00

I then added the line s = "".join(["%02X|"%ord(x) for x in bytes]) in the relevant place in decodeFNAttribute. The relevant bits are: 49 00 4D 00 47 00 5F 00 30 00 31 00 37 00 35 00 2E 00 4A 00 16 02 47 00
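For anyone wanting to reproduce the comparison, here is a small Python 3 sketch that decodes the two dumps exactly as posted and lists the positions where they differ. The helper names are made up for illustration. Decoded this way, the second dump gives a J followed by U+0216, whose UTF-8 form is the \xc8\x96 reported earlier; the first dump as typed has a 6 in the digit position, which may just be a transcription slip.

```python
def decode_utf16le_hex(hex_dump):
    """Decode a space-separated hex dump of UTF-16LE bytes into a string."""
    return bytes.fromhex(hex_dump).decode("utf-16-le")


def differing_units(a, b):
    """Return (index, char_a, char_b) for each position where a and b differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Bytes as posted from the hex viewer.
on_disk = decode_utf16le_hex(
    "49 00 4D 00 47 00 5F 00 30 00 31 00 37 00 36 00 2E 00 4A 00 50 00 47 00"
)
# Bytes as posted from inside decodeFNAttribute.
as_read = decode_utf16le_hex(
    "49 00 4D 00 47 00 5F 00 30 00 31 00 37 00 35 00 2E 00 4A 00 16 02 47 00"
)

if __name__ == "__main__":
    print(repr(on_disk))
    print(repr(as_read))
    for index, disk_char, read_char in differing_units(on_disk, as_read):
        print(index, hex(ord(disk_char)), hex(ord(read_char)))
```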

I think something strange is happening w/ reading in the raw_record. Not sure what.
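One thing that might be worth checking, offered as an assumption and not something confirmed in this thread: NTFS overwrites the last two bytes of every sector of a FILE record with an update sequence number and keeps the original bytes in the update sequence array in the record header. If the two bytes of the "P" happen to sit at the end of a sector and the fixup is not applied when the raw record is read, a pair like 16 02 would show up in place of 50 00. The sketch below applies the fixups to one raw record; the 512-byte sector size and the function name are assumptions for illustration.

```python
import struct

SECTOR_SIZE = 512  # assumed sector size; typical, but not guaranteed


def apply_fixups(record):
    """Return a copy of a raw FILE record with the sector-end bytes restored.

    The header stores the offset of the update sequence array at 0x04 and
    its length in 16-bit words (sequence number plus one entry per sector)
    at 0x06.
    """
    usa_offset, usa_count = struct.unpack_from("<HH", record, 4)
    usn = record[usa_offset:usa_offset + 2]
    fixed = bytearray(record)
    for i in range(1, usa_count):
        sector_end = i * SECTOR_SIZE
        if sector_end > len(fixed):
            break  # array describes more sectors than this buffer holds
        if fixed[sector_end - 2:sector_end] != usn:
            raise ValueError("sector %d does not end with the sequence number" % i)
        # Entry i of the array holds the original last two bytes of sector i.
        entry = usa_offset + 2 * i
        fixed[sector_end - 2:sector_end] = record[entry:entry + 2]
    return bytes(fixed)
```

If the offset of the bad character within the record turns out to be 510 or 1022, that would point in this direction; if not, the cause is somewhere else.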

Pballen avatar Aug 08 '13 19:08 Pballen