simplemagic

Doesn't recognize bitmap files exported from GIMP

Open Gurfuzle opened this issue 7 years ago • 5 comments

When I export images from GIMP as bitmaps, simplemagic does not recognize their magic number. Running the file through xxd gives:

00000000: 424d 7a75 0200 0000 0000 7a04 0000 6c00  BMzu......z...l.
00000010: 0000 9001 0000 9001 0000 0100 0800 0000  ................
00000020: 0000 0071 0200 232e 0000 232e 0000 0001  ...q..#...#.....
00000030: 0000 0001 0000 4247 5273 0000 0000 0000  ......BGRs......

The file does start with 424d ("BM"), but it is not recognized as a bitmap.

Gurfuzle, Nov 14 '18 16:11

Here's an example file (zipped): example.bmp.zip

Gurfuzle, Nov 14 '18 17:11

Great example Mike. Thanks much.

j256, Nov 14 '18 20:11

I've stumbled upon this myself and investigated a bit. The problem lies in MagicEntries.optimizeFirstBytes(), which calls MagicEntry.getStartsWithByte() -> StringType.getStartingBytes() -> StringType$TestInfo.getStartingBytes(). That last method always returns null if the string is fewer than 4 characters long. As a result, every file type whose magic pattern starts with a string shorter than 4 characters is left out of the optimization index and is never considered during subsequent matching attempts. Since the bitmap format starts with only two fixed characters, BM, it falls victim to this rule. The calling code only ever uses the first byte anyway, so requiring more than that seems unnecessary.

CrushaKRool, Oct 19 '20 14:10

Appreciate the look, @CrushaKRool. The code is supposed to try the first-byte index and then fall through to findMatch(). See https://github.com/j256/simplemagic/blob/211cf35f7a827958e78aba0c15ec4c8dcfe0699a/src/main/java/com/j256/simplemagic/entries/MagicEntries.java#L122
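To make the intended lookup concrete, here is a minimal, self-contained sketch of a first-byte index that falls through to a full scan. It is not simplemagic's actual code: the class FirstByteIndexSketch, its Entry type, and the add/find methods are hypothetical names made up for illustration. The only behaviour taken from this thread is that string patterns shorter than 4 characters are left out of the index.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a first-byte index with fall-through; not simplemagic's real classes. */
final class FirstByteIndexSketch {

    /** A named entry that matches when the data starts with a fixed prefix. */
    static final class Entry {
        final String name;
        final byte[] prefix;

        Entry(String name, byte[] prefix) {
            this.name = name;
            this.prefix = prefix;
        }
    }

    private final List<Entry> all = new ArrayList<>();
    private final Map<Byte, List<Entry>> byFirstByte = new HashMap<>();

    void add(String name, String prefix) {
        byte[] bytes = prefix.getBytes(StandardCharsets.US_ASCII);
        Entry entry = new Entry(name, bytes);
        all.add(entry);
        // Assumption taken from the thread: prefixes shorter than 4 characters are not indexed.
        if (bytes.length >= 4) {
            byFirstByte.computeIfAbsent(bytes[0], k -> new ArrayList<>()).add(entry);
        }
    }

    /** Tries the indexed entries for the first byte, then falls through to every entry. */
    String find(byte[] data) {
        if (data.length == 0) {
            return null;
        }
        String hit = scan(byFirstByte.getOrDefault(data[0], Collections.emptyList()), data);
        return hit != null ? hit : scan(all, data);
    }

    private static String scan(List<Entry> entries, byte[] data) {
        for (Entry entry : entries) {
            if (startsWith(data, entry.prefix)) {
                return entry.name;
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        FirstByteIndexSketch index = new FirstByteIndexSketch();
        index.add("gif", "GIF8"); // long enough to be indexed
        index.add("bmp", "BM");   // too short for the index, found by the full scan
        System.out.println(index.find("BMzu".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

In this model the short BM prefix is still found, because the full scan runs whenever the indexed entries produce no hit.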

Let me get this test in place and then debug it.

j256, Oct 19 '20 23:10

Ah, you are right. I overlooked that.

Debugging further, I see that it identifies the first magic bytes as a bitmap but fails to match any of the child formats, which require the byte at index 14 to be 12, 40, 64 or 128. In my case it's 124 (exported from GIMP). Unfortunately, since the name of the parent MagicEntry for bitmap is "unknown" and none of the children overwrite it, the name ends up as "unknown" in the ContentData and no MIME type is set. The method is coded to return null as the ContentInfo in that case.
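A rough sketch of that failure mode, under loud assumptions: UnknownParentSketch, Child, bitmapChildren() and match() are hypothetical stand-ins rather than simplemagic's MagicEntry classes, and the child names are placeholders. It only models the behaviour described above, where a parent match that keeps the name "unknown" is reported as no match at all.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of parent/child magic entries; not simplemagic's real classes. */
final class UnknownParentSketch {

    static final String UNKNOWN = "unknown";

    /** A child rule that matches when the byte at a fixed offset equals an expected value. */
    static final class Child {
        final int offset;
        final int expected;
        final String name;

        Child(int offset, int expected, String name) {
            this.offset = offset;
            this.expected = expected;
            this.name = name;
        }
    }

    /** Child rules for the four header sizes mentioned in the thread (names are placeholders). */
    static List<Child> bitmapChildren() {
        return Arrays.asList(
                new Child(14, 12, "bitmap, header size 12"),
                new Child(14, 40, "bitmap, header size 40"),
                new Child(14, 64, "bitmap, header size 64"),
                new Child(14, 128, "bitmap, header size 128"));
    }

    /** Returns the matched name, or null when only the unnamed parent matched. */
    static String match(byte[] data, List<Child> children) {
        if (data.length < 2 || data[0] != 'B' || data[1] != 'M') {
            return null; // the parent entry itself did not match
        }
        String name = UNKNOWN; // the parent carries no usable name of its own
        for (Child child : children) {
            if (child.offset < data.length && (data[child.offset] & 0xFF) == child.expected) {
                name = child.name;
                break;
            }
        }
        // Modelled on the behaviour described above: "unknown" is treated as no result.
        return UNKNOWN.equals(name) ? null : name;
    }

    public static void main(String[] args) {
        byte[] header = new byte[18];
        header[0] = 'B';
        header[1] = 'M';
        header[14] = 124; // header size seen in the GIMP export discussed here
        System.out.println(match(header, bitmapChildren()));
    }
}
```

In this model, either giving the parent a real name or adding a child rule for the missing size would turn the null into a result.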

https://github.com/j256/simplemagic/blob/074a1fd5b13dc614ba9bffa7702232fdd6130231/src/main/java/com/j256/simplemagic/entries/MagicEntry.java#L64-L67

So I guess it boils down to two things: the magic file does not provide enough data to handle the base case without a proper child match, and GIMP produces a header of an unknown format. According to the documentation on Wikipedia, the byte at 0-based index 14 is the start of the DIB header and gives the size of that header in bytes. So perhaps GIMP is producing a header that is 124 bytes in size, rather than one of the four sizes of the PC bitmap formats defined in the magic file.
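For reference, the header-size field can be checked with a few lines of Java. This is an illustrative sketch, not part of simplemagic; BmpHeaderSketch and its methods are made-up names. It assumes the standard BMP layout, in which the DIB header size is a little-endian 32-bit value at offset 14, and it uses the commonly documented header names for each size. The input is the first 18 bytes of the xxd dump from the original report, where that field is 0x6c, i.e. 108 rather than 124. Neither value is among 12, 40, 64 and 128.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative sketch only; not part of simplemagic. */
final class BmpHeaderSketch {

    /** Returns the DIB header size, a little-endian 32-bit value at offset 14. */
    static int dibHeaderSize(byte[] fileStart) {
        if (fileStart.length < 18 || fileStart[0] != 'B' || fileStart[1] != 'M') {
            throw new IllegalArgumentException("not a BMP file header");
        }
        return ByteBuffer.wrap(fileStart).order(ByteOrder.LITTLE_ENDIAN).getInt(14);
    }

    /** Maps well-known DIB header sizes to their conventional names. */
    static String headerName(int size) {
        switch (size) {
            case 12:
                return "BITMAPCOREHEADER";
            case 40:
                return "BITMAPINFOHEADER";
            case 64:
                return "OS22XBITMAPHEADER";
            case 108:
                return "BITMAPV4HEADER";
            case 124:
                return "BITMAPV5HEADER";
            default:
                return "unrecognized (" + size + " bytes)";
        }
    }

    /** Parses a hex string such as "424d7a75" into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // First 18 bytes of the xxd dump from the original report.
        byte[] start = fromHex("424d7a750200000000007a0400006c00" + "0000");
        int size = dibHeaderSize(start);
        System.out.println(size + " -> " + headerName(size)); // prints "108 -> BITMAPV4HEADER"
    }
}
```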

CrushaKRool, Oct 21 '20 15:10