
tar containing pdf is detected as pdf

Open: phiresky opened this issue 6 years ago • 2 comments

I know this library is unmaintained, but I'm opening this for a future maintainer :)

Test file: test.tar.zip (zipped to keep GitHub from complaining)

Output of file --mime-type:

application/x-tar

Output of tmagic:

application/pdf

phiresky, Jun 16 '19 09:06

@phiresky This is intentional. Quote from README:

Unlike the typical approach that libmagic and file(1) uses, this loads all the file types in a tree based on subclasses. (EX: application/vnd.openxmlformats-officedocument.wordprocessingml.document (MS Office 2007) subclasses application/zip which subclasses application/octet-stream) Then, instead of checking the file against every file type, it can traverse down the tree and only check the file types that make sense to check. (After all, the fastest check is the check that never gets run.)

hongquan, Aug 23 '21 17:08
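To make the README's description concrete, here is a minimal Rust sketch of first-match tree traversal, offered as one possible way the misdetection could arise rather than a confirmed diagnosis. Everything in it is hypothetical: the `Node` type, the `detect` function, the two checks, and especially the child order are assumptions for illustration, not tree_magic's real API or rule set. The tar check uses the POSIX ustar magic (`ustar` at offset 257); the PDF check assumes a rule that accepts `%PDF-` anywhere in the first 1024 bytes. Because the first member's data in a tar archive starts at offset 512, a tar whose first member is a PDF passes both checks, and a first-match traversal returns whichever sibling it happens to try first.

```rust
// Hypothetical, simplified model of first-match tree detection.
// Node, the checks, and the child order are illustrative assumptions,
// not tree_magic's actual API or rule set.

struct Node {
    mime: &'static str,
    check: fn(&[u8]) -> bool, // does the buffer look like this type?
    children: Vec<Node>,
}

fn always(_: &[u8]) -> bool {
    true
}

// POSIX ustar magic: "ustar" at byte offset 257 of the header.
fn is_tar(buf: &[u8]) -> bool {
    buf.get(257..262) == Some(&b"ustar"[..])
}

// Assumed PDF rule: "%PDF-" anywhere in the first 1024 bytes.
fn is_pdf(buf: &[u8]) -> bool {
    buf[..buf.len().min(1024)].windows(5).any(|w| w == b"%PDF-")
}

// Walk down from `node`, following the first child whose check passes,
// and return the deepest type reached. Siblings after a match are never tried.
fn detect(node: &Node, buf: &[u8]) -> &'static str {
    for child in &node.children {
        if (child.check)(buf) {
            return detect(child, buf);
        }
    }
    node.mime
}

fn tree() -> Node {
    Node {
        mime: "application/octet-stream",
        check: always,
        children: vec![
            // Assumed order: PDF is tried before tar.
            Node { mime: "application/pdf", check: is_pdf, children: vec![] },
            Node { mime: "application/x-tar", check: is_tar, children: vec![] },
        ],
    }
}

// A fake tar whose first member is a PDF: header in bytes 0..512,
// member data (starting with "%PDF-") from offset 512.
fn tar_containing_pdf() -> Vec<u8> {
    let mut buf = vec![0u8; 1024];
    buf[257..262].copy_from_slice(b"ustar");
    buf[512..517].copy_from_slice(b"%PDF-");
    buf
}

fn main() {
    let buf = tar_containing_pdf();
    println!("{}", detect(&tree(), &buf));
}
```

With the children in this order the sketch reports `application/pdf`; swapping the two children makes it report `application/x-tar`. Which sibling tree_magic actually tries first is not stated in this thread.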

tbh I don't see how that explains the misdetection. Why would traversing a tree lead to a wrong detection? Even if the answer is ambiguous, I don't see why it can't output either the more likely type or all possibilities.

The README also says tree_magic is designed to be more efficient and to have fewer false positives than the old approach used by libmagic.

phiresky, Aug 23 '21 18:08
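The suggestion above (report the more likely type, or all candidates) can be sketched as a variant of first-match traversal that descends into every matching child and collects the deepest match on each branch. This is a hypothetical illustration, not tree_magic code: `Node`, `detect_all`, `most_likely`, the checks, and the priority numbers are all assumptions. In particular, giving a fixed-offset tar signature more weight than a PDF signature that may float anywhere in the first 1024 bytes is a design choice made up for this example, not a tree_magic or shared-mime-info rule.

```rust
// Hypothetical sketch: collect every candidate instead of stopping at the
// first match. Node, the checks, and the priority values are illustrative
// assumptions, not tree_magic's API or shared-mime-info data.

struct Node {
    mime: &'static str,
    priority: u8, // assumed: higher wins when several types match
    check: fn(&[u8]) -> bool,
    children: Vec<Node>,
}

fn always(_: &[u8]) -> bool {
    true
}

// POSIX ustar magic at offset 257: a fixed position in the header.
fn is_tar(buf: &[u8]) -> bool {
    buf.get(257..262) == Some(&b"ustar"[..])
}

// Assumed PDF rule: "%PDF-" anywhere in the first 1024 bytes.
fn is_pdf(buf: &[u8]) -> bool {
    buf[..buf.len().min(1024)].windows(5).any(|w| w == b"%PDF-")
}

// Descend into every matching child; record the deepest match on each branch.
fn detect_all(node: &Node, buf: &[u8], out: &mut Vec<(&'static str, u8)>) {
    let mut descended = false;
    for child in &node.children {
        if (child.check)(buf) {
            descended = true;
            detect_all(child, buf, out);
        }
    }
    if !descended {
        out.push((node.mime, node.priority));
    }
}

// Pick the candidate with the highest assumed priority.
fn most_likely(found: &[(&'static str, u8)]) -> Option<&'static str> {
    found.iter().max_by_key(|(_, p)| *p).map(|(m, _)| *m)
}

fn tree() -> Node {
    Node {
        mime: "application/octet-stream",
        priority: 0,
        check: always,
        children: vec![
            Node { mime: "application/pdf", priority: 40, check: is_pdf, children: vec![] },
            Node { mime: "application/x-tar", priority: 60, check: is_tar, children: vec![] },
        ],
    }
}

// A fake tar whose first member is a PDF (data starts at offset 512).
fn tar_containing_pdf() -> Vec<u8> {
    let mut buf = vec![0u8; 1024];
    buf[257..262].copy_from_slice(b"ustar");
    buf[512..517].copy_from_slice(b"%PDF-");
    buf
}

fn main() {
    let mut found = Vec::new();
    detect_all(&tree(), &tar_containing_pdf(), &mut found);
    println!("all candidates: {:?}", found);
    println!("most likely: {:?}", most_likely(&found));
}
```

For the fake tar, the sketch lists both `application/pdf` and `application/x-tar` and then picks `application/x-tar` because of the assumed priorities. Returning the whole list lets callers decide; collapsing it with a priority keeps a single-answer API. Descending into every matching child does give up some of the pruning the README credits for its speed.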