file-type .pdf files can be detected as .ai based on content

Open eric-yuan-vanta opened this issue 2 years ago • 3 comments

When PDF files contain images created in Photoshop or Adobe Illustrator, file-type detects them as .ai because of the byte-checking heuristic we have in place.

I'm proposing that even if the magic string is found, if the original file's extension is .pdf, file-type should treat it as a PDF and not change its type based on some content inside it.

An even stricter approach that I would also support is to return the ai file type only if the file extension is already .ai. It seems more natural/compatible to default to .pdf if .ai isn't explicitly specified, since the ai detection is just a loose heuristic anyway.

Feb 15 '23 21:02 eric-yuan-vanta

@sindresorhus Curious if you have thoughts on this.

I plan to put up a fix with the second approach, but I would like to get https://github.com/sindresorhus/file-type/pulls in first

Feb 15 '23 21:02 eric-yuan-vanta

But it seems we don't have access to the original file extension, since we only work with the stream, which makes sense. So maybe this approach is no good.

In my own usage, I'll work around it by managing this case in the caller.
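A minimal sketch of what such a caller-side workaround could look like, assuming the caller still knows the original file name and already has a detection result of the `{ext, mime}` shape that file-type returns. The helper name `preferPdfOverAi` and the `DetectedType` interface are hypothetical and not part of file-type.

```typescript
// Shape of a detection result, modeled on the {ext, mime} objects
// that file-type returns. Declared locally so the sketch is self-contained.
interface DetectedType {
  ext: string;
  mime: string;
}

// Hypothetical helper (not part of file-type): if the content was detected
// as .ai but the caller knows the file was named *.pdf, report it as a PDF.
function preferPdfOverAi(
  detected: DetectedType | undefined,
  originalName: string
): DetectedType | undefined {
  const namedPdf = originalName.toLowerCase().endsWith('.pdf');
  if (detected !== undefined && detected.ext === 'ai' && namedPdf) {
    return { ext: 'pdf', mime: 'application/pdf' };
  }
  // Every other result is passed through unchanged.
  return detected;
}

// Example: a result reported as .ai for a file that was uploaded as a PDF.
const corrected = preferPdfOverAi(
  { ext: 'ai', mime: 'application/postscript' },
  'invoice.pdf'
);
console.log(corrected);
```

This keeps the extension check outside the library, which only sees the stream, and leaves a file named `.ai` untouched.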

Still, I wonder if there's a better way to do this than what we have today.

Feb 15 '23 21:02 eric-yuan-vanta

None of the implemented file recognition is perfect (guaranteed to be correct). By writing 4 characters at the beginning of a text file you can probably mimic half of the file recognition heuristics. The reliability of the heuristics varies strongly.
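To illustrate the point about 4 characters, here is a deliberately naive sketch of a magic-number check. It is not file-type's actual code; `looksLikePdf` and `asciiBytes` are hypothetical names. The only fact relied on is that PDF files start with the bytes `%PDF`.

```typescript
// Convert an ASCII string to bytes without relying on any runtime API.
function asciiBytes(text: string): Uint8Array {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Deliberately naive detector: it only compares the first four bytes
// against the PDF signature "%PDF" (0x25 0x50 0x44 0x46).
function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46];
  return magic.every((b, i) => bytes[i] === b);
}

// A plain text note that merely starts with the same four characters
// passes the check; an ordinary note does not.
const fakePdf = asciiBytes('%PDF is just how this note happens to begin');
const plainNote = asciiBytes('Just a note');
console.log(looksLikePdf(fakePdf), looksLikePdf(plainNote));
```

Any check that keys on a short byte sequence can be fooled the same way, which is why the reliability differs so much between formats.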

If the recognition is likely to introduce false positives (for which there is no clear definition), it may indeed be better to improve the algorithm, preferably, or, as you suggest, fall back on its parent file type.

Feb 17 '23 07:02 Borewit
