file-type
.pdf files can be detected as .ai based on content
When PDF files contain images created in Photoshop or Adobe Illustrator, file-type detects them as .ai because of the byte-checking heuristic we have in place.
I'm proposing that even if the magic string is found, if the original file's extension is .pdf, file-type should consider it a PDF and not change its type based on some content inside of it.
An even stricter approach that I would also support is returning the ai file type only if the file extension is already .ai. It seems more natural and compatible to default to .pdf if .ai isn't explicitly specified, since the ai detection is just a loose heuristic anyway.
@sindresorhus Curious if you have thoughts on this.
I plan to put up a fix with the second approach, but I would like to get https://github.com/sindresorhus/file-type/pulls in first.
But it seems we don't have access to the original file extension, since we only work with the stream (which makes sense), so maybe this approach is no good.
In my own usage, I'll work around it by managing this case in the caller.
Still, I wonder if there's a better way to do this than what we have today.
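For anyone who needs the same caller-side workaround, here is a minimal sketch of the stricter rule described above: an ai result is trusted only when the original file name already ends in .ai. `resolveType` is a hypothetical helper, not part of file-type, and it assumes the detection result is an object shaped like `{ext, mime}` (or `undefined` when nothing was recognized).

```javascript
// Hypothetical caller-side helper; NOT part of file-type's API.
// Assumption: `detected` looks like { ext, mime }, or is undefined
// when the detector recognized nothing.
function resolveType(detected, originalName) {
  if (!detected) {
    return undefined;
  }

  // Take the text after the last dot as the extension ('' if there is no dot).
  const dot = originalName.lastIndexOf('.');
  const nameExt = dot === -1 ? '' : originalName.slice(dot + 1).toLowerCase();

  // Trust an "ai" result only when the file was actually named *.ai;
  // otherwise fall back to the parent type, PDF.
  if (detected.ext === 'ai' && nameExt !== 'ai') {
    return { ext: 'pdf', mime: 'application/pdf' };
  }

  return detected;
}

// Usage: the detector said "ai", but the upload was called report.pdf.
// The mime value here is only sample input for the sketch.
const resolved = resolveType({ ext: 'ai', mime: 'application/postscript' }, 'report.pdf');
console.log(resolved.ext);
```

This keeps the stream-based detection untouched and moves the extension check to the one place where the original file name is actually known.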
None of the implemented file recognition is perfect (guaranteed to be correct). By writing 4 characters at the beginning of a text file you can probably mimic half of the file recognition heuristics. The reliability of the heuristics varies strongly.
If the recognition is likely to introduce false positives (for which there is no clear definition), it may indeed be better to improve the algorithm, preferably, or, as you suggest, fall back on its parent file type.
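To illustrate the point about mimicking a heuristic with a few leading characters, here is a toy example. `sniff` is a made-up, deliberately naive detector that checks only the 4-byte `%PDF` prefix; it is not how file-type is implemented, just a sketch of why prefix-based checks are easy to fool.

```javascript
// Toy sniffer for illustration only; real detectors check many more signatures.
function sniff(bytes) {
  // Decode the first 4 bytes as characters and compare them to the PDF magic string.
  const head = String.fromCharCode(...bytes.slice(0, 4));
  return head === '%PDF' ? 'pdf' : undefined;
}

const encoder = new TextEncoder();

// An ordinary text file is not recognized.
const plainText = encoder.encode('Just some notes.\n');

// The same kind of text file, but its first 4 characters happen to be "%PDF".
const spoofed = encoder.encode('%PDF is how these notes start.\n');

console.log(sniff(plainText));
console.log(sniff(spoofed));
```

The second input is still plain text, yet the prefix check reports it as a PDF, which is the kind of false positive discussed above.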