formidable
formidable copied to clipboard
Formidable detects MIME-type according to the file extension and not by the real content
Support plan
- which support plan is this issue covered by? (e.g. Community, Sponsor, or Enterprise): Community
- is this issue currently blocking your project? (yes/no): no
- is this issue affecting a production system? (yes/no): yes
Context
- node version: 16.4.2
- module (formidable) version: 3.0.0-canary.20210428
- environment (e.g. node, browser, native, OS): Node.js
- used with (i.e. popular names of modules):
- any other relevant information: Formidable detects MIME-type according to the file extension and not by the real content. Which means that user can fake the file MIME by changing file's extension and as a result to upload to the server not allowed file types.
BTW, multer
detects file's MIME not by the extension only.
What are you trying to achieve or the steps to reproduce?
- Given JPG file
tst.jpg
- Rename it to
tst.pdf
- Set a break point inside of
uploader.parse(req, async (err, fields, files) => {…}
- Try to upload
tst.pdf
uploader.parse(req, async (err, fields, files) => {
if (err) {
reject(err);
} else {…}
});
What was the result you got?
mimetype = application/pdf
What result did you expect?
mimetype = image/jpeg
Since this file is actually JPG file but with a from extension.
That is good point, and the readme could definitely have a small guide on how to check this.
I don't think we will ever bake in content-based MIME detection since there are thousands of file types (and more created each year) and each one needs special detection. For example to know if it is a jpg is very different algorithm than zip, pdf etc.
since there are thousands of file types (and more created each year) and each one needs special detection
Sure, it's difficult to support all possible types. But a deep check for the very common types (e.g. office, PDF, media) could be a great feature with ability to extend it by custom plugins.
multer detects file's MIME not by the extension only.
How does multer do it ?
Sure, it's difficult to support all possible types. But a deep check for the very common types (e.g. office, PDF, media) could be a great feature with ability to extend it by custom plugins.
Common types is very subjective.
And we are trying to go to the opposite direction, make the lib smaller not bigger. For example https://github.com/node-formidable/formidable/issues/718#issuecomment-873079196
Personally I think we should embrace the npm philosophy of having a lot of modules that do 1 thing and do it well. 1 lib that validates pdf and jpg etc.
Then everyone can decide to import or not, if he needs alongside formidable.
Personally I think we should embrace the npm philosophy of having a lot of modules that do 1 thing and do it well. 1 lib that validates pdf and jpg etc.
Then everyone can decide to import or not, if he needs alongside formidable.
Sure, ideally would be to make formidable-core
for the minimalistic and generic as much as possible functionality and a set of plugins e.g. formidable-pdf
, formidable-msoffice
, etc. So every project could select the required set of supported types and every developer could enrich a set of supported file types by writing a plugin. Something similar to ESLint's set of rules.
And we are trying to go to the opposite direction, make the lib smaller not bigger. For example #718 (comment)
Great approach, zero dependencies and 85 KB package are much better then alternative solutions with 2-5 MB of dependencies.
How does
multer
do it ?
It looks like the file type is detected by checking the magic number of the buffer.
It worth using the file-type
package:
import FileType from "file-type";
(async () => {
console.log(await FileType.fromFile("Unicorn.png"));
console.log(await FileType.fromBuffer(fileBuffer));
console.log(await FileType.fromStream(fileStream));
//Output: {ext: "png", mime: "image/png'"}
})();
How does
multer
do it ?It looks like the file type is detected by checking the magic number of the buffer.
It worth using the
file-type
package:import FileType from "file-type"; (async () => { console.log(await FileType.fromFile("Unicorn.png")); console.log(await FileType.fromBuffer(fileBuffer)); console.log(await FileType.fromStream(fileStream)); //Output: {ext: "png", mime: "image/png'"} })();
Except file-type adds not a single dependency but twelve. Some of which are redundant at this point:
`-- [email protected]
+-- [email protected]
| `-- [email protected]
| +-- [email protected]
| +-- [email protected]
| | `-- [email protected]
| `-- [email protected]
+-- [email protected]
| +-- @tokenizer/[email protected]
| `-- [email protected]
`-- [email protected]
+-- @tokenizer/[email protected] deduped
`-- [email protected]
And formidable v3 already has plugin support so if you really want file type detection magic, just make a plugin for that. I just usually don't even bother. If people wanna upload jpeg as pdf, that's their fault.
If people wanna upload jpeg as pdf, that's their fault.
OK, since such functionality is not must have and it takes to implement about 10 minutes, this issue can be closed.
Leaning more and more towards a separate package like utils
or plugins. where we can include such helpers for common things, just like we started with separate examples in the examples dir.
There's never ending discussion between small and big packages. And I'm still bouncing between the two. Okay, small, almost no dependencies package.. but end users anyway will include a lot of deps to do what their needs are.. so.. that's why i don't see big problem of including things like some of the two qs.. or other basic things like correct basic mime handling, here in the core package. Convenience and usability, over impracticality. Small packages / unix philosophy starts to break when you realize that you really need few things to have some meaningful and functional final product, with common features. I'm not saying I'm okey with the mentioned problem how much v2 started to grow.
I'm for staying thin and this to remain in userland with some guides, but it also might be a good thing to include some basic detection for most common types, and not something huge and very generic like file-type.
I think there is no right or wrong answer here. I would take a decision in a more sophisticated way, based not solely on amount of dependencies, but on a combination of the following parameters:
- How wanted/popular a feature, a dependency brings, is If it is requested by many or is a common and implemented in many scenarios, so why not to include it out-of-box?
- Popularity of the dependency It should have at least 1M weekly downloads.
- Maintenance status of the dependency It is actively maintained and the last release has been published at most 12M ago.
- Dependency size (incl. all its sub-dependencies) Don't see any issue to include a dependency with a total (incl. all its sub-dependencies) size up to 1MB if it meets at least 3 of 4 requirements in this list.
@tunnckoCore, should I reopen this issue?
Agree.
should I reopen this issue?
I don't know. I'm not very active last half year or more. Maybe we can open it if we decide to implement some basic detection.
If nobody wants to work on this we might as well close the issue
I'll work on it in some time.
Made some research:
-
file-type
is too much and complex. -
mimer
is kind of okay, almost, except that instead of having dependency has db inside of it, which is huge size too. -
magic-bytes.js
is 192kb but its code architecture seems good and could be stripped down and replicated- i can suggest some updates to him.
- it adds all the types here (12kb) like
add('jpg', magicnumbers); add('pdf', pdfMagicNumbers);
- it also adds the tests in the npm, so..
- otherwise
magic-bytes
is free package name btw :laughing:
Can make @formidable/helper-mimetypes
or @formidable/mimetype
or formidable-mimetype
, plugin or helper that uses magic-bytes
or magic-bytes.js
https://github.com/vader-sama/typective similar to magic-bytes but works with streams
Also remember that a valid number does not prove anything at the end, I remember it was possible to make valid jpeg files that were also valid php files, which was one way to hack a server that looked like it did everything correct.
I remember it was possible to make valid jpeg files that were also valid php files, which was one way to hack a server that looked like it did everything correct.
Oh yea.. :laughing: I remember that too.
typective
yea, seems good.
Another one: https://github.com/lukeed/mrmime
I'm happy user of mmmagic. It was chosen ~6 years ago for execution inside aws lambda because of speed. But for formidable I would suggest to make it plugin-able. I mean not a specific plugin, but just a parameter like:
const form = formidable({
getMimeType(file: Buffer): string {
// custom logic
}
})
Or plugin which allows custom logic.