formidable icon indicating copy to clipboard operation
formidable copied to clipboard

Formidable detects MIME-type according to the file extension and not by the real content

Open pubmikeb opened this issue 3 years ago • 18 comments

Support plan

  • which support plan is this issue covered by? (e.g. Community, Sponsor, or Enterprise): Community
  • is this issue currently blocking your project? (yes/no): no
  • is this issue affecting a production system? (yes/no): yes

Context

  • node version: 16.4.2
  • module (formidable) version: 3.0.0-canary.20210428
  • environment (e.g. node, browser, native, OS): Node.js
  • used with (i.e. popular names of modules):
  • any other relevant information: Formidable detects MIME-type according to the file extension and not by the real content. Which means that user can fake the file MIME by changing file's extension and as a result to upload to the server not allowed file types.

BTW, multer detects file's MIME not by the extension only.

What are you trying to achieve or the steps to reproduce?

  1. Given JPG file tst.jpg
  2. Rename it to tst.pdf
  3. Set a break point inside of uploader.parse(req, async (err, fields, files) => {…}
  4. Try to upload tst.pdf
uploader.parse(req, async (err, fields, files) => {
	if (err) {
		reject(err);
	} else {…}
});

What was the result you got?

mimetype = application/pdf

11_010801

What result did you expect?

mimetype = image/jpeg Since this file is actually JPG file but with a from extension.

pubmikeb avatar Jul 10 '21 23:07 pubmikeb

That is good point, and the readme could definitely have a small guide on how to check this.

I don't think we will ever bake in content-based MIME detection since there are thousands of file types (and more created each year) and each one needs special detection. For example to know if it is a jpg is very different algorithm than zip, pdf etc.

GrosSacASac avatar Jul 10 '21 23:07 GrosSacASac

since there are thousands of file types (and more created each year) and each one needs special detection

Sure, it's difficult to support all possible types. But a deep check for the very common types (e.g. office, PDF, media) could be a great feature with ability to extend it by custom plugins.

pubmikeb avatar Jul 11 '21 06:07 pubmikeb

multer detects file's MIME not by the extension only.

How does multer do it ?

GrosSacASac avatar Jul 13 '21 20:07 GrosSacASac

Sure, it's difficult to support all possible types. But a deep check for the very common types (e.g. office, PDF, media) could be a great feature with ability to extend it by custom plugins.

Common types is very subjective.

And we are trying to go to the opposite direction, make the lib smaller not bigger. For example https://github.com/node-formidable/formidable/issues/718#issuecomment-873079196

Personally I think we should embrace the npm philosophy of having a lot of modules that do 1 thing and do it well. 1 lib that validates pdf and jpg etc.

Then everyone can decide to import or not, if he needs alongside formidable.

GrosSacASac avatar Jul 13 '21 21:07 GrosSacASac

Personally I think we should embrace the npm philosophy of having a lot of modules that do 1 thing and do it well. 1 lib that validates pdf and jpg etc.

Then everyone can decide to import or not, if he needs alongside formidable.

Sure, ideally would be to make formidable-core for the minimalistic and generic as much as possible functionality and a set of plugins e.g. formidable-pdf, formidable-msoffice, etc. So every project could select the required set of supported types and every developer could enrich a set of supported file types by writing a plugin. Something similar to ESLint's set of rules.

pubmikeb avatar Jul 13 '21 21:07 pubmikeb

And we are trying to go to the opposite direction, make the lib smaller not bigger. For example #718 (comment)

Great approach, zero dependencies and 85 KB package are much better then alternative solutions with 2-5 MB of dependencies.

pubmikeb avatar Jul 13 '21 21:07 pubmikeb

How does multer do it ?

It looks like the file type is detected by checking the magic number of the buffer.

It worth using the file-type package:

import FileType from "file-type";

(async () => {
	console.log(await FileType.fromFile("Unicorn.png"));
	console.log(await FileType.fromBuffer(fileBuffer));
	console.log(await FileType.fromStream(fileStream));

	//Output: {ext: "png", mime: "image/png'"}
})();

pubmikeb avatar Jul 13 '21 21:07 pubmikeb

How does multer do it ?

It looks like the file type is detected by checking the magic number of the buffer.

It worth using the file-type package:

import FileType from "file-type";

(async () => {
	console.log(await FileType.fromFile("Unicorn.png"));
	console.log(await FileType.fromBuffer(fileBuffer));
	console.log(await FileType.fromStream(fileStream));

	//Output: {ext: "png", mime: "image/png'"}
})();

Except file-type adds not a single dependency but twelve. Some of which are redundant at this point:

`-- [email protected]
  +-- [email protected]
  | `-- [email protected]
  |   +-- [email protected]
  |   +-- [email protected]
  |   | `-- [email protected]
  |   `-- [email protected]
  +-- [email protected]
  | +-- @tokenizer/[email protected]
  | `-- [email protected]
  `-- [email protected]
    +-- @tokenizer/[email protected] deduped
    `-- [email protected]

And formidable v3 already has plugin support so if you really want file type detection magic, just make a plugin for that. I just usually don't even bother. If people wanna upload jpeg as pdf, that's their fault.

TheThing avatar Jul 15 '21 08:07 TheThing

If people wanna upload jpeg as pdf, that's their fault.

OK, since such functionality is not must have and it takes to implement about 10 minutes, this issue can be closed.

pubmikeb avatar Jul 15 '21 16:07 pubmikeb

Leaning more and more towards a separate package like utils or plugins. where we can include such helpers for common things, just like we started with separate examples in the examples dir.

There's never ending discussion between small and big packages. And I'm still bouncing between the two. Okay, small, almost no dependencies package.. but end users anyway will include a lot of deps to do what their needs are.. so.. that's why i don't see big problem of including things like some of the two qs.. or other basic things like correct basic mime handling, here in the core package. Convenience and usability, over impracticality. Small packages / unix philosophy starts to break when you realize that you really need few things to have some meaningful and functional final product, with common features. I'm not saying I'm okey with the mentioned problem how much v2 started to grow.

I'm for staying thin and this to remain in userland with some guides, but it also might be a good thing to include some basic detection for most common types, and not something huge and very generic like file-type.

tunnckoCore avatar Jul 16 '21 13:07 tunnckoCore

I think there is no right or wrong answer here. I would take a decision in a more sophisticated way, based not solely on amount of dependencies, but on a combination of the following parameters:

  • How wanted/popular a feature, a dependency brings, is If it is requested by many or is a common and implemented in many scenarios, so why not to include it out-of-box?
  • Popularity of the dependency It should have at least 1M weekly downloads.
  • Maintenance status of the dependency It is actively maintained and the last release has been published at most 12M ago.
  • Dependency size (incl. all its sub-dependencies) Don't see any issue to include a dependency with a total (incl. all its sub-dependencies) size up to 1MB if it meets at least 3 of 4 requirements in this list.

@tunnckoCore, should I reopen this issue?

pubmikeb avatar Jul 16 '21 17:07 pubmikeb

Agree.

should I reopen this issue?

I don't know. I'm not very active last half year or more. Maybe we can open it if we decide to implement some basic detection.

tunnckoCore avatar Jul 16 '21 21:07 tunnckoCore

If nobody wants to work on this we might as well close the issue

GrosSacASac avatar Sep 29 '21 11:09 GrosSacASac

I'll work on it in some time.

Made some research:

  • file-type is too much and complex.
  • mimer is kind of okay, almost, except that instead of having dependency has db inside of it, which is huge size too.
  • magic-bytes.js is 192kb but its code architecture seems good and could be stripped down and replicated
    • i can suggest some updates to him.
    • it adds all the types here (12kb) like add('jpg', magicnumbers); add('pdf', pdfMagicNumbers);
    • it also adds the tests in the npm, so..
    • otherwise magic-bytes is free package name btw :laughing:

Can make @formidable/helper-mimetypes or @formidable/mimetype or formidable-mimetype, plugin or helper that uses magic-bytes or magic-bytes.js

tunnckoCore avatar Oct 30 '21 16:10 tunnckoCore

https://github.com/vader-sama/typective similar to magic-bytes but works with streams

Also remember that a valid number does not prove anything at the end, I remember it was possible to make valid jpeg files that were also valid php files, which was one way to hack a server that looked like it did everything correct.

GrosSacASac avatar Mar 04 '22 13:03 GrosSacASac

I remember it was possible to make valid jpeg files that were also valid php files, which was one way to hack a server that looked like it did everything correct.

Oh yea.. :laughing: I remember that too.

typective

yea, seems good.

tunnckoCore avatar Mar 17 '22 03:03 tunnckoCore

Another one: https://github.com/lukeed/mrmime

tunnckoCore avatar Jun 08 '22 20:06 tunnckoCore

I'm happy user of mmmagic. It was chosen ~6 years ago for execution inside aws lambda because of speed. But for formidable I would suggest to make it plugin-able. I mean not a specific plugin, but just a parameter like:

const form = formidable({
  getMimeType(file: Buffer): string {
    // custom logic
  }
})

Or plugin which allows custom logic.

Bessonov avatar Dec 06 '22 23:12 Bessonov