
Issue with UTF-8 characters in filename

Open CleyFaye opened this issue 2 years ago • 20 comments

Hi,

I recently found that something changed in the handling of filenames containing UTF-8 characters: they now seem to be passed through as-is, which was not the case before.

After investigating a bit I could reproduce the issue with the minimal code in https://github.com/CleyFaye/test-multer

I found that the browser side just passes the name as-is in the "filename" part of the header. I've seen another issue related to using "filename*", but there are two problems with that: the browser's FormData does not use it, and RFC 7578 actually says it should not be used.

What would be the proper way to handle this? Obviously it is possible, server side, to convert the content of originalname by putting all characters as bytes in an array and then interpreting it as a UTF-8 string (it does work), but since I never had this issue with older versions, I suspect something changed in the way multer handles this.
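
The server-side conversion described above could look roughly like the sketch below. It assumes every character of the received string stands for exactly one original byte (code points 0 to 255); the helper name `reinterpretAsUtf8` is made up for illustration and is not part of multer.

```javascript
// Hypothetical helper: treat each character of the received string as one raw
// byte, then decode the resulting byte array as UTF-8.
// Assumption: every character is in the range 0-255. Anything larger would be
// truncated by the Uint8Array assignment and give a wrong result.
function reinterpretAsUtf8(name) {
  const bytes = new Uint8Array(name.length);
  for (let i = 0; i < name.length; i++) {
    bytes[i] = name.charCodeAt(i);
  }
  return new TextDecoder('utf-8').decode(bytes);
}

// "é" is the UTF-8 byte pair C3 A9; read one byte per character it shows up as "Ã©".
const received = 'caf\u00c3\u00a9.txt';
console.log(reinterpretAsUtf8(received)); // café.txt
```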

CleyFaye avatar Jun 09 '22 12:06 CleyFaye

The small test provided returned the expected filename with multer@1.4.4, and changed with multer@1.4.5-lts.1.

CleyFaye avatar Jun 09 '22 13:06 CleyFaye

Same problem after updating from 1.4.4 to 1.4.5-lts.1.

dvantage avatar Jun 11 '22 16:06 dvantage

Multer has nothing to do with it, Busboy has changed something. https://github.com/mscdex/busboy/issues/20

This solved my problem:

file.originalname = Buffer.from(file.originalname, 'latin1').toString('utf8')
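
One caveat with this one-liner: applied to a name that is already correct, it can corrupt it. A guarded variant might look like the sketch below; `fixOriginalName` is a hypothetical helper, and the two checks are heuristics rather than anything multer or busboy provides.

```javascript
// Hypothetical wrapper around the one-liner above (not part of multer).
function fixOriginalName(name) {
  // Characters above U+00FF cannot come out of a latin1 decode, so such a
  // name is assumed to be decoded correctly already and is left alone.
  if (/[^\u0000-\u00ff]/.test(name)) return name;
  const decoded = Buffer.from(name, 'latin1').toString('utf8');
  // U+FFFD marks byte sequences that were not valid UTF-8; in that case the
  // original string is kept instead of a damaged one.
  return decoded.includes('\ufffd') ? name : decoded;
}

console.log(fixOriginalName('invalid \u00c2\u00a3$ file.txt')); // invalid £$ file.txt
```

The guard is not airtight (a genuine latin1 name whose bytes happen to form valid UTF-8 would still be converted), so it is a mitigation, not a fix.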

dvantage avatar Jun 11 '22 19:06 dvantage

Multer does have something to do with this, since it definitely changed behavior in an arguably incompatible way in what looks like a patch revision.

What to do, however, I'm not sure; either way would be fine (interpreting the utf-8 to be consistent with previous behavior, or passing the raw string to avoid making assumptions about encoding), but I believe this kind of change in a patch release is troublesome for users.

CleyFaye avatar Jun 12 '22 00:06 CleyFaye

Multer has nothing to do with it, Busboy has changed something. mscdex/busboy#20

This solved my problem:

file.originalname = Buffer.from(file.originalname, 'latin1').toString('utf8')

God bless you.

I've managed to make a bodge in my app

    const fileName = Buffer.from(el.originalname, 'latin1').toString('utf8');

because in my case invalid £$ file.txt was becoming invalid Â£$ file.txt. Ideally this gets fixed properly once busboy fixes it on their end. Thanks a lot.

ghost avatar Jun 14 '22 15:06 ghost

Hi, I faced the same issue with a filename in Korean. I found out that the issue is related to "busboy", specifically its "defParamCharset" config property. The default value of that property is 'latin1', which means that parameters such as a non-Latin filename from a form input are misdecoded on the Node.js side without proper configuration. However, "multer" gives us no option to change busboy's config properties.

I hope line 28 in '/lib/make-middleware.js' will be changed to something like: busboy = Busboy({ headers: req.headers, limits: limits, preservePath: preservePath, defParamCharset: 'utf8' })

At the very least, some way to configure busboy through the multer module is needed.
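
To illustrate why the charset used for header parameters matters, here is a small self-contained sketch. It is not busboy's code: `readFilename` is a hypothetical stand-in for a parser's parameter decoding step, and its `charset` argument merely plays the role that defParamCharset is described as playing above.

```javascript
// Raw bytes of a Content-Disposition header as a browser would send it:
// the filename is written as plain UTF-8 bytes, with no "filename*" parameter.
const header = Buffer.from('form-data; name="file"; filename="한글.txt"', 'utf8');

// Hypothetical stand-in for the parameter decoding step of a multipart parser.
function readFilename(rawHeader, charset) {
  const text = rawHeader.toString(charset);
  const match = /filename="([^"]*)"/.exec(text);
  return match ? match[1] : null;
}

console.log(readFilename(header, 'latin1')); // mojibake: one character per UTF-8 byte
console.log(readFilename(header, 'utf8'));   // 한글.txt
```

The latin1 result still contains all the original bytes, which is why re-encoding it as latin1 and decoding as UTF-8 recovers the name.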

sominlee74 avatar Jul 27 '22 06:07 sominlee74

This issue is still relevant. Multer should not deviate from the UTF-8 default. A multer option should be added so that we can set busboy's defParamCharset.

bf avatar Aug 31 '22 09:08 bf

I published a multer-utf8 package on npm that reads files with the utf8 charset by default.

https://www.npmjs.com/package/multer-utf8

jhpung avatar Jan 13 '23 07:01 jhpung

The problem still exists, please fix it quickly

lujijiang avatar Apr 06 '23 17:04 lujijiang

Just to clarify, in Multer 1.4.4 the name was parsed as utf-8, and in Multer 1.4.5-lts.1 it's parsed as latin1?

In that case it seems straightforward to add defParamCharset: 'utf8' so that the new version behaves the same as the previous one...

LinusU avatar Apr 07 '23 07:04 LinusU

  1. Tried both defParamCharset and defCharset; neither has any effect:

         multer({
           storage,
           defParamCharset: 'utf8',
           defCharset: 'utf8',
         })

  2. As far as I can see, only selected options are passed from the config to busboy:
     https://github.com/expressjs/multer/blob/25794553989a674f4998b32a061dfc9287b23188/index.js#LL11C1-L23C2
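
Since these options do not appear to reach busboy, one possible stopgap is to repair the names in a small middleware that runs after multer. The sketch below is only an illustration: `fixUploadedNames` and `toUtf8` are hypothetical names, and it assumes the usual shapes of `req.file` and `req.files` (an array, or an object mapping field names to arrays).

```javascript
// Hypothetical helper: the same conversion quoted earlier in this thread.
const toUtf8 = (name) => Buffer.from(name, 'latin1').toString('utf8');

// Express-style middleware meant to run right after a multer middleware.
function fixUploadedNames(req, res, next) {
  const files = [];
  if (req.file) files.push(req.file);
  if (Array.isArray(req.files)) {
    files.push(...req.files);
  } else if (req.files) {
    // Object form: field name -> array of files
    for (const list of Object.values(req.files)) files.push(...list);
  }
  for (const file of files) {
    file.originalname = toUtf8(file.originalname);
  }
  next();
}
```

It would be mounted along the lines of `app.post('/upload', upload.single('file'), fixUploadedNames, handler)`, where `app`, `upload` and `handler` are placeholders for your own Express app, multer instance and route handler.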

Doc999tor avatar May 23 '23 19:05 Doc999tor

@CleyFaye @dvantage thank you very much

ngovanduy0908 avatar Aug 01 '23 03:08 ngovanduy0908

Why does Postman get it right by default?

TiuBen avatar Aug 25 '23 16:08 TiuBen

@TiuBen

Postman uses “filename*”, so filename problems do not occur.
But browsers do not use it.

see https://github.com/expressjs/multer/issues/1104#issue-1266094642

I've seen another issue related to using "filename*", but there is two problem with that: the browser's formdata does not use this
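
For reference, a "filename*" parameter carries a value of the form charset'language'percent-encoded-bytes, as defined in RFC 5987. The sketch below shows how such a value can be built and decoded; both function names are hypothetical, only UTF-8 is handled, and this is not how multer or busboy implement it.

```javascript
// Hypothetical helper: build an extended value such as UTF-8''%ED%95%9C%EA%B8%80.txt
function encodeExtendedValue(name) {
  // encodeURIComponent leaves ' ( ) * unescaped, but they are not allowed
  // unencoded in an extended value, so they are escaped by hand.
  const encoded = encodeURIComponent(name).replace(
    /['()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
  return "UTF-8''" + encoded;
}

// Hypothetical helper: decode charset'language'value; returns null when the
// value is malformed or uses a charset other than UTF-8.
function decodeExtendedValue(extValue) {
  const match = /^([^']*)'([^']*)'(.*)$/.exec(extValue);
  if (!match) return null;
  const charset = match[1];
  const encoded = match[3];
  if (charset.toLowerCase() !== 'utf-8') return null;
  try {
    return decodeURIComponent(encoded);
  } catch {
    return null; // malformed percent-encoding
  }
}

console.log(encodeExtendedValue('한글.txt')); // UTF-8''%ED%95%9C%EA%B8%80.txt
```

Because every non-ASCII byte is percent-encoded, the receiving side never has to guess the charset, which matches the explanation above of why a client sending "filename*" does not hit this problem.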

starnayuta avatar Aug 25 '23 16:08 starnayuta

Where can I use filename*?

TiuBen avatar Aug 26 '23 06:08 TiuBen

Is this solved? I still have this issue, and Buffer.from(file.originalname, 'latin1').toString('utf8') solves it...

stouch avatar May 01 '24 19:05 stouch