allow parsing utf8 filenames

Open 5c077yP opened this issue 8 years ago • 1 comments

Hey there, first thanks for this great library!

I can see that this library supports generating the Content-Disposition header from a UTF-8 filename and parsing a UTF-8-encoded filename from the header. However, my browser (Chrome 58) does not properly encode the filename when uploading a file with a UTF-8 filename; the header looks like: Content-Disposition: form-data; filename="ö".
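To illustrate the symptom: as far as I know, Node's HTTP parser hands header values to the application as latin1-decoded strings, so a raw UTF-8 "ö" (bytes 0xC3 0xB6) shows up as two characters. A minimal sketch using only Node's built-in `Buffer`; the helper name `recoverUtf8` is hypothetical and not part of this library:

```javascript
// Hypothetical helper (not part of content-disposition): reinterpret a
// latin1-decoded header string as the UTF-8 bytes the browser actually sent.
function recoverUtf8(latin1String) {
  return Buffer.from(latin1String, 'latin1').toString('utf8');
}

// What the server sees when the browser sends "ö" as raw UTF-8 (0xC3 0xB6)
// and those bytes are decoded as latin1.
const seen = Buffer.from('ö', 'utf8').toString('latin1');

console.log(seen);              // Ã¶
console.log(recoverUtf8(seen)); // ö
```

This only recovers the name when the client really did send UTF-8; it cannot detect other encodings.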

My current use case: I'm using this library in a middleware to parse the Content-Disposition header, upload the file to AWS S3, and set the disposition header there as well. Right now the parsing throws an "Invalid Format" exception. I feel it would be great if the lib would just accept this input and generate a valid UTF-8-encoded header when doing:

const parsed = contentDisposition.parse(headers['content-disposition']);
const header = contentDisposition(parsed.parameters.filename, { type: parsed.type });

What do you think about it?
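In the meantime, something like the following might work as a stopgap outside the library. It is only a sketch under stated assumptions: the part header has the simple quoted form shown above, the client sent raw UTF-8, and the server decoded it as latin1. Both function names are hypothetical and not part of content-disposition; the output uses the `filename*=UTF-8''...` extended-parameter form from RFC 5987.

```javascript
// Hypothetical workaround, not part of content-disposition.
// Pull the quoted filename out of a multipart part header and reinterpret
// it as UTF-8 (assumes raw UTF-8 bytes that were decoded as latin1).
function extractRawFilename(headerValue) {
  const match = /;\s*filename="((?:[^"\\]|\\.)*)"/i.exec(headerValue);
  if (!match) return null;
  const unescaped = match[1].replace(/\\(.)/g, '$1'); // undo backslash escapes
  return Buffer.from(unescaped, 'latin1').toString('utf8');
}

// Build an RFC 5987 extended parameter. encodeURIComponent leaves ' ( ) *
// unescaped, but those are not allowed unencoded there, so escape them too.
function encodeExtendedFilename(filename) {
  const encoded = encodeURIComponent(filename)
    .replace(/['()*]/g, (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase());
  return "filename*=UTF-8''" + encoded;
}

// Simulate the header as the server would see it (UTF-8 bytes read as latin1).
const wireName = Buffer.from('ö.txt', 'utf8').toString('latin1');
const header = 'form-data; name="file"; filename="' + wireName + '"';

const filename = extractRawFilename(header);
console.log(filename);                         // ö.txt
console.log(encodeExtendedFilename(filename)); // filename*=UTF-8''%C3%B6.txt
```

This handles only the raw-UTF-8 case; clients that send another encoding or percent-escape parts of the name would still be decoded incorrectly.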

May 26 '17 10:05 5c077yP

Hi @5c077yP, yes, right now this module only supports the HTTP header Content-Disposition; it does not support the MIME headers contained within multipart objects like multipart/form-data. The main reason is that the HTTP header actually has a specification that clients and browsers follow, while the MIME header of the same name within multiparts has no specification to even implement against.

I have sat down a few times to read the source code for Chrome & Firefox to deconstruct the obscure ways they encode file names, but never really finished (of course, for Safari and IE you'd have to do it experimentally). What I generally found was that some of them just put raw UTF-8; IE seems to put raw bytes in whatever the user's OS encoding is set to (latin1 for US, Shift_JIS for Japan, etc.); some will also URL-encode certain characters but not others. And then when you see a %20, it could be a space or an actual literal %20 in the file name on the user's computer, and you can't tell.
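The %20 ambiguity can be shown with a small sketch. The function below is a hypothetical model of a client that percent-encodes spaces but leaves a literal "%" untouched; it is an assumption for illustration, not the documented behavior of any particular browser.

```javascript
// Hypothetical client model: escapes spaces, passes "%" through unchanged.
function encodeSpacesOnly(filename) {
  return filename.replace(/ /g, '%20');
}

const withSpace = encodeSpacesOnly('my file.txt');     // real space in the name
const withLiteral = encodeSpacesOnly('my%20file.txt'); // literal "%20" in the name

// Both names produce the same bytes on the wire, so no decoder on the
// server can recover which one the user actually had.
console.log(withSpace === withLiteral); // true
```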

I was tracking the attempt to implement this in #3, but getting something that would actually work with any browser, any computer, and any character in the file name seems to be a very difficult task, if you want to lend a hand. The issue, though, is that simply accepting raw UTF-8 would work for the specific case you ran into, but then you're likely to continue to run into each of the issues I listed above, so really I want to either support multipart well (future) or not support it at all (current).

I just haven't really had a lot of motivation to work on the true multipart support, so yea, I guess second invitation to help out here :) My ideal thought is to just implement the reverse of the open source browsers' implementations and then experimentally figure out the closed-source browsers and bundle all that together as a "multipart mode".

May 26 '17 17:05 dougwilson