http_server uploaded file handling

Open DartBot opened this issue 9 years ago • 9 comments

Originally opened as dart-lang/sdk#14303

This issue was originally filed by [email protected]


Current http_body implementation decodes / parses HttpBodyFileUpload.content based upon the Content-Type header of the part.

However, file server applications need the raw uploaded file data. Such applications save received files to their file system without modification. I think we need a switch that disables decoding / parsing for all content types and just returns a List<int> object as HttpBodyFileUpload.content.
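As a minimal sketch of the file-server use case described above, assuming the upload content is already available as raw bytes: the helper name `saveUpload` and the temporary directory are hypothetical and not part of http_server.

```dart
import 'dart:io';

/// Hypothetical helper: writes the uploaded bytes to [dir] unmodified.
Future<File> saveUpload(Directory dir, String filename, List<int> content) {
  // Keep only the last path segment so a client-supplied name
  // cannot point outside the target directory.
  final safeName = filename.split(RegExp(r'[/\\]')).last;
  return File('${dir.path}/$safeName').writeAsBytes(content);
}

Future<void> main() async {
  final dir = await Directory.systemTemp.createTemp('upload_demo');
  // Two raw bytes ("{}"), stored exactly as received.
  final file = await saveUpload(dir, 'data.json', [0x7b, 0x7d]);
  print(await file.length()); // 2
  await dir.delete(recursive: true);
}
```

No decoding step is involved, so the saved file is byte-for-byte identical to what the client sent.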

Another solution would be to simply return List<int> for all uploaded files. I am not sure how much this would affect other kinds of applications. However, when a .json file is uploaded from an HTML form, IE sends it as “Content-Type: text/plain”, while Chrome, Firefox and Safari send it as “Content-Type: application/octet-stream” (I don’t know why). In either case, .json files uploaded through an HTML form will never be parsed as JSON.

Regarding filenames with multibyte characters: although Windows uses UTF-8, I think it might be safe to keep LATIN1 decoding. I am not familiar with other file systems. On Windows, we can recover the filename using UTF8.decode(LATIN1.encode(part.filename)), assuming that LATIN1.decode(bytes) simply produces a String in which each character has the same value as the corresponding byte of bytes.
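The round trip described above can be sketched as follows. This uses the lowercase `utf8` and `latin1` constants from current `dart:convert` (the issue predates the rename from `UTF8` and `LATIN1`); the helper name `recoverUtf8Filename` is hypothetical.

```dart
import 'dart:convert';

/// Hypothetical helper: re-interprets a Latin-1-decoded filename as UTF-8.
String recoverUtf8Filename(String latin1Decoded) {
  // latin1.encode maps each character (0-255) back to the original byte.
  final bytes = latin1.encode(latin1Decoded);
  return utf8.decode(bytes);
}

void main() {
  // Bytes as a browser would send them for a UTF-8 filename.
  final wireBytes = utf8.encode('日本語.txt');
  // What a parser that decodes filenames as Latin-1 would hand back.
  final asParsed = latin1.decode(wireBytes);
  print(recoverUtf8Filename(asParsed)); // 日本語.txt
}
```

This only works when the filename really was decoded as Latin-1: `latin1.encode` throws for characters above 255, and `utf8.decode` throws a `FormatException` if the recovered bytes are not valid UTF-8.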

DartBot avatar Jun 05 '15 22:06 DartBot

Comment by skabet


Set owner to @skabet. Removed Area-IO label. Added Area-Pkg, Library-HttpServer, Accepted labels.

DartBot avatar Jun 05 '15 22:06 DartBot

Comment by skabet


Hi Terry,

You are right, in most cases with file uploads, it's the raw binary one wants to access. What if we do the following:

  1. Always provide the raw List<int> data.
  2. Add a method to the FileUpload class, 'parsedData()', that will try to parse/decode the data depending on the mime type. We can even throw in an optional 'mimeType' argument for it, so one can override the default mime type, e.g. parse as 'text/utf-8' instead of 'application/json'.
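A rough sketch of what these two points could look like, under stated assumptions: the class below is hypothetical and is not the http_server API, and the set of mime types it understands is chosen only for illustration.

```dart
import 'dart:convert';

/// Hypothetical sketch of the proposed upload wrapper.
class FileUpload {
  final String filename;
  final String mimeType; // e.g. taken from the part's Content-Type header
  final List<int> data; // always the raw, unmodified bytes

  FileUpload(this.filename, this.mimeType, this.data);

  /// Decodes [data] according to [mimeType], or the upload's own type
  /// if none is given. Unknown types fall back to the raw bytes.
  dynamic parsedData({String? mimeType}) {
    final type = mimeType ?? this.mimeType;
    if (type == 'application/json') return jsonDecode(utf8.decode(data));
    if (type.startsWith('text/')) return utf8.decode(data);
    return data;
  }
}

void main() {
  // A .json file that the browser labelled as application/octet-stream.
  final upload = FileUpload(
      'config.json', 'application/octet-stream', utf8.encode('{"a": 1}'));
  print(upload.data.length); // 8
  print(upload.parsedData(mimeType: 'application/json')); // {a: 1}
}
```

Keeping the raw bytes as the default and making parsing opt-in means a mislabelled Content-Type never alters what gets stored.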

Regarding the filename, I think we should do a test and see what the different browsers upload. If we can hit a 90% success rate with some default encoding, that could be the way to go.

DartBot avatar Jun 05 '15 22:06 DartBot

Comment by skabet


I just tried with both Chrome and IE on Windows, and I get the following:

With <meta charset="UTF-8" />:
  - Chrome: as utf8
  - IE: as utf8

Without <meta charset="UTF-8" />:
  - Chrome: multi-bytes replaced with ?
  - IE: as utf8

I think it's fine to use utf8-decoding for filenames.

DartBot avatar Jun 05 '15 22:06 DartBot

This comment was originally written by [email protected]


I confirmed it on my Windows Vista machine using the following HTML:

    001 <!DOCTYPE html>
    002 <html>
    003 <head>
    004 <title>file_upload_test</title>
    005 <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    006 </head>
    007 <body>
    008 <form action="http://localhost:8080/DumpHttpMultipart"
    009 enctype="multipart/form-data"
    010 accept-charset="UTF-8"
    011 method="POST"> <br>
    012 What is your name? <input type="text" name="submitter"> <br>
    013 What files are you sending? <input type="file" name="content"> <br>
    014 <input type="submit" value="Send File">
    015 </form>
    016 </body>
    017 </html>

If line 005 or 010 exists, Chrome, Firefox and Safari send filenames with multi-byte characters as UTF-8. Otherwise, such filenames are transmitted as Shift_JIS characters (one of the most popular Japanese character encodings). IE sends them as UTF-8 regardless of whether line 005 or 010 exists.

I agree with using UTF-8 decoding for filenames (the current implementation uses ISO-8859-1 decoding). It’s common to add line 005 for such applications.

DartBot avatar Jun 05 '15 22:06 DartBot

Comment by skabet


Hi

What do you think about the following API?

    /**
     * A HTTP content body produced by [HttpBodyHandler] for either [HttpRequest]
     * or [HttpClientResponse].
     */
    abstract class HttpBody {
      /**
       * The actual data of the request.
       */
      List<int> get data;

      /**
       * Convert the data using mimeType.
       *
       * If mimeType is left unspecified, the Content-Type header will be used.
       */
      dynamic asMimeType({String mimeType});

      /**
       * Parse the [data] as text.
       *
       * If the headers contains a charset hint, that charset will be used.
       */
      String asText();

      /**
       * Parse the [data] as JSON.
       */
      dynamic asJSON();

      /**
       * Parse the data as either multipart/form-data or
       * application/x-www-form-urlencoded.
       *
       * The Content-Type header will be used to identify the parsing.
       */
      Map asFormPost();
    }

    /**
     * The [HttpBody] of a [HttpClientResponse] will be of type
     * [HttpClientResponseBody].
     */
    abstract class HttpClientResponseBody
        implements HttpBody, HttpClientResponse { }

    /**
     * The [HttpBody] of a [HttpRequest] will be of type [HttpRequestBody].
     */
    abstract class HttpRequestBody implements HttpBody, HttpRequest { }

    /**
     * A [HttpBodyFileUpload] object wraps a file upload, presenting a way for
     * extracting filename, contentType and the data of the uploaded file.
     */
    abstract class HttpBodyFileUpload {
      /**
       * The filename of the uploaded file.
       */
      String get filename;

      /**
       * The [ContentType] of the uploaded file.
       */
      ContentType get contentType;

      /**
       * The content of the file.
       */
      List<int> get content;
    }
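To illustrate how `asText()` and `asJSON()` from the proposal might behave on top of the raw `data`, here is a minimal in-memory sketch. The class name `BytesBody`, the `charset` field, and the fallback to UTF-8 when no charset hint is present are all assumptions made for this example, not part of the proposed API.

```dart
import 'dart:convert';

/// Hypothetical in-memory body following the shape of the proposal.
class BytesBody {
  final List<int> data; // raw bytes, never modified
  final String? charset; // charset hint, e.g. from the Content-Type header

  BytesBody(this.data, {this.charset});

  /// Decodes [data] with the hinted charset; assumes UTF-8 when the
  /// hint is missing or not known to dart:convert.
  String asText() {
    final encoding = Encoding.getByName(charset) ?? utf8;
    return encoding.decode(data);
  }

  /// Parses the decoded text as JSON.
  dynamic asJSON() => jsonDecode(asText());
}

void main() {
  final body =
      BytesBody(latin1.encode('{"name": "José"}'), charset: 'iso-8859-1');
  print(body.asText());
  print(body.asJSON()['name']); // José
}
```

Because the caller picks the accessor, a request with a missing or wrong Content-Type can still be read as JSON or text on demand.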


cc @sethladd.

DartBot avatar Jun 05 '15 22:06 DartBot

Comment by sethladd


Thanks! I like how HttpRequestBody implements HttpRequest now. Also, I like how I can control how I get the body (json, text, etc) because sometimes a content-type is not set on the request.

DartBot avatar Jun 05 '15 22:06 DartBot

This comment was originally written by [email protected]


I think this will give us more flexible POST body data handling.

DartBot avatar Jun 05 '15 22:06 DartBot

Comment by anders-sandholm


Removed Library-HttpServer label. Added Pkg-HttpServer label.

DartBot avatar Jun 05 '15 22:06 DartBot