actix-web icon indicating copy to clipboard operation
actix-web copied to clipboard

Filename escaping in multipart/form-data Content-Disposition

Open iyzana opened this issue 3 years ago • 5 comments

(Un)escaping of the filename in the Content-Disposition "header" of multipart/form-data streams does not match the other implementations I tested.

Expected Behavior

In particular, the filename \" is escaped as \%22 in curl, firefox and chrome. There is an existing discussion of this at https://github.com/curl/curl/issues/7789. actix-web should decode this as \".

Current Behavior

actix-web currently implements RFC 6266 for filename encoding and decodes this as %22, dropping the \ and not unescaping the ".

Possible Solution

Use RFC 7578 for quoted strings in the DispositionParam (un)escaping code. I am not sure if this conflicts with Content-Disposition escaping for normal HTTP-Response headers. Also, as the comment in test_from_raw_unnecessary_percent_decode states, other implementations do not percent-escape higher UTF-8 codepoints, but they do percent-escape " (and as far as i can see, only " and no other bytes).

Steps to Reproduce (for bugs)

ncat -l 8000 &
touch "\\\"¢"
curl -v -F "file=@\\\"¢" localhost:8000
^C

which produces a body similar to this (not escaping the \ or ¢ and percent escaping the "):

--------------------------a80b394c70a9c219
Content-Disposition: form-data; name="file"; filename="\%22¢"
Content-Type: application/octet-stream


--------------------------a80b394c70a9c219--

Adapted from http::header::content_disposition::test_from_raw_unnecessary_percent_decode

let a = HeaderValue::from_static(
    "form-data; name=\"file\"; filename=\"\\%22\"¢",
);
let a: ContentDisposition = ContentDisposition::from_raw(&a).unwrap();
let b = ContentDisposition {
    disposition: DispositionType::FormData,
    parameters: vec![
        DispositionParam::Name("file".to_owned()),
        DispositionParam::Filename(String::from("\\\"¢")),
    ],
};
assert_eq!(a, b);

Context

I am building a temporary file-uploading service, and tried uploading a file which has a " in its name, which works but writes a wrong name to the database.

Your Environment

  • Rust Version: rustc 1.59.0 (9d1b2106e 2022-02-23)
  • Actix Web Version: 4.0.1
  • curl Version: 7.82.0
  • Firefox Version: Mozilla Firefox 99.0.1

iyzana avatar Apr 23 '22 10:04 iyzana

I am trying to understand the issue.

Do you mean: a file named foo\"bar is received by actix-web as foo%22bar?

Gowee avatar May 16 '23 07:05 Gowee

Yes, posting multipart/form-data with a file named foo\"bar is parsed by actix-web into a ContentDisposition that has foo%22bar as its filename param.

iyzana avatar May 16 '23 08:05 iyzana

It was an implementation decision at that time as documented at https://github.com/actix/actix-web/blob/909461087c608855aa252bf2b3343aac53210c75/actix-web/src/http/header/content_disposition.rs#L849

In fact, RFC 7578 (multipart/form-data) Section 2 and 4.2 suggests that filename with non-ASCII characters MAY be percent-encoded. On the contrary, RFC 6266 or other RFCs related to Content-Disposition response header do not mention such percent-encoding. So, it appears to be undecidable whether to percent-decode or not without knowing the usage scenario (multipart/form-data v.s. HTTP response header) and inevitable to unnecessarily percent-decode filename with %XX in the former scenario. Fortunately, it seems that almost all mainstream browsers just send UTF-8 encoded file names in quoted-string format (tested on Edge, IE11, Chrome and Firefox) without percent-encoding. So we do not bother to attempt to percent-decode.

We need to investigate whether this is still relevant nowadays. If percent-decoding is really necessary here, the Content-Disposition header for HTTP response and multipart/form-data need to be separated out.

Gowee avatar May 16 '23 09:05 Gowee

Some test results:

filename / user-agent 你好 (CJKV unicode) foo"bar foo\"bar
Firefox 112.0.1 filename="你好" filename="foo%22bar" filename=""
Chrome 104.0.5112.101 filename="你好" filename="foo%22bar" filename="foo\%22bar"
cURL 8.0.1 filename="你好" filename="foo%22bar" filename="foo\%22bar"

Gowee avatar May 16 '23 09:05 Gowee

If percent-decoding is really necessary here, the Content-Disposition header for HTTP response and multipart/form-data need to be separated out.

Choosing a proper encoding scheme per disposition type (inline/form-data/attachment) might be feasible. It would induce less maintenance burden even though being a little "dirty".

The problem is that we need to investigate how mainstream browsers behave now and in the future since these implementations seem unstable. I did try to search the issue trackers of Firefox and Chrome, to find nothing, though.

Gowee avatar Jun 08 '23 14:06 Gowee