actix-web
actix-web copied to clipboard
Filename escaping in multipart/form-data Content-Disposition
(Un)escaping of the filename in the Content-Disposition "header" of multipart/form-data streams does not match the other implementations I tested.
Expected Behavior
In particular, the filename \" is escaped as \%22 in curl, firefox and chrome.
There is an existing discussion of this at https://github.com/curl/curl/issues/7789.
actix-web should decode this as \".
Current Behavior
actix-web currently implements RFC 6266 for filename encoding and decodes this as %22, dropping the \ and not unescaping the ".
Possible Solution
Use RFC 7578 for quoted strings in the DispositionParam (un)escaping code.
I am not sure if this conflicts with Content-Disposition escaping for normal HTTP-Response headers.
Also, as the comment in test_from_raw_unnecessary_percent_decode states, other implementations do not percent-escape higher UTF-8 codepoints, but they do percent-escape " (and as far as i can see, only " and no other bytes).
Steps to Reproduce (for bugs)
ncat -l 8000 &
touch "\\\"¢"
curl -v -F "file=@\\\"¢" localhost:8000
^C
which produces a body similar to this (not escaping the \ or ¢ and percent escaping the "):
--------------------------a80b394c70a9c219
Content-Disposition: form-data; name="file"; filename="\%22¢"
Content-Type: application/octet-stream
--------------------------a80b394c70a9c219--
Adapted from http::header::content_disposition::test_from_raw_unnecessary_percent_decode
let a = HeaderValue::from_static(
"form-data; name=\"file\"; filename=\"\\%22\"¢",
);
let a: ContentDisposition = ContentDisposition::from_raw(&a).unwrap();
let b = ContentDisposition {
disposition: DispositionType::FormData,
parameters: vec![
DispositionParam::Name("file".to_owned()),
DispositionParam::Filename(String::from("\\\"¢")),
],
};
assert_eq!(a, b);
Context
I am building a temporary file-uploading service, and tried uploading a file which has a " in its name, which works but writes a wrong name to the database.
Your Environment
- Rust Version: rustc 1.59.0 (9d1b2106e 2022-02-23)
- Actix Web Version: 4.0.1
- curl Version: 7.82.0
- Firefox Version: Mozilla Firefox 99.0.1
I am trying to understand the issue.
Do you mean: a file named foo\"bar is received by actix-web as foo%22bar?
Yes, posting multipart/form-data with a file named foo\"bar is parsed by actix-web into a ContentDisposition that has foo%22bar as its filename param.
It was an implementation decision at that time as documented at https://github.com/actix/actix-web/blob/909461087c608855aa252bf2b3343aac53210c75/actix-web/src/http/header/content_disposition.rs#L849
In fact, RFC 7578 (multipart/form-data) Section 2 and 4.2 suggests that filename with non-ASCII characters MAY be percent-encoded. On the contrary, RFC 6266 or other RFCs related to Content-Disposition response header do not mention such percent-encoding. So, it appears to be undecidable whether to percent-decode or not without knowing the usage scenario (multipart/form-data v.s. HTTP response header) and inevitable to unnecessarily percent-decode filename with %XX in the former scenario. Fortunately, it seems that almost all mainstream browsers just send UTF-8 encoded file names in quoted-string format (tested on Edge, IE11, Chrome and Firefox) without percent-encoding. So we do not bother to attempt to percent-decode.
We need to investigate whether this is still relevant nowadays. If percent-decoding is really necessary here, the Content-Disposition header for HTTP response and multipart/form-data need to be separated out.
Some test results:
| filename / user-agent | 你好 (CJKV unicode) |
foo"bar | foo\"bar |
|---|---|---|---|
| Firefox 112.0.1 | filename="你好" |
filename="foo%22bar" |
filename="" |
| Chrome 104.0.5112.101 | filename="你好" |
filename="foo%22bar" |
filename="foo\%22bar" |
| cURL 8.0.1 | filename="你好" |
filename="foo%22bar" |
filename="foo\%22bar" |
If percent-decoding is really necessary here, the Content-Disposition header for HTTP response and multipart/form-data need to be separated out.
Choosing a proper encoding scheme per disposition type (inline/form-data/attachment) might be feasible. It would induce less maintenance burden even though being a little "dirty".
The problem is that we need to investigate how mainstream browsers behave now and in the future since these implementations seem unstable. I did try to search the issue trackers of Firefox and Chrome, to find nothing, though.