multipart/form-data parsing appears to truncate a file part by 1 byte
Guys, sorry, I'm emotional because I spent 7 hours fixing a bug, and it turned out that the latest werkzeug 3.1.4 update was to blame. Below is text from chatgpt explaining the problem:
In Werkzeug 3.1.4, multipart/form-data parsing appears to truncate a file part by 1 byte in a reproducible edge case. The same code and client request work correctly with Werkzeug 3.1.3.
This manifests during chunked uploads from the browser: a specific chunk at a specific byte offset is parsed as length (expected - 1), while the raw (non-multipart) upload of the exact same Blob is parsed correctly. This suggests a regression in multipart parsing in 3.1.4.
Steps to reproduce:
- Run the minimal Flask app below.
- Use the HTML test page below to select a large file and run the probe.
- Observe that with Werkzeug 3.1.4 the multipart endpoint reports one chunk as 1 byte shorter, while the raw endpoint reports the correct size.
- Downgrade to Werkzeug 3.1.3 and repeat: the multipart endpoint reports correct sizes for all tested chunks.
Minimal reproducible example (server):
from flask import Flask, request, jsonify
from flask_login import LoginManager, login_required
app = Flask(__name__)
app.secret_key = "test"
login_manager = LoginManager(app)
@app.route("/upload_probe_raw", methods=["POST"])
def upload_probe_raw():
offset = request.headers.get("X-Offset", type=int)
expected = request.headers.get("X-Expected", type=int)
data = request.get_data(cache=False) or b""
return jsonify({
"offset": offset,
"expected": expected,
"content_length": request.content_length,
"data_len": len(data),
})
@app.route("/upload_probe_form", methods=["POST"])
def upload_probe_form():
offset = request.form.get("offset", type=int)
expected = request.form.get("expected", type=int)
f = request.files.get("file")
if not f:
return jsonify({"error": "no file"}), 400
data = f.read() or b""
return jsonify({
"offset": offset,
"expected": expected,
"content_length": request.content_length,
"file_len": len(data),
})
if __name__ == "__main__":
app.run(port=5000, debug=True)
Minimal reproducible example (client):
<input type="file" id="fileInput">
<button id="probeBtn">Probe upload</button>
<pre id="out"></pre>
<script>
const chunkSize = 10 * 1024 * 1024;
const badIndex = 25;
function log(s) {
document.getElementById("out").textContent += s + "\n";
}
async function sendRaw(blob, offset, expected) {
const r = await fetch("/upload_probe_raw", {
method: "POST",
headers: {
"Content-Type": "application/octet-stream",
"X-Offset": String(offset),
"X-Expected": String(expected)
},
body: blob
});
return await r.json();
}
async function sendForm(blob, offset, expected) {
const fd = new FormData();
fd.append("offset", String(offset));
fd.append("expected", String(expected));
fd.append("file", blob, "chunk.bin");
const r = await fetch("/upload_probe_form", {
method: "POST",
body: fd
});
return await r.json();
}
document.getElementById("probeBtn").onclick = async () => {
const out = document.getElementById("out");
out.textContent = "";
const f = document.getElementById("fileInput").files[0];
if (!f) {
log("No file selected");
return;
}
const indices = [badIndex - 1, badIndex, badIndex + 1];
log(`File: ${f.name}`);
log(`File size: ${f.size}`);
log(`Chunk size: ${chunkSize}`);
log("");
for (const idx of indices) {
const offset = idx * chunkSize;
const end = Math.min(offset + chunkSize, f.size);
const expected = end - offset;
const blob = f.slice(offset, end);
log(`Index ${idx} offset ${offset}`);
log(`slice.size ${blob.size} expected ${expected}`);
const rawRes = await sendRaw(blob, offset, expected);
log(`RAW -> data_len ${rawRes.data_len} content_length ${rawRes.content_length}`);
const formRes = await sendForm(blob, offset, expected);
log(`FORM -> file_len ${formRes.file_len} content_length ${formRes.content_length}`);
log("");
}
};
</script>
Observed output with Werkzeug 3.1.4:
Index 24 offset 251658240
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485760 content_length 10486162
Index 25 offset 262144000
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485759 content_length 10486162
Index 26 offset 272629760
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485760 content_length 10486162
With Werkzeug 3.1.3, FORM -> file_len matches the expected size for all tested chunks, including index 25.
There is no exception/traceback; this is a silent data truncation of 1 byte.
Expected behavior:
Multipart/form-data parsing should yield the exact byte sequence sent by the client. The uploaded file part length should match the expected chunk length. Specifically, for offset 262144000 with a 10 MiB chunk size, the parsed file length should be 10485760, not 10485759.
Environment:
- Python version: 3.13.10
- Werkzeug version: 3.1.4 (regression), 3.1.3 (works)
I think this PR is stealing my byte: https://github.com/pallets/werkzeug/pull/3066/commits/77bde3337286c405280bef1870442be7f235393c
See #3066, we need to find some way to reconcile both these cases. CC @emmanuelthome
It's annoying indeed.
I can't reproduce. Do you get that on every file?
Here's what I get on a 1GB upload. It looks legit.
Index 24 offset 251658240
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485760 content_length 10486162
Index 25 offset 262144000
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485760 content_length 10486162
Index 26 offset 272629760
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485760 content_length 10486162
No, just some files. I'll send you a link to the file in a private message.
I think having a file to reproduce this shared publicly would be great, because otherwise maintainers cannot easily reproduce it / test fixes.
Given https://github.com/pallets/werkzeug/commit/77bde3337286c405280bef1870442be7f235393c it's probably a corner case that was missed and will be added to the test suite. Once we nail it down, creating on the fly a test file that triggers the issue should not be very hard (and less resource-expensive than a 300MB file posted on a permanent link).
For a testcase I fully agree, and if someone can come up with a small reproducer here that's of course even better
The file has been sent. I think that when processing “\r\n”, you should simply ignore the data in “request.files” if possible
Can you share it with me as well?
I'll have a much smaller one shortly.
A 64kb file created as:
(dd if=/dev/zero bs=65535 count=1 ; echo -ne '\015' ) > /tmp/z
eats the terminating 0x0d byte of the first upload (i.e., change chunkSize to 65536 and badIndex to 1 in client.html).
I see where to go from there. It's directly related to my comment there. https://github.com/pallets/werkzeug/pull/3066#issue-3623040169 ; let's see if/how it's possible to recover the event in the event stream that only gets lost in the form case.
It should be OK. @davidism does that look fine to you? Are there other corner cases of this kind you can think of?
I've been bitten by this too. Our tests turned red when the CI picked the latest release.
import io
from flask import Flask, request
import requests
app = Flask(__name__)
@app.route('/upload', methods=['POST'])
def upload():
return {'len': len(request.files['file'].read())}
# Data ending with 0x0D (CR)
data = b'hello\x0d'
# Start server
from werkzeug.serving import make_server
server = make_server('127.0.0.1', 5555, app, threaded=True)
import threading
threading.Thread(target=server.serve_forever, daemon=True).start()
import time
time.sleep(0.1)
# Upload via multipart
resp = requests.post('http://127.0.0.1:5555/upload', files={'file': io.BytesIO(data)})
assert resp.json()['len'] == len(data), f"Expected {len(data)}, got {resp.json()['len']}"