werkzeug icon indicating copy to clipboard operation
werkzeug copied to clipboard

multipart/form-data parsing appears to truncate a file part by 1 byte

Open swell-d opened this issue 3 weeks ago • 13 comments

Guys, sorry, I'm emotional because I spent 7 hours fixing a bug, and it turned out that the latest werkzeug 3.1.4 update was to blame. Below is text from chatgpt explaining the problem:

In Werkzeug 3.1.4, multipart/form-data parsing appears to truncate a file part by 1 byte in a reproducible edge case. The same code and client request work correctly with Werkzeug 3.1.3.

This manifests during chunked uploads from the browser: a specific chunk at a specific byte offset is parsed as length (expected - 1), while the raw (non-multipart) upload of the exact same Blob is parsed correctly. This suggests a regression in multipart parsing in 3.1.4.

Steps to reproduce:

  1. Run the minimal Flask app below.
  2. Use the HTML test page below to select a large file and run the probe.
  3. Observe that with Werkzeug 3.1.4 the multipart endpoint reports one chunk as 1 byte shorter, while the raw endpoint reports the correct size.
  4. Downgrade to Werkzeug 3.1.3 and repeat: the multipart endpoint reports correct sizes for all tested chunks.

Minimal reproducible example (server):

from flask import Flask, request, jsonify
from flask_login import LoginManager, login_required

app = Flask(__name__)
app.secret_key = "test"

login_manager = LoginManager(app)

@app.route("/upload_probe_raw", methods=["POST"])
def upload_probe_raw():
    offset = request.headers.get("X-Offset", type=int)
    expected = request.headers.get("X-Expected", type=int)
    data = request.get_data(cache=False) or b""
    return jsonify({
        "offset": offset,
        "expected": expected,
        "content_length": request.content_length,
        "data_len": len(data),
    })

@app.route("/upload_probe_form", methods=["POST"])
def upload_probe_form():
    offset = request.form.get("offset", type=int)
    expected = request.form.get("expected", type=int)

    f = request.files.get("file")
    if not f:
        return jsonify({"error": "no file"}), 400

    data = f.read() or b""

    return jsonify({
        "offset": offset,
        "expected": expected,
        "content_length": request.content_length,
        "file_len": len(data),
    })

if __name__ == "__main__":
    app.run(port=5000, debug=True)

Minimal reproducible example (client):

<input type="file" id="fileInput">
<button id="probeBtn">Probe upload</button>
<pre id="out"></pre>

<script>
const chunkSize = 10 * 1024 * 1024;
const badIndex = 25;

function log(s) {
  document.getElementById("out").textContent += s + "\n";
}

async function sendRaw(blob, offset, expected) {
  const r = await fetch("/upload_probe_raw", {
    method: "POST",
    headers: {
      "Content-Type": "application/octet-stream",
      "X-Offset": String(offset),
      "X-Expected": String(expected)
    },
    body: blob
  });
  return await r.json();
}

async function sendForm(blob, offset, expected) {
  const fd = new FormData();
  fd.append("offset", String(offset));
  fd.append("expected", String(expected));
  fd.append("file", blob, "chunk.bin");

  const r = await fetch("/upload_probe_form", {
    method: "POST",
    body: fd
  });
  return await r.json();
}

document.getElementById("probeBtn").onclick = async () => {
  const out = document.getElementById("out");
  out.textContent = "";

  const f = document.getElementById("fileInput").files[0];
  if (!f) {
    log("No file selected");
    return;
  }

  const indices = [badIndex - 1, badIndex, badIndex + 1];

  log(`File: ${f.name}`);
  log(`File size: ${f.size}`);
  log(`Chunk size: ${chunkSize}`);
  log("");

  for (const idx of indices) {
    const offset = idx * chunkSize;
    const end = Math.min(offset + chunkSize, f.size);
    const expected = end - offset;
    const blob = f.slice(offset, end);

    log(`Index ${idx} offset ${offset}`);
    log(`slice.size ${blob.size} expected ${expected}`);

    const rawRes = await sendRaw(blob, offset, expected);
    log(`RAW -> data_len ${rawRes.data_len} content_length ${rawRes.content_length}`);

    const formRes = await sendForm(blob, offset, expected);
    log(`FORM -> file_len ${formRes.file_len} content_length ${formRes.content_length}`);

    log("");
  }
};
</script>

Observed output with Werkzeug 3.1.4:

Index 24 offset 251658240
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485760 content_length 10486162

Index 25 offset 262144000
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485759 content_length 10486162

Index 26 offset 272629760
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485760 content_length 10486162

With Werkzeug 3.1.3, FORM -> file_len matches the expected size for all tested chunks, including index 25.

There is no exception/traceback; this is a silent data truncation of 1 byte.

Expected behavior:

Multipart/form-data parsing should yield the exact byte sequence sent by the client. The uploaded file part length should match the expected chunk length. Specifically, for offset 262144000 with a 10 MiB chunk size, the parsed file length should be 10485760, not 10485759.

Environment:

  • Python version: 3.13.10
  • Werkzeug version: 3.1.4 (regression), 3.1.3 (works)

swell-d avatar Dec 05 '25 14:12 swell-d

I think this PR is stealing my byte: https://github.com/pallets/werkzeug/pull/3066/commits/77bde3337286c405280bef1870442be7f235393c

swell-d avatar Dec 05 '25 15:12 swell-d

See #3066, we need to find some way to reconcile both these cases. CC @emmanuelthome

davidism avatar Dec 08 '25 14:12 davidism

It's annoying indeed.

I can't reproduce. Do you get that on every file?

Here's what I get on a 1GB upload. It looks legit.

Index 24 offset 251658240
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485760 content_length 10486162

Index 25 offset 262144000
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485760 content_length 10486162

Index 26 offset 272629760
slice.size 10485760 expected 10485760
RAW -> data_len 10485760 content_length 10485760
FORM -> file_len 10485760 content_length 10486162

emmanuelthome avatar Dec 09 '25 17:12 emmanuelthome

No, just some files. I'll send you a link to the file in a private message.

swell-d avatar Dec 09 '25 17:12 swell-d

I think having a file to reproduce this shared publicly would be great, because otherwise maintainers cannot easily reproduce it / test fixes.

ThiefMaster avatar Dec 09 '25 18:12 ThiefMaster

Given https://github.com/pallets/werkzeug/commit/77bde3337286c405280bef1870442be7f235393c it's probably a corner case that was missed and will be added to the test suite. Once we nail it down, creating on the fly a test file that triggers the issue should not be very hard (and less resource-expensive than a 300MB file posted on a permanent link).

emmanuelthome avatar Dec 09 '25 18:12 emmanuelthome

For a testcase I fully agree, and if someone can come up with a small reproducer here that's of course even better

ThiefMaster avatar Dec 09 '25 18:12 ThiefMaster

The file has been sent. I think that when processing “\r\n”, you should simply ignore the data in “request.files” if possible

swell-d avatar Dec 09 '25 18:12 swell-d

Can you share it with me as well?

ThiefMaster avatar Dec 09 '25 18:12 ThiefMaster

I'll have a much smaller one shortly.

emmanuelthome avatar Dec 09 '25 18:12 emmanuelthome

A 64kb file created as:

(dd if=/dev/zero bs=65535 count=1 ; echo -ne '\015' ) > /tmp/z

eats the terminating 0x0d byte of the first upload (i.e., change chunkSize to 65536 and badIndex to 1 in client.html).

I see where to go from there. It's directly related to my comment there. https://github.com/pallets/werkzeug/pull/3066#issue-3623040169 ; let's see if/how it's possible to recover the event in the event stream that only gets lost in the form case.

emmanuelthome avatar Dec 09 '25 18:12 emmanuelthome

It should be OK. @davidism does that look fine to you? Are there other corner cases of this kind you can think of?

emmanuelthome avatar Dec 09 '25 23:12 emmanuelthome

I've been bitten by this too. Our tests turned red when the CI picked the latest release.

import io
from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def upload():
    return {'len': len(request.files['file'].read())}

# Data ending with 0x0D (CR)
data = b'hello\x0d'

# Start server
from werkzeug.serving import make_server
server = make_server('127.0.0.1', 5555, app, threaded=True)
import threading
threading.Thread(target=server.serve_forever, daemon=True).start()

import time
time.sleep(0.1)

# Upload via multipart
resp = requests.post('http://127.0.0.1:5555/upload', files={'file': io.BytesIO(data)})

assert resp.json()['len'] == len(data), f"Expected {len(data)}, got {resp.json()['len']}"

zogzog avatar Dec 10 '25 11:12 zogzog