deno icon indicating copy to clipboard operation
deno copied to clipboard

`fetch` corrupts files when sending them via TLS

Open KnorpelSenf opened this issue 3 years ago • 4 comments

Bug Report

When streaming a file to an HTTPS URL, fetch corrupts it.

Reproduction

First, lets set up a server which accepts files via multipart/form-data, and responds with a hash of the first file.

// server.ts
import { Application } from "https://deno.land/x/[email protected]/mod.ts";
import { Hash } from "https://deno.land/x/[email protected]/mod.ts";

const app = new Application();

app.use(async (ctx) => {
  const body = ctx.request.body();
  if (body.type === "form-data") {
    const r = await body.value.read();
    try {
      const file = r.files?.[0].filename;
      if (file !== undefined) {
        const contents = await Deno.readFile(file);
        const hash = new Hash("md5").digest(contents).hex();
        ctx.response.body = JSON.stringify({ hash });
      } else {
        ctx.response.body = JSON.stringify({ error: "no file in form data" });
      }
    } finally {
      await Promise.all(
        r.files?.map((f) => f.filename && Deno.remove(f.filename)) ?? [],
      );
    }
  } else {
    ctx.response.body = JSON.stringify({ error: "no form data in request" });
  }
});

console.log("listening on port 8000");
await app.listen({ port: 8000 });

For convenience, I am hosting this exact script here: https://deno-1-21-0-repro.knorpelsenf.me

Next, take this <400 KB PDF file which is known to be affected by the issue: file.pdf

Third, we can check which hash is produced by posting the downloaded file using curl.

$ curl -s https://deno-1-21-0-repro.knorpelsenf.me -F "[email protected]"
{"hash":"cd2dd0ead15d0376f3966fd8a77b9a3e"}

Now we can reproduce the issue using Deno. We can use this script which constructs a multipart/form-data stream and sends it to the above server. Feel free to change the endpoint in the code in order to reproduce this independently.

import {
  iterateReader,
  readableStreamFromIterable,
} from "https://deno.land/[email protected]/streams/mod.ts";

const enc = new TextEncoder();

const path = Deno.args[0]!;

const boundary = "----------" + Array.from(Array(32))
  .map(() => Math.random().toString(36)[2] || 0)
  .join("");

async function* fileToMultiPart(): AsyncIterableIterator<Uint8Array> {
  yield enc.encode(
    `--${boundary}\r\ncontent-disposition:form-data;name="file"\r\ncontent-type:application/octet-stream\r\n\r\n`,
  );
  const file = await Deno.open(path);
  yield* iterateReader(file);
  yield enc.encode(`\r\n--${boundary}--\r\n`);
}

const response = await fetch("https://deno-1-21-0-repro.knorpelsenf.me", {
  method: "POST",
  headers: { "content-type": `multipart/form-data; boundary=${boundary}` },
  body: readableStreamFromIterable(fileToMultiPart()),
});
console.log(await response.text());

We can check that Deno 1.20.6 is still working correctly.

$ deno upgrade --version 1.20.6 
Checking https://github.com/denoland/deno/releases/download/v1.20.6/deno-x86_64-unknown-linux-gnu.zip
32.1 MiB / 32.1 MiB (100.0%)
Deno is upgrading to version 1.20.6
Archive:  /tmp/.tmpJo4csn/deno.zip
  inflating: deno                    
Upgraded successfully
$ deno run --allow-net --allow-read=file.pdf script.ts file.pdf 
{"hash":"cd2dd0ead15d0376f3966fd8a77b9a3e"}

Note how the received hash is identical to the one received using curl.

Upgrading to Deno 1.21.0, the hash changes:

$ deno upgrade --version 1.21.0
Checking https://github.com/denoland/deno/releases/download/v1.21.0/deno-x86_64-unknown-linux-gnu.zip
32.2 MiB / 32.2 MiB (100.0%)
Deno is upgrading to version 1.21.0
Archive:  /tmp/.tmp6PQXsC/deno.zip
  inflating: deno                    
Upgraded successfully
$ deno run --allow-net --allow-read=file.pdf script.ts file.pdf 
{"hash":"2601fcff60252e9b2b491ae5126513d9"}

This PDF file was corrupted when sending it. It now looks like this: file-corrupted.pdf

Version Info

The issue occurs first on Deno 1.21.0. It does not occur on 1.20.6. It was not fixed in 1.21.1.

$ uname -sv
Linux #1 SMP Debian 5.10.106-1 (2022-03-17)
$ deno --version # running the server script
deno 1.21.1 (release, x86_64-unknown-linux-gnu)
v8 10.0.139.17
typescript 4.6.2

KnorpelSenf avatar Apr 29 '22 19:04 KnorpelSenf

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 02 '22 03:07 stale[bot]

Deno 1.23.2 still is not able to send files reliably using fetch. This issue remains unfixed. It still reproduces exactly the same way.

KnorpelSenf avatar Jul 02 '22 08:07 KnorpelSenf

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 08 '22 21:09 stale[bot]

This is still happening. fetch is still not able to reliably send files.

KnorpelSenf avatar Sep 08 '22 22:09 KnorpelSenf

This seems to have started from https://github.com/denoland/deno/pull/14319 (compared 3d1123f8b09cecfa57a93d8b8b7d19af2b45f070 and 1c05e41f37da022971f0090b2a92e6340d230055 with deno upgrade --canary --version <hash>)

seems like a problem of reading file

kt3k avatar Sep 14 '22 08:09 kt3k

seems like a problem of reading file

The issue does not appear when using HTTP. Can it still be related to reading a file?

KnorpelSenf avatar Sep 14 '22 09:09 KnorpelSenf

It is possible that this corruption is happening in the server, not in the client.

lucacasonato avatar Sep 30 '22 13:09 lucacasonato

I am always using the exact same Deno version to run the exact same script on the server. How is it possible for the hash to change if I update the client?

KnorpelSenf avatar Oct 01 '22 07:10 KnorpelSenf

Reduced the reproducing example to the single file. The following script creates a example txt file (data.txt), start the server, makes a request, compares the file digest on server and client, and saves the corrupted file if digests don't match (no dependency to the external server or 3rd party modules).

import { serve } from "https://deno.land/[email protected]/http/server.ts";
import {
  iterateReader,
  readableStreamFromIterable,
} from "https://deno.land/[email protected]/streams/mod.ts";
import { createHash } from "https://deno.land/[email protected]/hash/mod.ts";

const path = "data.txt";

Deno.writeTextFileSync(path, "abcdefghi\n".repeat(100000));

const enc = new TextEncoder();
const boundary = "----------" + "a".repeat(32);
const controller = new AbortController();
let clientHash: string;

serve(async (req) => {
  const hash = createHash("md5");
  const data = await req.formData();
  const file = data.get("file");
  const bytes = await new Response(file).arrayBuffer();
  const d = hash.update(bytes).digest();
  const hex = [...new Uint8Array(d)].map(x => x.toString(16).padStart(2, '0')).join('');
  if (hex !== clientHash) {
    console.log("file corrupt");
    await Deno.writeFile("corrupt.txt", new Uint8Array(bytes));
  }
  return new Response(`hash on server ${hex}`);
}, { signal: controller.signal });

async function* fileToMultiPart(): AsyncIterableIterator<Uint8Array> {
  const hash = createHash("md5");
  yield enc.encode(
    `--${boundary}\r\nContent-Disposition: form-data; name="file"; filename="file.pdf"\r\nContent-Type: application/pdf\r\n\r\n`,
  );
  const file = await Deno.open(path);
  const iter = iterateReader(file);
  // const iter = file.readable;
  for await (const i of iter) {
    hash.update(i);
    yield i;
  }
  const digest = [...new Uint8Array(hash.digest())].map(x => x.toString(16).padStart(2, '0')).join('')
  clientHash = digest;
  console.log("hash on client", digest);
  yield enc.encode(`\r\n--${boundary}--\r\n`);
}

async function makeRequest() {
  const response = await fetch("http://localhost:8000", {
    method: "POST",
    headers: {
      "content-type": `multipart/form-data; boundary=${boundary}`,
    },
    body: readableStreamFromIterable(fileToMultiPart()),
  });
  console.log(await response.text());
}

await makeRequest();
controller.abort();

This script executes like the below (the corruption happens in the 3rd example):

$ deno run -A x.ts         
Listening on http://localhost:8000/
hash on client 3b97e79020c8bbdebe3e29f196859cfe
hash on server 3b97e79020c8bbdebe3e29f196859cfe
$ deno run -A x.ts
Listening on http://localhost:8000/
hash on client 3b97e79020c8bbdebe3e29f196859cfe
hash on server 3b97e79020c8bbdebe3e29f196859cfe
$ deno run -A x.ts
Listening on http://localhost:8000/
hash on client 3b97e79020c8bbdebe3e29f196859cfe
file corrupt
hash on server e9e7bbcd78c3009f3b9fe4716d425c55
$ diff data.txt corrupt.txt
5836a5837
> abcdefghghi
6553,6554c6554
< abcdefghi
< abcdefghi
---
> abcdghi

From my observation, when the uploaded file is broken, the file length is never changed, and also there's no bytes in the broken file which didn't exist in the original file. So my guess is that the order of chunks can be swapped somewhere.

kt3k avatar Oct 01 '22 16:10 kt3k

Fascinating, this makes sense. So perhaps it is a race condition and using TLS only makes it much more likely to appear (due to the added overhead?) but it's not actually related to TLS? If that's true, we should be able to strip out the HTTP server entirely.

Look at this:

import { createHash } from "https://deno.land/[email protected]/hash/mod.ts";
import { iterateReader } from "https://deno.land/[email protected]/streams/mod.ts";

const path = "data.txt";

Deno.writeTextFileSync(path, "abcdefghi\n".repeat(100000));

let clientHash: string;

async function* fileWithHash(
  repro: boolean
): AsyncIterableIterator<Uint8Array> {
  const hash = createHash("md5");
  const file = await Deno.open(path);
  const iter = iterateReader(file); // EVIL
  for await (const i of repro ? iter : file.readable) {
    hash.update(i);
    yield i;
  }
  const hex = [...new Uint8Array(hash.digest())]
    .map((x) => x.toString(16).padStart(2, "0"))
    .join("");
  clientHash = hex;
  console.log("hash while reading", hex);
}

async function makeRequest(repro = false) {
  const hash = createHash("md5");
  const bytes = await collect(fileWithHash(repro));
  const d = hash.update(bytes).digest();
  const hex = [...new Uint8Array(d)]
    .map((x) => x.toString(16).padStart(2, "0"))
    .join("");
  console.log("hash after collecting", hex);
  if (hex !== clientHash) {
    console.log("file corrupt");
    await Deno.writeFile("corrupt.txt", new Uint8Array(bytes));
  }
}

async function collect(stream: AsyncIterable<Uint8Array>): Promise<Uint8Array> {
  let len = 0;
  const chunks: Uint8Array[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
    len += chunk.byteLength;
  }
  let off = 0;
  const buf = new Uint8Array(len);
  for (const chunk of chunks) {
    buf.set(chunk, off);
    off += chunk.byteLength;
  }
  return buf;
}

await makeRequest(Deno.args[0] !== "-v");

I modified the script to only stream a file into an array of chunks, and then collect this array into a large buffer.

You can pass -v to avoid the reproduction.

$ deno run -A repro.ts 
hash while reading 3b97e79020c8bbdebe3e29f196859cfe
hash after collecting fb0ff074f33f428e6c4448768a8b87b3
file corrupt
$ deno run -A repro.ts -v
hash while reading 3b97e79020c8bbdebe3e29f196859cfe
hash after collecting 3b97e79020c8bbdebe3e29f196859cfe
$ deno run -A repro.ts 
hash while reading 3b97e79020c8bbdebe3e29f196859cfe
hash after collecting fb0ff074f33f428e6c4448768a8b87b3
file corrupt
$ diff data.txt corrupt.txt | wc -l
85

This seems to work every time.

This tracks it down to iterateReader from /std.

I have no idea what Deno v1.20.6 did differently from 1.21.0 that it did not suffer from this issue, but it seems like the idea to modify a shared buffer in-place is somehow breaking things?

KnorpelSenf avatar Oct 02 '22 10:10 KnorpelSenf

but it seems like the idea to modify a shared buffer in-place is somehow breaking things?

Looks like iterateReader reuses the shared underlying buffer object for all outputs and that causes the issue. (If the consumer of iterable is slow at handling bytes, then the next call of reader.read() (in iterateReader) overwrites the previous output)

In the below example, iterable yields [1], [2] but it prints [2], [2] (because of 10ms delay before printing it).

import { iterateReader } from "https://deno.land/[email protected]/streams/conversion.ts";

const data = [new Uint8Array([1]), new Uint8Array([2])];
const reader = {
  async read(u8: Uint8Array) {
    const d = data.shift();
    if (d) {
      u8.set(d);
      return 1;
    }
    return null;
  }
}

const iter = iterateReader(reader);

for await (const d of iter) {
  setTimeout(() => {
    console.log(d);
  }, 10);
}

I have no idea what Deno v1.20.6 did differently from 1.21.0 that it did not suffer from this issue,

My guess for this is that the bug always existed but #14319 made the file reading faster and revealed the bug as a result.

kt3k avatar Oct 03 '22 05:10 kt3k

The bug itself seems having existed from the beginning https://github.com/denoland/deno/pull/1130/files#diff-1f62e23971d4281e275c4e80b39ed60ac6f0b71b14bba6d92e4decf790ad53e1R129

(Note:

  • iterateReader was renamed from iter in https://github.com/denoland/deno_std/pull/813
  • iter in std was migrated from Deno.iter in https://github.com/denoland/deno_std/pull/843
  • Deno.iter was renamed from Deno.toAsyncIterator in https://github.com/denoland/deno/pull/4848

kt3k avatar Oct 03 '22 06:10 kt3k