
Use Uint8Array to/from base64 once it's widely available

Open timostamm opened this issue 5 months ago • 0 comments

Protobuf-ES exports the functions base64Decode and base64Encode from @bufbuild/protobuf/wire. The functions are used when JSON serializing / parsing a Protobuf bytes field. base64Decode is used at module init to hydrate descriptors in generated code.

Native support for base64 encoding with Uint8Array is underway in https://github.com/tc39/proposal-arraybuffer-base64.

A quick look at performance in Chrome 140 shows a significant improvement for payloads larger than 100 bytes:

encode.ts
import {base64Encode} from "@bufbuild/protobuf/wire";

const amount = 10_000;
const len = 1000;
const type: "base64Encode" | "Uint8Array.toBase64" = "Uint8Array.toBase64";

const datas: Uint8Array[] = [];
while (datas.length < amount) {
  const data = new Uint8Array(len);
  crypto.getRandomValues(data);
  datas.push(data);
}

let fn: (bytes: Uint8Array) => string;
switch (type) {
  case "base64Encode":
    fn = base64Encode;
    break;
  case "Uint8Array.toBase64":
    fn = (bytes: Uint8Array) => {
      // @ts-expect-error toBase64 is not declared in the lib typings used here
      return bytes.toBase64({
        alphabet: "base64",
      });
    };
    break;
}
// warm up
for (const bytes of datas) {
  if (fn(bytes).length === 0) {
    throw new Error(`Produced empty base64`);
  }
}
const start = performance.now();
for (const bytes of datas) {
  if (fn(bytes).length === 0) {
    throw new Error(`Produced empty base64`);
  }
}
const elapsed = performance.now() - start;
console.log(`${type} ${amount} chunks with ${len} bytes: ${elapsed} ms`);

Results, with amount and len varied between runs:
base64Encode 1000000 chunks with 10 bytes: 110.60000002384186 ms
Uint8Array.toBase64 1000000 chunks with 10 bytes: 130.39999997615814 ms

base64Encode 1000000 chunks with 20 bytes: 165.39999997615814 ms
Uint8Array.toBase64 1000000 chunks with 20 bytes: 138.60000002384186 ms

base64Encode 100000 chunks with 100 bytes: 61.39999997615814 ms
Uint8Array.toBase64 100000 chunks with 100 bytes: 37.200000047683716 ms

base64Encode 10000 chunks with 1000 bytes: 50.5 ms
Uint8Array.toBase64 10000 chunks with 1000 bytes: 2.100000023841858 ms

decode.ts
import {base64Decode, base64Encode} from "@bufbuild/protobuf/wire";

const amount = 1_000_000;
const len = 20;
const type: "base64Decode" | "Uint8Array.fromBase64" = "Uint8Array.fromBase64";

const datas: string[] = [];
while (datas.length < amount) {
  const data = new Uint8Array(len);
  crypto.getRandomValues(data);
  datas.push(base64Encode(data));
}

let fn: (str: string) => Uint8Array;
switch (type) {
  case "base64Decode":
    fn = base64Decode;
    break;
  case "Uint8Array.fromBase64":
    fn = (str: string) => {
      // @ts-expect-error
      return Uint8Array.fromBase64(str, {
        alphabet: "base64",
      });
    };
    break;
}
// warm up
for (const b64 of datas) {
  if (fn(b64).length === 0) {
    throw new Error(`Produced empty bytes`);
  }
}
const start = performance.now();
for (const b64 of datas) {
  if (fn(b64).length === 0) {
    throw new Error(`Produced empty bytes`);
  }
}
const elapsed = performance.now() - start;
console.log(`${type} ${amount} chunks with ${len} bytes: ${elapsed} ms`);

Results, with amount and len varied between runs:
base64Decode 1000000 chunks with 10 bytes: 412.2999999523163 ms
Uint8Array.fromBase64 1000000 chunks with 10 bytes: 447.2999999523163 ms

base64Decode 1000000 chunks with 20 bytes: 442 ms
Uint8Array.fromBase64 1000000 chunks with 20 bytes: 423.2999999523163 ms

base64Decode 100000 chunks with 100 bytes: 84.70000004768372 ms
Uint8Array.fromBase64 100000 chunks with 100 bytes: 46 ms

base64Decode 10000 chunks with 1000 bytes: 48.799999952316284 ms
Uint8Array.fromBase64 10000 chunks with 1000 bytes: 6 ms

Uint8Array.prototype.toBase64() supports the standard and the URL-safe base64 alphabets, and has an option to omit padding. These options cover the same behaviors as our base64Encode, so it will be trivial to adopt.
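As a rough illustration of what adoption on the encoding side could look like, here is a minimal sketch that uses the native method when it exists and a table-based encoder otherwise. The option names `alphabet` and `omitPadding` come from the TC39 proposal; `encodeBase64` and `fallbackEncode` are hypothetical names made up for this example and are not part of Protobuf-ES.

```typescript
// Option names follow the TC39 proposal. encodeBase64 and fallbackEncode are
// hypothetical helpers for illustration, not Protobuf-ES exports.
type ToBase64Options = {
  alphabet?: "base64" | "base64url";
  omitPadding?: boolean;
};
type NativeToBase64 = (this: Uint8Array, options?: ToBase64Options) => string;

// Look the method up dynamically so this compiles whether or not the
// TypeScript lib typings already declare it.
const nativeToBase64 = (
  Uint8Array.prototype as unknown as { toBase64?: NativeToBase64 }
).toBase64;

const STD_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Plain table-based encoder, used only when the native method is missing.
function fallbackEncode(bytes: Uint8Array, options: ToBase64Options): string {
  const table = options.alphabet === "base64url" ? URL_ALPHABET : STD_ALPHABET;
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = has1 ? bytes[i + 1] : 0;
    const b2 = has2 ? bytes[i + 2] : 0;
    out += table.charAt(b0 >> 2);
    out += table.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += has1 ? table.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += has2 ? table.charAt(b2 & 0x3f) : "=";
  }
  return options.omitPadding ? out.replace(/=+$/, "") : out;
}

// Prefer the native implementation, fall back to the table-based encoder.
function encodeBase64(
  bytes: Uint8Array,
  options: ToBase64Options = {},
): string {
  return typeof nativeToBase64 === "function"
    ? nativeToBase64.call(bytes, options)
    : fallbackEncode(bytes, options);
}
```

Because the check runs once at module load, runtimes without the native method keep their current behavior, while runtimes that ship it would pick up the faster path.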

While the static method Uint8Array.fromBase64() supports the standard and the URL-safe alphabet, it's less lenient than our base64Decode. It does not allow inner padding, and it requires the alphabet to be specified. Adopting this API for base64Decode will be a breaking change.
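To make the strictness difference concrete, the sketch below tries the native decoder with a fixed alphabet first and hands anything it rejects to a lenient decoder. It is only an illustration of where the two behaviors diverge, not a proposed implementation: `decodeBase64` and the `lenientDecode` parameter are hypothetical names, and in Protobuf-ES the lenient decoder would be the existing base64Decode.

```typescript
// The alphabet option follows the TC39 proposal. decodeBase64 and its
// lenientDecode parameter are hypothetical, made up for this illustration.
type FromBase64Options = { alphabet?: "base64" | "base64url" };
type NativeFromBase64 = (
  str: string,
  options?: FromBase64Options,
) => Uint8Array;

// Look the static method up dynamically so this compiles whether or not the
// TypeScript lib typings already declare it.
const nativeFromBase64 = (
  Uint8Array as unknown as { fromBase64?: NativeFromBase64 }
).fromBase64;

function decodeBase64(
  str: string,
  lenientDecode: (s: string) => Uint8Array,
): Uint8Array {
  if (typeof nativeFromBase64 === "function") {
    try {
      // Strict path: exactly one alphabet, no padding inside the string.
      return nativeFromBase64.call(Uint8Array, str, { alphabet: "base64" });
    } catch {
      // The native decoder rejected the input, for example because it
      // contains URL-safe characters or inner padding. Fall through.
    }
  }
  // Lenient path: also taken when the native method is not available.
  return lenientDecode(str);
}
```

With the standard alphabet selected, a URL-safe input such as "-_8=" or an input with padding in the middle such as "aGk=aGk=" is rejected by the strict path and reaches the lenient decoder. The cost of the try/catch on rejected inputs is not measured here.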

The new API is not widely available yet. At this point, it isn't implemented in Node.js or Deno.

Related to: https://github.com/bufbuild/protobuf-es/issues/333

timostamm, Sep 30 '25 13:09