# Build crashing when metafile size > 512MB
👋 Kudos for this awesome build tool!
### In a nutshell
When bundling very large projects with the `metafile: true` flag, the build crashes with the following error:
```
Error: Cannot create a string longer than 0x1fffffe8 characters
    at TextDecoder.decode (node:internal/encoding:447:16)
    at decodeUTF8 (/Users/xxx/dev/web-ui/.yarn/cache/esbuild-npm-0.25.10-f26f7be387-a8e4d33d7e.zip/node_modules/esbuild/lib/main.js:188:35)
    at visit (/Users/xxx/dev/web-ui/.yarn/cache/esbuild-npm-0.25.10-f26f7be387-a8e4d33d7e.zip/node_modules/esbuild/lib/main.js:99:16)
    at visit (/Users/xxx/dev/web-ui/.yarn/cache/esbuild-npm-0.25.10-f26f7be387-a8e4d33d7e.zip/node_modules/esbuild/lib/main.js:114:43)
    at decodePacket (/Users/xxx/dev/web-ui/.yarn/cache/esbuild-npm-0.25.10-f26f7be387-a8e4d33d7e.zip/node_modules/esbuild/lib/main.js:126:15)
    at handleIncomingPacket (/Users/xxx/dev/web-ui/.yarn/cache/esbuild-npm-0.25.10-f26f7be387-a8e4d33d7e.zip/node_modules/esbuild/lib/main.js:651:18)
    at Socket.readFromStdout (/Users/xxx/dev/web-ui/.yarn/cache/esbuild-npm-0.25.10-f26f7be387-a8e4d33d7e.zip/node_modules/esbuild/lib/main.js:581:7)
    at Socket.emit (node:events:524:28)
    at Socket.emit (node:domain:489:12)
    at addChunk (node:internal/streams/readable:561:12) {
  code: 'ERR_STRING_TOO_LONG'
}
```
### Context
We currently use esbuild in our company to build an application that is part of a huge monorepo (>10M LOC).
The `metafile` flag is a requirement for us, as we rely on specific build plugins to process its data.
The codebase is constantly growing, and we recently started hitting this error.
From what I could gather, this is caused by:
- The maximum size of a string in the V8 engine being 512MB (defined here)
- The Node process storing the `metafile` value in a string before calling `JSON.parse()`, thereby exceeding that threshold
I also noticed that the JSON was not minified, so as a (very dirty) workaround, and to buy us some time, I patched `lib/main.js` to chunk the data and minify the JSON on the fly to reduce the final string length (like this):
```diff
diff --git a/lib/main.js b/lib/main.js
index 0f61c81621ded9262d532307857e252673c76473..db4b1b9f0cd8d2c74fa380630c7aff8b666859ee 100644
--- a/lib/main.js
+++ b/lib/main.js
@@ -178,6 +178,59 @@ var ByteBuffer = class {
     return bytes;
   }
 };
+
+// [BEGIN PATCH]
+const TEMP_BUFFER_WINDOW = 100_000;
+const decodeWithFallback = (decodeFn) => (bytes) => {
+  try {
+    // Attempt to decode the bytes into a UTF-8 string
+    return decodeFn(bytes);
+  } catch (error) {
+    // Ouch, it failed :(
+    // This likely means that the bytes array is too big and won't fit into
+    // a single node.js (v8) string as it exceeds the 512MB limit
+    const { buffer, byteOffset, byteLength } = bytes;
+    const buf = Buffer.from(buffer, byteOffset, byteLength);
+
+    const now = performance.now();
+    const tmpFolder = require("node:os").tmpdir();
+    const filePath = require("node:path").join(
+      tmpFolder,
+      `esbuild-packet-${now}.json`
+    );
+
+    console.log(`[!] Overweight esbuild JSON message (${byteLength} bytes)`);
+    console.log(`    Attempting to minify… (using temporary file: ${filePath})`);
+
+    const fd = fs.openSync(filePath, "w");
+
+    // We know these strings represent JSON, so we can read the message in pieces and
+    // "minify" each piece (it comes prettified from the Go side with lots of unnecessary whitespace)
+    try {
+      let offset = 0;
+      while (true) {
+        const tempStr = buf
+          .slice(offset, offset + TEMP_BUFFER_WINDOW)
+          .toString()
+          .replaceAll(/\s*[\r\n]\s*/g, "")
+          .replaceAll(/"([^"]+)":\s*"/g, '"$1":"');
+        fs.writeFileSync(fd, tempStr);
+        if (offset >= buf.length) {
+          break;
+        }
+        offset = offset + TEMP_BUFFER_WINDOW;
+      }
+
+      console.log(`    Done minifying, final size: ${fs.statSync(filePath).size} bytes`);
+
+      return fs.readFileSync(filePath, "utf-8");
+    } finally {
+      fs.closeSync(fd);
+    }
+  }
+};
+// [END PATCH]
+
 var encodeUTF8;
 var decodeUTF8;
 var encodeInvariant;
@@ -185,14 +238,16 @@ if (typeof TextEncoder !== "undefined" && typeof TextDecoder !== "undefined") {
   let encoder = new TextEncoder();
   let decoder = new TextDecoder();
   encodeUTF8 = (text) => encoder.encode(text);
-  decodeUTF8 = (bytes) => decoder.decode(bytes);
+  // [PATCHED]
+  decodeUTF8 = decodeWithFallback((bytes) => decoder.decode(bytes));
   encodeInvariant = 'new TextEncoder().encode("")';
 } else if (typeof Buffer !== "undefined") {
   encodeUTF8 = (text) => Buffer.from(text);
-  decodeUTF8 = (bytes) => {
+  // [PATCHED]
+  decodeUTF8 = decodeWithFallback((bytes) => {
     let { buffer, byteOffset, byteLength } = bytes;
     return Buffer.from(buffer, byteOffset, byteLength).toString();
-  };
+  });
   encodeInvariant = 'Buffer.from("")';
 } else {
   throw new Error("No UTF-8 codec found");
```
Now, this is brittle, and I'm not sure how long it will hold, so I'm wondering:
- is there any reason not to minify on the Go side before sending?
- do you think it would be possible to stream the JSON and instantiate the JS metafile object using a pull parser instead of calling `JSON.parse()`, or do you have any other thoughts on the subject?
Thanks! 🙇
I've taken a look at this. There are various points in the build process where the JSON metafile could be compacted or encoded differently (the esbuild IPC protocol supports encoding for nested objects). But I'm not happy with any of these approaches, as each of them requires either

- parsing the entire nested structure on every build, when using the IPC encoding to avoid large strings
- parsing/printing the entire nested structure on every build, when detecting a large metafile and compacting it at the IPC layer using `json.Compact` from the standard library
- parsing/printing individual chunks, when leaving the build as-is and only compacting at the level where the bundler concatenates the chunks. This would allow some level of caching on the `bundler.scannerFile` input structs (these structs are cached), but not on the `graph.OutputFile` result structs that are rebuilt by the linker.
- invasive changes to the linker/bundler, changing how the individual chunks are built so that pretty-printing spaces/newlines are never added in the first place. For example, the following could be changed: https://github.com/evanw/esbuild/blob/6b2ee78d7f273d7ed4c4bb08b516939b373bcd67/internal/linker/linker.go#L5806

```go
jMeta.AddString(helpers.FormatJSONChunk(c.options.CompactMetafile, " \"entryPoint\": %s,\n", "\"entryPoint\":%s,\n", helpers.QuoteForJSON(entryPoint, c.options.ASCIIOnly)))
```

where `helpers.FormatJSONChunk(compact bool, format, formatCompact string, a ...any) string` picks either format based on the flag `c.options.CompactMetafile`. Alternatively, with a single `format` argument from which we strip spaces based on the flag:

```go
jMeta.AddString(helpers.FormatJSONChunk(c.options.CompactMetafile, " \"entryPoint\": %s,\n", helpers.QuoteForJSON(entryPoint, c.options.ASCIIOnly)))
```

That could result in unknown performance impacts from repeatedly processing the format strings. The code is also much harder to read.
Stepping back, do we need to produce a pretty-printed metafile in the first place? Could we instead build a compact JSON blob and leave pretty-printing to the user? The tests could use `json.Indent` from the standard library for easier debugging of metafile snapshots.
I've opened https://github.com/evanw/esbuild/pull/4349 with the changes for generating a compact JSON blob.
@elbywan can you give this a try with your large project?