Docusaurus leaks memory on i18n site builds
When building a Docusaurus site with multiple locales, memory usage keeps increasing after each locale is built.
What happens, in pseudo-code:
async function buildSite() {
await buildLocale("en"); // +20mb
await buildLocale("fr"); // +20mb
await buildLocale("ja"); // +20mb
await buildLocale("es"); // +20mb
}
This can be seen on our website, but also on our init template (with a much smaller leak though).
In https://github.com/facebook/docusaurus/pull/10599 we solved an important leak, but we still leak memory and should investigate so that it's possible to build a Docusaurus site in thousands of locales without having to increase the heap size.
Repro
yarn install
yarn clear:website
NODE_OPTIONS="--max-old-space-size=250 --expose-gc" DOCUSAURUS_PERF_LOGGER=true yarn build:website:fast --locale en --locale fr --locale ja --locale es
The logs show memory increasing after each locale build, despite calling globalThis.gc?.() before each measurement (see the measurement sketch after the logs).
[PERF] Build > en - 20.90 seconds! - (58mb -> 144mb)
[PERF] Build > fr - 30.94 seconds! - (144mb -> 162mb)
[PERF] Build > ja - 32.49 seconds! - (162mb -> 180mb)
[PERF] Build > es - 33.00 seconds! - (180mb -> 197mb)
[PERF] Build - 117.36 seconds! - (58mb -> 197mb)
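For reference, the pattern behind these measurements is roughly the following (a minimal sketch, not the actual DOCUSAURUS_PERF_LOGGER implementation): force a GC when --expose-gc is available, sample process.memoryUsage().heapUsed before and after the task, and log the delta.

```js
// Minimal sketch of the measurement pattern (requires --expose-gc for gc()).
// Not the actual DOCUSAURUS_PERF_LOGGER code, just the idea behind the logs above.
async function measureMemory(label, task) {
  globalThis.gc?.(); // only defined when Node runs with --expose-gc
  const before = process.memoryUsage().heapUsed;
  const start = Date.now();
  const result = await task();
  globalThis.gc?.();
  const after = process.memoryUsage().heapUsed;
  const toMb = (bytes) => Math.round(bytes / (1024 * 1024));
  const seconds = ((Date.now() - start) / 1000).toFixed(2);
  console.log(`[PERF] ${label} - ${seconds} seconds! - (${toMb(before)}mb -> ${toMb(after)}mb)`);
  return result;
}

// Usage: await measureMemory('Build > en', () => buildLocale('en'));
```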
Heap dumps
Heap dumps can be taken after each locale build with small edits to packages/docusaurus/src/commands/build/build.ts:
async function runBuildLocaleTask(params: BuildLocaleParams) {
await buildLocale(params);
globalThis.gc?.();
require('v8').writeHeapSnapshot(
`docusaurus-heap-${Date.now()}-${params.locale}.heapsnapshot`,
);
}
Access upon request: https://drive.google.com/drive/folders/11JrI_sgw_uwtGBAza52AEyVpiH33Qyq8?usp=sharing
Edit: from this discussion, we can assume that the memory leak grows with the number of MDX docs: https://github.com/facebook/docusaurus/discussions/9211#discussioncomment-12337742
Edit: these articles present interesting ways to run memory leak tests:
- https://joyeecheung.github.io/blog/2024/03/17/memory-leak-testing-v8-node-js-1/
- https://joyeecheung.github.io/blog/2024/03/17/memory-leak-testing-v8-node-js-2/
- https://joyeecheung.github.io/blog/2024/03/17/memory-leak-testing-v8-node-js-3/
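For example, one pattern in the spirit of those articles (a minimal sketch; runSuspectedLeakyOperation is a placeholder, e.g. a single buildLocale() call): run the operation repeatedly under --expose-gc and check that heap usage plateaus instead of growing with each iteration.

```js
// Leak-test sketch: run with `node --expose-gc leak-test.js`.
// `runSuspectedLeakyOperation` is a placeholder (e.g. one locale build).
async function detectLeak(runSuspectedLeakyOperation, iterations = 10) {
  const samples = [];
  for (let i = 0; i < iterations; i += 1) {
    await runSuspectedLeakyOperation();
    globalThis.gc?.(); // force a full GC before sampling the heap
    samples.push(process.memoryUsage().heapUsed);
  }
  // A real leak keeps growing; warm-up noise stabilizes after a few iterations,
  // so compare the average of the second half against the first half.
  const half = Math.floor(iterations / 2);
  const avg = (arr) => arr.reduce((sum, n) => sum + n, 0) / arr.length;
  const growth = avg(samples.slice(half)) - avg(samples.slice(0, half));
  console.log(`Average heap growth: ${(growth / (1024 * 1024)).toFixed(1)}mb`);
  return growth > 10 * 1024 * 1024; // heuristic threshold: 10mb
}
```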
Some investigations I did recently:
gray-matter cache
While investigating build times on a large site (discussion), I noticed that the legacy lib we use to parse front matter has a built-in cache:
https://github.com/jonschlinkert/gray-matter/blob/master/index.js#L225
For that large 11k docs site, it looks like the cache retains approximately 200mb of memory, which could become problematic for i18n sites.
We should probably call require("gray-matter").clearCache() somewhere in our process.
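For example (a sketch only; exactly where the call belongs in our process is open for discussion), reusing the runBuildLocaleTask hook shown above:

```js
// Sketch: reset gray-matter's module-level cache between locale builds.
async function runBuildLocaleTask(params) {
  await buildLocale(params);
  require('gray-matter').clearCache(); // drop cached front matter parse results
}
```

Clearing between locales (rather than disabling the cache entirely) would keep the within-locale cache hits that the benchmark below suggests are worth keeping.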
Note: we can disable the gray-matter cache by providing an empty options object {}, but from my benchmarks, this decreases performance a bit and it's faster with the cache (AB=false):
Benchmark 1: DOCUSAURUS_AB_BENCHMARK=true yarn build:website:fast
Time (mean ± σ): 11.094 s ± 0.462 s [User: 30.461 s, System: 5.889 s]
Range (min … max): 10.593 s … 11.503 s 3 runs
Benchmark 2: DOCUSAURUS_AB_BENCHMARK=false yarn build:website:fast
Time (mean ± σ): 11.026 s ± 0.643 s [User: 30.620 s, System: 5.993 s]
Range (min … max): 10.337 s … 11.608 s 3 runs
Summary
DOCUSAURUS_AB_BENCHMARK=false yarn build:website:fast ran
1.01 ± 0.07 times faster than DOCUSAURUS_AB_BENCHMARK=true yarn build:website:fast
This makes sense because versioned docs sites have many docs that have the exact same front matter input. We want to keep the cache, but ideally we should implement our own on top of gray-matter using WeakMap instead of {}.
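A possible shape for that wrapper (a sketch with illustrative names; note that WeakMap keys must be objects, so for raw string content one option is a regular Map that we clear explicitly, e.g. after each locale build):

```js
// Sketch of a front matter cache we fully control, on top of gray-matter.
// Names are illustrative, not an existing Docusaurus API.
const matter = require('gray-matter');

const frontMatterCache = new Map();

function parseFrontMatter(fileContent) {
  const cached = frontMatterCache.get(fileContent);
  if (cached) {
    return cached;
  }
  // Passing an options object bypasses gray-matter's own module-level cache
  const result = matter(fileContent, {});
  frontMatterCache.set(fileContent, result);
  return result;
}

// To be called between locale builds (or whenever we want to release memory)
function clearFrontMatterCache() {
  frontMatterCache.clear();
}

module.exports = {parseFrontMatter, clearFrontMatterCache};
```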
MDX processor cache
Similarly, we have a cache of MDX processors that we never clear:
const ProcessorsCache = new Map<string | Options, SimpleProcessors>();
We should probably clear it after building each locale, but I'm not sure it's the cause of the most significant leaks we have. However, this might affect some third-party plugins, because things like remark plugins can be stateful.
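A sketch of what that could look like (clearProcessorsCache is a hypothetical export of the MDX loader package, not an existing API):

```js
// In the MDX loader package: hypothetical helper to empty the module-level cache.
export function clearProcessorsCache() {
  ProcessorsCache.clear();
}

// In the build command, after building each locale:
async function runBuildLocaleTask(params) {
  await buildLocale(params);
  clearProcessorsCache(); // drop per-config processors and any stateful remark/rehype plugins they hold
}
```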
Memory leaks increase with Rspack
On a newly initialized Docusaurus v3.7 site with Docusaurus Faster enabled, resolving to the latest Rspack (1.3.9 currently), I use the following config:
const config = {
future: {
experimental_faster: {
mdxCrossCompilerCache: false,
rspackBundler: true, // Flag to toggle
swcJsLoader: true,
swcJsMinimizer: true,
lightningCssMinimizer: true,
swcHtmlMinimizer: true,
},
},
}
When building with:
yarn clear && NODE_OPTIONS="--max-old-space-size=400 --expose-gc" DOCUSAURUS_PERF_LOGGER=true yarn build --locale en --locale fr --locale ja --locale es
With Webpack, we have a high memory spike while bundling, but the memory leaks are tiny:
[PERF] Build > en - 10.97 seconds! - (50mb -> 136mb)
[PERF] Build > fr - 11.62 seconds! - (136mb -> 136mb)
[PERF] Build > ja - 10.96 seconds! - (136mb -> 138mb)
[PERF] Build > es - 11.50 seconds! - (138mb -> 139mb)
When Rspack is enabled, the memory leaks become larger:
[PERF] Build > en - 3.22 seconds! - (60mb -> 147mb)
[PERF] Build > fr - 3.49 seconds! - (147mb -> 188mb)
[PERF] Build > ja - 3.88 seconds! - (188mb -> 227mb)
[PERF] Build > es - 7.77 seconds! - (227mb -> 265mb)
When compiling the very same app (English) twice in a row, we see a memory increase.
From comparing heap dumps before/after the additional compilation, we see a delta caused by strings used by Rspack NormalModules.
I would expect that after closing the Rspack compiler, everything it owns, such as NormalModule instances, could be garbage collected, but that doesn't look to be the case. I think the Rspack Compiler and everything it references stay in memory. I'm not sure why, nor whether it's related to Rspack or Docusaurus.
(note: I tested that on a very minimal Docusaurus site, with only a page plugin, and without persistent cache or Rspack incremental mode)
This retained memory seems to be related to Rspack COMPILATION_WEAK_MAP
https://github.com/web-infra-dev/rspack/blob/v1.3.9/packages/rspack/src/Compiler.ts#L80
Unfortunately, I'm not sure my explicit globalThis.gc?.() calls clear it properly, and V8 may choose to retain the memory. But maybe it would be cleaned up under more memory pressure 🤷‍♂️
Anyway, after disabling this feature manually by tweaking node_modules, we can see that Rspack doesn't produce a memory delta anymore on a new compilation, even though it seems the last compiler remains in memory.
We are currently unable to upgrade from Docusaurus 3.6.3 due to this issue. On our website, each locale compilation with Rspack takes around 300mb more. We have around 6,500 files per language.
@guillaume-kotulski, you should be able to run docusaurus build --locale <my-locale> multiple times in a row to mitigate this issue (it might require some small config adjustments).
This only happens if you use docusaurus build, which just loops over all your locales to build them sequentially.
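For example, a small wrapper script can run one build process per locale, so that all memory is released when each process exits (a sketch; adjust the locale list to your site, and the config adjustments mentioned above may still apply):

```js
// build-locales.js - sketch: build each locale in its own Node process so the
// heap is fully released between locales. Adjust the locale list to your config.
const {execSync} = require('child_process');

const locales = ['en', 'fr', 'ja', 'es'];

for (const locale of locales) {
  console.log(`Building locale ${locale}...`);
  execSync(`yarn docusaurus build --locale ${locale}`, {stdio: 'inherit'});
}
```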