
Heap memory usage increase during development builds after upgrading to v4.24.5

Open skatingincentralpark opened this issue 2 years ago • 39 comments

Preliminary Checks

  • [X] This issue is not a duplicate. Before opening a new issue, please search existing issues: https://github.com/gatsbyjs/gatsby/issues
  • [X] This issue is not a question, feature request, RFC, or anything other than a bug report directly related to Gatsby. Please post those things in GitHub Discussions: https://github.com/gatsbyjs/gatsby/discussions

Description

We have a large app in a monorepo using Gatsby v4.15.2. After upgrading to v4.24.5, running gatsby develop crashes during Building development bundle with a JavaScript heap out of memory error. See the error snapshot below:

Error Snapshot - Gatsby v4.24.5 development build

<--- Last few GCs --->

[24620:0x160008000]    62793 ms: Mark-sweep (reduce) 3984.9 (4062.5) -> 3984.8 (4063.5) MB, 123.5 / 0.0 ms  (average mu = 0.191, current mu = 0.000) last resort GC in old space requested
[24620:0x160008000]    62929 ms: Mark-sweep (reduce) 3984.8 (4062.5) -> 3984.7 (4063.5) MB, 135.6 / 0.0 ms  (average mu = 0.098, current mu = 0.000) last resort GC in old space requested


<--- JS stacktrace --->

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: 0x100980a10 node::Abort() [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
 2: 0x100980b74 node::OnFatalError(char const*, char const*) [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
 3: 0x100a999c4 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
 4: 0x100a99958 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
 5: 0x100c21f8c v8::internal::Heap::CollectionBarrier::Wait() [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
 6: 0x100c2a71c v8::internal::Heap::SetUp() [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
 7: 0x100bf9bb4 v8::internal::FactoryBase<v8::internal::Factory>::NewRawTwoByteString(int, v8::internal::AllocationType) [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
 8: 0x100e42e8c v8::internal::String::SlowFlatten(v8::internal::Isolate*, v8::internal::Handle<v8::internal::ConsString>, v8::internal::AllocationType) [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
 9: 0x100aa7138 v8::internal::String::Flatten(v8::internal::Isolate*, v8::internal::Handle<v8::internal::String>, v8::internal::AllocationType) [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
10: 0x100ab6c1c v8::String::Utf8Length(v8::Isolate*) const [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
11: 0x100961f58 node::Buffer::(anonymous namespace)::ByteLengthUtf8(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
12: 0x100b030d8 v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo) [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
13: 0x100b026d0 v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
14: 0x100b01f54 v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
15: 0x10122346c Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit [/Users/charleszhao/.nvm/versions/node/v14.17.6/bin/node]
16: 0x105b73080 
17: 0x105b7a928 

Comparison between 'v4.15.2', and 'v4.24.5' dev build memory consumption

For this test, we did the following:

  1. Increased the heap size available to Node.js by adding NODE_OPTIONS=--max-old-space-size=8192 to our dev script
  2. Installed process-top to profile the memory usage

Our Gatsby v4.24.5 build was now successful due to the increase in allocated heap memory from step (1) above. We also noticed an increase of ~2GB when comparing heap memory usage between Gatsby v4.15.2 and Gatsby v4.24.5; see the table below:

Gatsby Version | Heap Usage
Gatsby v4.15.2 | 2.5 GB / 2.6 GB
Gatsby v4.24.5 | 4.5 GB / 4.7 GB
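
To confirm that the NODE_OPTIONS setting from step 1 actually reached the Node.js process, one option is to print the V8 heap limit from inside the process. This is a minimal sketch using Node's built-in v8 module; the helper name heapLimitMb is made up for illustration and is not part of Gatsby.

```javascript
// Sketch: report the V8 heap limit the current Node.js process is running with.
const v8 = require('v8')

// Hypothetical helper name; heap_size_limit is reported by V8 in bytes.
function heapLimitMb() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024))
}

console.log(`V8 heap limit: ${heapLimitMb()} MB`)
```

With --max-old-space-size=8192 the reported limit should be in the neighbourhood of 8192 MB. If it is much lower, the flag most likely did not reach the process that runs the build.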

Reproduction Link

https://github.com/skatingincentralpark/gatsby-v4-test

Steps to Reproduce

  1. git clone https://github.com/skatingincentralpark/gatsby-v4-test.git
  2. cd to the project folder /gatsby-v4-test
  3. Run git switch 4.15.2-no-packages to switch to the branch with Gatsby v4.15.2
  4. Run yarn to install dependencies
  5. Run yarn dev to start development server
  6. Notice the highest heap will be around 623 MB / 657 MB
  7. Run yarn clean to clear Gatsby cache
  8. Run git switch 4.24.5-no-packages to switch to the branch with Gatsby v4.24.5
  9. Run yarn to install dependencies
  10. Run yarn dev to start development server
  11. Notice the highest heap will be around 1.5 GB / 1.6 GB

Expected Result

Expected the heap usage to be similar after upgrading to v4.24.5 from v4.15.2.

Actual Result

Heap usage increased by roughly 1GB (from 623 MB to 1.5 GB in the reproduction). This increased memory usage in Gatsby v4.24 causes a 'JavaScript heap out of memory' error if NODE_OPTIONS=--max-old-space-size=8192 is not added to our dev script.

  • v4.15.2: heap: 623 MB / 657 MB
cpu: 139.4% | rss: 898 MB (5.2%) | heap: 623 MB / 657 MB (94.9%) | ext: 336 MB | delay: 882 ms | 00:00:16 | 
  • v4.24.5: heap: 1.5 GB / 1.6 GB
cpu: 130.1% | rss: 1.6 GB (9.4%) | heap: 1.5 GB / 1.6 GB (97.7%) | ext: 488 MB | delay: 891 ms | 00:00:27 | 
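
For comparing runs, it can help to pull the heap figures out of the process-top lines above instead of reading them by eye. The following is a small sketch, assuming the line format shown in this issue and only MB/GB units; parseHeap is a hypothetical helper, not part of process-top.

```javascript
// Sketch: extract "heap: <used> / <total>" from a process-top status line.
// Assumes the format shown in this issue and only MB or GB units.
const UNIT_TO_MB = { MB: 1, GB: 1024 }

function parseHeap(line) {
  const match = /heap: ([\d.]+) (MB|GB) \/ ([\d.]+) (MB|GB)/.exec(line)
  if (!match) return null
  const [, used, usedUnit, total, totalUnit] = match
  return {
    usedMb: Math.round(Number(used) * UNIT_TO_MB[usedUnit]),
    totalMb: Math.round(Number(total) * UNIT_TO_MB[totalUnit]),
  }
}
```

Calling parseHeap on each logged line and keeping the maximum usedMb gives a peak figure per Gatsby version that is easier to compare than raw log output.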

Environment

System:
    OS: macOS 11.6
    CPU: (8) arm64 Apple M1
    Shell: 5.8 - /bin/zsh
  Binaries:
    Node: 14.17.6 - ~/.nvm/versions/node/v14.17.6/bin/node
    Yarn: 1.22.18 - ~/.nvm/versions/node/v14.17.6/bin/yarn
    npm: 6.14.15 - ~/.nvm/versions/node/v14.17.6/bin/npm
  Languages:
    Python: 2.7.16 - /usr/bin/python
  Browsers:
    Chrome: 106.0.5249.119
    Firefox: 104.0.2
    Safari: 15.0
  npmGlobalPackages:
    gatsby-cli: 4.10.1

Config Flags

No response

skatingincentralpark avatar Oct 27 '22 00:10 skatingincentralpark

Hi, thanks for the issue!

Also thanks for the detailed issue. I suppose you updated your app from 4.15 to 4.24 without any incremental updates right? Because the version range is too large to be able to find the culprit. What would need to be done is bisect the releases, so e.g.

  • Base: 4.15
  • Update to 4.16, test
  • Update to 4.17, test
  • etc.

If then e.g. 4.17 would have it, double check that 4.16 is ok. Then one could be sure that the 4.17 release has this regression. If you have the time, we'd much appreciate it if you (or anyone else) could help with this.
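
The release-by-release check described above can also be done as a binary search, which needs fewer installs when the range is large. This is a sketch under a stated assumption: once the regression appears in a release, every later release in the list also has it. The function name and the isBad callback are hypothetical; in practice isBad would mean "install this version, run gatsby develop, and report whether it ran out of memory".

```javascript
// Sketch: binary search for the first release that shows the regression.
// Assumes `versions` is sorted oldest to newest and that the regression,
// once introduced, is present in every later version.
async function findFirstBadVersion(versions, isBad) {
  let lo = 0
  let hi = versions.length - 1
  let firstBad = null
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2)
    if (await isBad(versions[mid])) {
      firstBad = versions[mid] // candidate; keep looking in older releases
      hi = mid - 1
    } else {
      lo = mid + 1 // regression must be in a newer release
    }
  }
  return firstBad
}
```

As with the manual approach, it is worth double-checking the release just before the reported one to be sure it is clean.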

LekoArts avatar Oct 27 '22 11:10 LekoArts

Hey @LekoArts, here are the results from the investigation (credits: @odunet ⭐️)! We've also recorded the PartyTown Context error versions, but that can be ignored as the latest versions have implemented the fix.

Gatsby versions investigation results: 'v4.15.0' - 'v4.24.4'

Gatsby Version | Webpack 'pack' folder size | OOM Error? (Out Of Memory) | PartyTown Error? | Comment
Gatsby v4.15.4 | 10.05GB | NO | NO | Version usable with no errors
Gatsby v4.16.0 | 10.05GB | NO | YES | Version NOT usable with PartyTown Context error
Gatsby v4.17.0 | 10.05GB | NO | YES | Version NOT usable with PartyTown Context error
Gatsby v4.17.2 | 10.05GB | NO | YES | Version NOT usable with PartyTown Context error
Gatsby v4.18.0 | 10.03GB | NO | YES | Version NOT usable with PartyTown Context error
Gatsby v4.18.2 | 10.05GB | NO | YES | Version NOT usable with PartyTown Context error
Gatsby v4.19.0 | 19.69GB | YES | YES | Version NOT usable with PartyTown Context error, and OOM error
Gatsby v4.20.0 | 19.69GB | YES | NO | Version has OOM error, but usable with NODE_OPTIONS memory fix
Gatsby v4.21.0 | 19.69GB | YES | NO | Version has OOM error, but usable with NODE_OPTIONS memory fix
Gatsby v4.22.0 | 19.45GB | YES | NO | Version has OOM error, but usable with NODE_OPTIONS memory fix
Gatsby v4.23.0 | 19.69GB | YES | NO | Version has OOM error, but usable with NODE_OPTIONS memory fix
Gatsby v4.24.0 | 19.69GB | YES | NO | Version has OOM error, but usable with NODE_OPTIONS memory fix

As shown above, Gatsby v4.16.0 introduced a PartyTown Context bug in commit https://github.com/gatsbyjs/gatsby/commit/a88703f4de47c0ba9db48914bc2e0df73440dc92; this bug rendered versions v4.16.0 - v4.19.0 unusable for us. It was later fixed in Gatsby v4.20.0 in commit https://github.com/gatsbyjs/gatsby/commit/49cf094380bcf69d9239f8abbbd4db9c1968dcf8.

It can also be deduced from the above that the OOM (out of memory) error was introduced in Gatsby v4.19.0. The OOM error was accompanied by an increase of approximately 9.6GB (from 10.05GB to 19.69GB) in the webpack 'pack' folder size.
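
For anyone who wants to repeat the folder-size measurement on their own project, here is a minimal sketch that sums file sizes under a directory using only Node's fs and path modules. The helper name dirSizeBytes is hypothetical, and the location of the webpack cache (commonly somewhere under the project's .cache directory) is an assumption that should be checked locally.

```javascript
// Sketch: recursively sum the size of all regular files under a directory.
const fs = require('fs')
const path = require('path')

function dirSizeBytes(dir) {
  let total = 0
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name)
    if (entry.isDirectory()) {
      total += dirSizeBytes(fullPath)
    } else if (entry.isFile()) {
      total += fs.statSync(fullPath).size
    }
  }
  return total
}
```

Running it against the cache folder before and after switching Gatsby versions (after a clean build each time) should show whether the cache grows in the same way as in the table above.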

Thank you ✊

skatingincentralpark avatar Oct 27 '22 22:10 skatingincentralpark

Awesome, thank you for help on this! We'll have a closer look at this next week 👍

LekoArts avatar Oct 28 '22 07:10 LekoArts

Any updates on this? Seems like gatsby 5 is also affected

Can confirm it works well with 4.15.2

dacevedo12 avatar Nov 11 '22 17:11 dacevedo12

I've re-run your reproduction and I don't see the behavior described anymore. I think I saw it when I initially tried it, but that could have been just a coincidence.

4.15.2:

image

4.24.5

image

As you can see the heap is really similar. And this also makes sense. Node.js will use all the available memory and only then garbage collect.

We then tried to force garbage collection and also only saw 350 MB vs 400MB. You can try that for yourself:

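// in your gatsby-node.js

// ...rest of imports

const v8 = require(`v8`)
const vm = require(`vm`)
// process-top collects CPU/memory stats for the current Node.js process
const top = require(`process-top`)()

// Expose V8's gc() function at runtime so garbage collection can be forced
v8.setFlagsFromString(`--expose_gc`)
const gc = vm.runInNewContext(`gc`)

exports.createPages = async ({ graphql, actions, reporter }) => {
  setInterval(() => {
    // Force a garbage collection, then print a string containing stats about your Node.js process.
    gc()
    console.log(top.toString())
  }, 5000)

  // ... rest of files
}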

So the important question is: Did you get out of memory errors? Because if not then this behavior isn't too unusual.

LekoArts avatar Nov 14 '22 09:11 LekoArts

Hey @LekoArts, thank you very much for your response. I just re-ran the reproduction (following the steps outlined in the issue) on two computers multiple times and am still seeing the behaviour described in the issue (which differs from your results).

In answer to your question "Did you get out of memory errors?": Yes, we do get OOM errors in our main application, which has the same structure but is much larger, and they prevent us from starting our dev server.

  • In our main application, adding NODE_OPTIONS=--max-old-space-size=8192 allows the server to start, but memory usage is extremely high when developing, while shadowing files and refreshing endpoints, causing our computers to choke heavily.
  • It got so bad we downgraded back to v4.15.2 until this is sorted.
  • Our worry is we'll be stuck on v4.15.2 and unable to benefit from features in new releases which is commercially untenable as we will shortly have hundreds of sites live on Gatsby Cloud.

I've created a reproduction of the OOM error in a new branch. Instructions are at the bottom of the comment

Your heap usage with forced garbage collection seems to align with my results.

Questions

  1. Could you please run the initial test again to confirm?
  2. Is forced garbage collection a proposed fix for this issue experienced?

Summary of results

Version | Macbook Pro 13" | Macbook Pro 14" | Macbook Pro 13" - Force Garbage Collection
v4.15.2 Heap | 605MB / 637MB | 667MB / 710MB | 318MB / 498MB
v4.24.5 Heap | 1.5GB / 1.6GB | 1.3GB / 1.4GB | 379MB / 559MB

Screenshots of results and System Info

Macbook Pro 13"

macbook pro 13 - 4 15 2 macbook pro 13 - 4 24 5

System:
    OS: macOS 11.6
    CPU: (8) arm64 Apple M1
    Shell: 5.8 - /bin/zsh
  Binaries:
    Node: 14.17.6 - ~/.nvm/versions/node/v14.17.6/bin/node
    Yarn: 1.22.18 - ~/.nvm/versions/node/v14.17.6/bin/yarn
    npm: 6.14.15 - ~/.nvm/versions/node/v14.17.6/bin/npm
  Languages:
    Python: 2.7.16 - /usr/bin/python
  Browsers:
    Chrome: 106.0.5249.119
    Firefox: 104.0.2
    Safari: 15.0
  npmGlobalPackages:
    gatsby-cli: 4.10.1

Macbook Pro 14"

macbook pro 14 - 4 15 2 macbook pro 14 - 4 24 5
  System:
    OS: macOS 12.1
    CPU: (10) arm64 Apple M1 Pro
    Shell: 5.8 - /bin/zsh
  Binaries:
    Node: 18.6.0 - ~/.nvm/versions/node/v18.6.0/bin/node
    Yarn: 1.22.19 - ~/.nvm/versions/node/v18.6.0/bin/yarn
    npm: 8.13.2 - ~/.nvm/versions/node/v18.6.0/bin/npm
  Languages:
    Python: 2.7.18 - /usr/bin/python
  Browsers:
    Chrome: 107.0.5304.110
    Safari: 15.2
  npmGlobalPackages:
    gatsby-cli: 4.24.0

Macbook Pro 13" - Garbage Collected

4 15 2-force-garbage-collection-1 4 24 5-force-garbage-collection-1

Steps to produce OOM error

I have created a branch on the reproduction repo that will have the OOM error. Below are the steps.

  1. Run git clone https://github.com/skatingincentralpark/gatsby-v4-test.git
  2. Cd to directory cd gatsby-v4-test/
  3. Run git switch 4.24.5-all-packages to switch to the branch with Gatsby v4.24.5 and various packages installed and initialised
  4. Run yarn to install dependencies
  5. Run yarn dev to start development server
  6. OOM should appear

skatingincentralpark avatar Nov 17 '22 12:11 skatingincentralpark

We started running into OOM issues as well, I will also post some of the testing we did in a bit.

@skatingincentralpark maybe a workaround for you (or something to test/benchmark as well) is with the DEV_SSR flag enabled. For us it made it possible to just start the dev server again.

pepijn-vanvlaanderen avatar Nov 17 '22 13:11 pepijn-vanvlaanderen

@pepijn-vanvlaanderen

Awesome, DEV_SSR does seem to work. Although, it would be hard for us to use, since it seems to make using shadowing files quite unpredictable 🥲.

Handy to have in the back pocket though.

skatingincentralpark avatar Nov 17 '22 23:11 skatingincentralpark

The heap memory issue remains in Gatsby 5. Here is a report from my experience with Gatsby 5, MDX v2, and the newest versions of the gatsby_* packages.

Scope:

  1. Fewer than 1,000 MDX pages
  2. Using sharp and React SVG for images

Problems encountered:

  1. The development server with 12 GB of RAM always raises a heap allocation error with fewer than 1,000 MDX pages
  2. The build takes at least 20 GB of RAM
  3. After serving, the images in pages take much longer to load than with Gatsby 4.15: at least 5 seconds!
  4. The page JS bundles grew so much that the website takes about 5 seconds to load, compared with less than 500 milliseconds on Gatsby 4.15 for the same code.

I hope this comment helps those who want to upgrade to reconsider their strategy. I also hope the development community revisits how the build process of Gatsby 5, webpack, and React 18 work together.

mjBayati avatar Nov 18 '22 10:11 mjBayati

I have had a problem with a memory leak for a long time. I'm waiting for a fix and haven't updated my projects from 4.14.1 to newer versions.

macsmel avatar Nov 29 '22 17:11 macsmel

I have this issue too after updating from Gatsby 4.14.1 to Gatsby 5 (and to react-plugin-mdx). My site is pretty big, with around 1,000 pages and 17,000 images. None of the workarounds work and I can't build my site.

I have tried adjusting the heap limit to 16gb with NODE_OPTIONS=--max-old-space-size=16384 (I have 32GB of system memory). gatsby develop works if I set the DEV_SSR flag, but gatsby build always runs out of memory at the Building HTML renderer phase with:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

edit: Managed to get my build to finish by closing all background apps and raising the --max-old-space-size to 24gb (at its peak it used 19gb during build). This also helps gatsby develop to not crash as frequently. Overall memory usage is now 27/32gb when running gatsby develop, so this obviously isn't a practical solution for the majority of people who will have 16GB of RAM or less, and attempting to load certain pages consumes all system memory and crashes it.

Sidenote: My builds also seem to take A LOT longer now than before. On Gatsby 4 the first build took ages, but subsequent builds took less (like maybe half an hour, and most of that was generating images). On Gatsby 5 my first successful build took around 5 hours, and subsequent builds also took 5 hours. It was processing images it had already processed the first time. It seems like caching is broken?

FraserThompson avatar Nov 30 '22 03:11 FraserThompson

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 60 days of inactivity. It’s been at least 20 days since the last update here. If we missed this issue or if you want to keep it open, please reply here. As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request. Check out gatsby.dev/contribute for more information about opening PRs, triaging issues, and contributing!

Thanks for being a part of the Gatsby community! 💪💜

github-actions[bot] avatar Dec 21 '22 00:12 github-actions[bot]

Any updates?

macsmel avatar Dec 21 '22 09:12 macsmel

Is there anything that the community (we) can help with? After upgrading, the issues with gatsby-plugin-mdx are solved, but now we encounter heap memory issues and rendering issues while using gatsby-plugin-material-ui.

mjBayati avatar Dec 24 '22 10:12 mjBayati

Is there anything that the community (we) can help with? After upgrading, the issues with gatsby-plugin-mdx are solved, but now we encounter heap memory issues and rendering issues while using gatsby-plugin-material-ui.

I don't use gatsby-plugin-mdx and gatsby-plugin-material-ui in my projects, but I still have heap memory issues.

macsmel avatar Dec 24 '22 12:12 macsmel

Hi, I commented before on this thread here and here. I can now say that our issues with Gatsby 5 are resolved.

The steps we followed were as follows:

  1. There was a recompilation per MDX file. While tracing the bug, we found that it's important to put import x.css and any other CSS-related imports at the end of your imports.

  2. It's important not to build your template pages (for example, your MDX files) concurrently, because each of them is passed as a context variable to the template, which causes a memory leak when you have a considerable number of files to render.

  3. If you are using a styling system like gatsby-plugin-material-ui that raises warnings at compile time, it's important to pay attention to those warnings. For example, in our case gatsby-plugin-material-ui caused hydration errors when using the dev server; after replacing it with its newer, verified alternative, gatsby-plugin-emotion, all the errors went away.

  4. There is an important fix in Gatsby v5.3, which I think is related to ESM-based modules. If you haven't yet, please upgrade to the latest version.

  5. Last and most important, take a look at your gatsby-node. If you have different categories of files to compile, separate them: run their GraphQL queries independently and create their pages in isolated contexts. This reduces build time. For example, if you have a blog and a knowledge base that can be built from different templates, separate their build scenarios.

Important note on running the development server: change your gatsby-node so that your MDX-related files are not rendered in development mode. There is an error when running the development server on MDX files; for our team, memory usage increased incrementally while changing files and recompiling, which led to a heap memory error after a while.
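
One possible reading of steps 2 and 5 and of the development-mode note, sketched as code. This is an interpretation, not the commenter's actual gatsby-node: the category objects, their field names (query, template, getNodes, skipInDevelop), and the helper createPagesByCategory are all hypothetical. Only graphql, actions.createPage, and reporter.panicOnBuild are standard Gatsby Node APIs.

```javascript
// gatsby-node.js (sketch)
// Process one category at a time: run its own query, then create its pages,
// passing only a small context (an id) rather than the full content.
async function createPagesByCategory({ graphql, actions, reporter }, categories) {
  for (const category of categories) {
    // Optionally skip heavy categories (e.g. MDX) while running `gatsby develop`
    if (category.skipInDevelop && process.env.NODE_ENV === 'development') continue

    const result = await graphql(category.query)
    if (result.errors) {
      reporter.panicOnBuild(`Query for "${category.name}" failed`, result.errors)
      return
    }

    for (const node of category.getNodes(result.data)) {
      actions.createPage({
        path: node.slug,
        component: category.template,
        context: { id: node.id },
      })
    }
  }
}
```

In a real project this would be called from exports.createPages with one entry per content type, for example a blog entry and a knowledge-base entry, each with its own query and template. Whether this reduces memory usage will depend on the site, so it is worth measuring before and after.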

mjBayati avatar Jan 13 '23 20:01 mjBayati

We haven't tried the updated reproduction since my last comment, but already back then we couldn't verify what you were seeing unfortunately. The issue is also now a mix between "just" increased memory usage and OOM. It's not clear to us what the actual problem is now.

Anyways, as per https://www.gatsbyjs.com/docs/reference/release-notes/gatsby-version-support/ I'm afraid that we can't invest a lot of time chasing this down in the 4.x release. For Gatsby 5 we did a lot of upgrades here and there, including https://github.com/gatsbyjs/gatsby/pull/37377 which updates webpack itself from a version from Oct 29, 2021 to the latest version.

So my asks are:

  • Please only run the reproductions with gatsby@next
  • Understand that this seems to be a "works on my machine"/"doesn't work on my machine" problem since we don't hear about this as a wide-spread problem
  • It's time consuming to verify this and any additional help is appreciated

Thanks!

LekoArts avatar Jan 17 '23 14:01 LekoArts

this is happening to us on gatsby 5.4.2

sibelius avatar Jan 18 '23 14:01 sibelius

I'm having this problem with version 5.4.2

Before making these customizations to gatsby-node.js, compilation proceeds normally:

    fallback: {
      fs: false,
      constants: false,
      http: require.resolve('stream-http'),
      https: require.resolve('https-browserify'),
      os: require.resolve('os-browserify/browser'),
      tty: require.resolve('tty-browserify'),
+    path: require.resolve('path-browserify'),
+    stream: require.resolve('stream-browserify'),
+    crypto: require.resolve('crypto-browserify'),
+    zlib: require.resolve('browserify-zlib'),
+    buffer: require.resolve('buffer/'),
    },
  };
+     plugins: [
+        new webpack.ProvidePlugin({
+          Buffer: ['buffer', 'Buffer'],
+        }),
+        new webpack.ProvidePlugin({
+          process: 'process/browser',
+        }),
+      ],

Now I'm having the same problem described in this issue. What could be happening?

samueldurantes avatar Jan 18 '23 15:01 samueldurantes

this is happening to us on gatsby 5.4.2

@sibelius Please understand that these types of comments are not helpful for us as they are not actionable. As mentioned in https://github.com/gatsbyjs/gatsby/issues/36899#issuecomment-1385544620 we weren't able to reproduce this issue yet. Also "this" is quite ambiguous, what do you mean with "this"?

We want to help y'all but we need a https://www.gatsbyjs.com/contributing/how-to-make-a-reproducible-test-case/, actionable comments (no "+1" comments) and more details.

LekoArts avatar Jan 24 '23 08:01 LekoArts

we got this "fixed" like this

NODE_OPTIONS=--max_old_space_size=4096 gatsby build

giving 4GB for gatsby build

sibelius avatar Jan 24 '23 12:01 sibelius

we got this "fixed" like this

NODE_OPTIONS=--max_old_space_size=4096 gatsby build

giving 4GB for gatsby build

Sorry, but it is not helpful. The build shouldn't consume 2.5gb-3gb in version 4.14.1 and 4-10gb in new versions.

macsmel avatar Jan 24 '23 12:01 macsmel

Note: This response comes from the team that created this issue.
@skatingincentralpark @McLeodSean @odunet @pixelsoup cc @LekoArts


After lots of testing we were able to fix our issue and successfully upgrade to Gatsby v5.4.2 with the following changes.

Early Tests

  • Tests with Gatsby v5.3.2 were promising but we would still get the FATAL ERROR: ... JavaScript heap out of memory error when a site build exceeded 1,000 pages. This kind of aligns with this comment from @mjBayati and this from @FraserThompson
  • The heap memory error always occurred during the Building development bundle process. The heap memory would slowly climb then pop (fatal error) when memory exceeded 4.2GB - ish.
  • adding NODE_OPTIONS=--max-old-space-size=8192 to increase the upper limit of node memory would work OK with small sites (~100 pages), but it didn't work for us on v5.3.2 when we had a large number of pages (700+ or so). Increasing max-old-space-size is not an ideal fix anyway, but we would've taken it if it worked. 🤷‍♂️

Conclusion

  • Upgrading to Gatsby v5.4.2 + disabling CSS sourceMaps for our emotion plugin gatsby-plugin-emotion works (for us). Turning off source maps is mentioned here. <--- this is specifically referring to javaScript sourceMaps - not css - but the idea came from there.
  • Now, no memory issues occur even when building a site with 1,800+ pages. 😅
  • Building the development bundle takes about 15 seconds longer (~75s now vs ~60s prev*) *depends on number of pages. This is bundle build time only. Our total build time ranges from 1:50 - 3:00 depending if it's a cold (yarn clean) or warm build.
  • We suspect that a combination of changes (probably webpack) in Gatsby 5 and the nature of our application (lots of shadowed css.js files for theming; see the test repo) causes our emotion plugin, gatsby-plugin-emotion, to push the memory heap in local dev over 5GB, triggering the JavaScript heap out of memory error. CSS sourceMaps were not an issue in earlier versions of Gatsby. As outlined above, our OOM (out of memory) problems started in Gatsby v4.19.0.
  • Turning off sourceMaps stopped the memory heap failures. (we don't use sourceMaps to debug anyway 🤦🏼‍♂️).

❌ Before

  • Source maps enabled.
  • NODE_OPTIONS=--max-old-space-size=8192

The JavaScript heap out of memory error would trigger when memory exceeded 5GB even with NODE_OPTIONS=--max-old-space-size=8192 enabled. This happens for us with a site over 1,000 pages. This is what we'd see just before the error on a large site.

Screen Shot 2023-01-25 at 2 05 39 pm

✅ After

  • Emotion CSS source maps disabled

gatsby-config.js

    {
      resolve: 'gatsby-plugin-emotion',
      options: {
        sourceMap: false,
      },
    },

Instead of the memory heap climbing above 4GB, it garbage collects at 4GB and dips back to a reasonable range.

Screen Shot 2023-01-25 at 11 14 34 am

pixelsoup avatar Jan 25 '23 03:01 pixelsoup

@pixelsoup

Using your repo, I was able to narrow things down to the introduction of Head as the commit that increased memory usage during webpack compilation. I will be diving into it more, as that was a hefty change with a lot of moving pieces, but now I at least have something concrete to work against.

Thanks!

pieh avatar Jan 30 '23 16:01 pieh

Hi, I commented before on this thread here and here. I can now say that our issues with Gatsby 5 are resolved.

The steps we followed were as follows:

  1. There was a recompilation per MDX file. While tracing the bug, we found that it's important to put import x.css and any other CSS-related imports at the end of your imports.
  2. It's important not to build your template pages (for example, your MDX files) concurrently, because each of them is passed as a context variable to the template, which causes a memory leak when you have a considerable number of files to render.
  3. If you are using a styling system like gatsby-plugin-material-ui that raises warnings at compile time, it's important to pay attention to those warnings.
    • For example, in our case gatsby-plugin-material-ui caused hydration errors when using the dev server; after replacing it with its newer, verified alternative, gatsby-plugin-emotion, all the errors went away.
  4. There is an important fix in Gatsby v5.3, which I think is related to ESM-based modules. If you haven't yet, please upgrade to the latest version.
  5. Last and most important, take a look at your gatsby-node. If you have different categories of files to compile, separate them: run their GraphQL queries independently and create their pages in isolated contexts. This reduces build time.
    • For example, if you have a blog and a knowledge base that can be built from different templates, separate their build scenarios.

Important note on running the development server: change your gatsby-node so that your MDX-related files are not rendered in development mode. There is an error when running the development server on MDX files; for our team, memory usage increased incrementally while changing files and recompiling, which led to a heap memory error after a while.

@mjBayati I'm interested in several of the things you did here to mitigate your issues. This is the most unique advice I've seen on build performance with Gatsby, but some of the things you did are a little hard to determine how to replicate.

  1. You say you moved your CSS import; can you be more specific? What file is it in? Where was it importing from? Did the CSS change or did you just change the import order?
  2. How are you determining which templates are building concurrently? Did you go from sync to async node functions or did you determine some other form of control over concurrency?
  3. You say "separate" the queries. How do you mean? Are you writing multiple queries that need to be completed synchronously? I would love to see an example or reproduction. Even if your example only contains example queries with no actual data, I'm interested in how you are structuring your file to achieve separation of concerns.

Thanks!

panzacoder avatar Jan 31 '23 22:01 panzacoder

@pixelsoup thank you very much for the detailed reports and your conclusion. Your conclusions (disabling all source map generation, granting more memory and some other optimizations) led us to a 50-60% decrease now but I would not really call that a "fix" because the memory requirements are still significantly higher than before, it's just bringing it back into a state that works at all.

We are using gatsby 5.5, MDX v2 and emotion and are still struggling with peak memory issues in the build step. The issue likely came up with one of the 4.x releases in our case and got amplified by upgrading MDX to v2 now, causing this investigation. It's small sites in terms of number of pages (< 200) but a nontrivial gatsby usage.

Specifically, the issue is happening in the "Building HTML renderer" step of the build, and it has the sneaky property that the RAM only spikes for less than a second when run on a cached build. I'm now forcing GC before logging, too, but it's clear that the reported memory is not just uncollected garbage but actual memory allocations.
@LekoArts you mentioned that you had not been able to reproduce on a second run above, there may be a possibility that it was due to the sampling of process-top via setInterval not catching the issue at the right moment if it was a cached build?
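
To illustrate the sampling concern, here is a sketch of a tracker that samples process.memoryUsage() at a short interval and keeps only the maximum values seen. The name createPeakTracker and the 50 ms default are assumptions for illustration. Note that any timer-based sampler still misses spikes that occur while the event loop is blocked, which the "delay" column in the process-top output suggests does happen during these builds.

```javascript
// Sketch: record peak memory figures instead of point-in-time snapshots.
function createPeakTracker(intervalMs = 50) {
  const peak = { heapUsed: 0, rss: 0, external: 0 }
  let timer = null

  const sample = () => {
    const usage = process.memoryUsage()
    for (const key of Object.keys(peak)) {
      peak[key] = Math.max(peak[key], usage[key])
    }
  }

  return {
    sample,
    start() {
      sample()
      timer = setInterval(sample, intervalMs)
      timer.unref() // do not keep the process alive just for sampling
    },
    stop() {
      if (timer) clearInterval(timer)
      timer = null
      return { ...peak } // values are in bytes
    },
  }
}
```

Starting the tracker early in gatsby-node.js and logging the result of stop() at the end of the build would give peak figures that can be compared across versions, with the caveat above about blocked event loops.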

I'm aware that by title this discussion is about issues in the development server, but I found that analyzing the problem via the build process is more predictable and better to debug.

Once we got our users productive again through a mix of throwing hardware at the problem and the above mentioned optimizations we'll try to get to a reproduction scenario on a current gatsby version. May take a bit though.

UPDATE: the build time issue turned out unrelated (but an actual issue rooted in the architectural changes in the MDX v2 plugin).

nkuehn avatar Feb 05 '23 17:02 nkuehn

Hey folks, I located a reason for increased memory usage (at least in repro provided by @pixelsoup ) and put up draft PR ( https://github.com/gatsbyjs/gatsby/pull/37619 ) to address it somewhat (i.e. if you don't use Head API, you should get back to pre 4.19 memory usage levels).

This allowed me to go from (heap max - heap: 3.0 GB):

cpu: 145.7% | rss: 669 MB (4.0%) | heap: 474 MB / 504 MB (94.1%) | ext: 142 MB | delay: 901 ms | 00:00:10 | loadavg: 1.56, 0.71, 0.44
cpu: 142.1% | rss: 732 MB (4.4%) | heap: 519 MB / 561 MB (92.4%) | ext: 144 MB | delay: 1080 ms | 00:00:11 | loadavg: 1.52, 0.71, 0.44
cpu: 163.8% | rss: 794 MB (4.8%) | heap: 588 MB / 621 MB (94.6%) | ext: 148 MB | delay: 1280 ms | 00:00:12 | loadavg: 1.52, 0.71, 0.44
cpu: 144.1% | rss: 1.9 GB (11.4%) | heap: 1.5 GB / 1.8 GB (85.5%) | ext: 148 MB | delay: 1015 ms | 00:00:19 | loadavg: 1.48, 0.72, 0.45
cpu: 157.9% | rss: 2.3 GB (14.1%) | heap: 1.6 GB / 1.9 GB (88.1%) | ext: 502 MB | delay: 885 ms | 00:00:20 | loadavg: 1.48, 0.72, 0.45
cpu: 165.8% | rss: 2.9 GB (17.5%) | heap: 1.9 GB / 2.2 GB (86.2%) | ext: 742 MB | delay: 879 ms | 00:00:21 | loadavg: 1.44, 0.72, 0.45
cpu: 166.4% | rss: 3.0 GB (18.3%) | heap: 2.0 GB / 2.1 GB (92.5%) | ext: 880 MB | delay: 827 ms | 00:00:22 | loadavg: 1.44, 0.72, 0.45
cpu: 228.2% | rss: 3.0 GB (18.3%) | heap: 2.0 GB / 2.1 GB (92.2%) | ext: 880 MB | delay: 403 ms | 00:00:23 | loadavg: 1.44, 0.72, 0.45
cpu: 210.2% | rss: 3.0 GB (18.3%) | heap: 2.0 GB / 2.1 GB (92.6%) | ext: 880 MB | delay: 237 ms | 00:00:24 | loadavg: 1.44, 0.72, 0.45
cpu: 198.1% | rss: 3.0 GB (18.3%) | heap: 2.0 GB / 2.1 GB (92.3%) | ext: 880 MB | delay: 69 ms | 00:00:25 | loadavg: 1.44, 0.72, 0.45
cpu: 184.1% | rss: 4.9 GB (29.4%) | heap: 2.2 GB / 2.3 GB (95.1%) | ext: 2.5 GB | delay: 4 ms | 00:00:36 | loadavg: 1.34, 0.74, 0.46
cpu: 176.7% | rss: 4.9 GB (29.5%) | heap: 2.2 GB / 2.3 GB (95.1%) | ext: 2.5 GB | delay: 717 ms | 00:00:37 | loadavg: 1.34, 0.74, 0.46
cpu: 169.2% | rss: 4.9 GB (29.4%) | heap: 2.2 GB / 2.3 GB (95.4%) | ext: 2.5 GB | delay: 722 ms | 00:00:38 | loadavg: 1.34, 0.74, 0.46
cpu: 162.2% | rss: 4.9 GB (29.6%) | heap: 2.3 GB / 2.4 GB (96.1%) | ext: 2.5 GB | delay: 782 ms | 00:00:39 | loadavg: 1.34, 0.74, 0.46
cpu: 116.9% | rss: 5.0 GB (30.2%) | heap: 2.4 GB / 2.4 GB (97.7%) | ext: 2.5 GB | delay: 2614 ms | 00:00:40 | loadavg: 1.34, 0.74, 0.46
cpu: 131.3% | rss: 5.1 GB (30.9%) | heap: 2.5 GB / 2.6 GB (97.8%) | ext: 2.5 GB | delay: 572 ms | 00:00:42 | loadavg: 1.39, 0.76, 0.47
cpu: 140.2% | rss: 5.2 GB (31.5%) | heap: 2.6 GB / 2.6 GB (97.8%) | ext: 2.5 GB | delay: 852 ms | 00:00:43 | loadavg: 1.39, 0.76, 0.47
cpu: 151.1% | rss: 5.3 GB (32.1%) | heap: 2.7 GB / 2.7 GB (97.9%) | ext: 2.5 GB | delay: 769 ms | 00:00:44 | loadavg: 1.39, 0.76, 0.47
cpu: 151.2% | rss: 5.4 GB (32.7%) | heap: 2.8 GB / 2.8 GB (97.9%) | ext: 2.6 GB | delay: 819 ms | 00:00:45 | loadavg: 1.39, 0.76, 0.47
cpu: 145.6% | rss: 5.6 GB (34.0%) | heap: 3.0 GB / 3.0 GB (98.1%) | ext: 2.6 GB | delay: 888 ms | 00:00:47 | loadavg: 1.36, 0.76, 0.47
cpu: 178.4% | rss: 3.6 GB (21.8%) | heap: 427 MB / 626 MB (68.3%) | ext: 1.5 GB | delay: 786 ms | 00:00:48 | loadavg: 1.36, 0.76, 0.47
⠀
You can now view website in the browser.

to (heap max - heap: 1.5 GB)

cpu: 130.2% | rss: 563 MB (3.4%) | heap: 370 MB / 406 MB (91.3%) | ext: 142 MB | delay: 1143 ms | 00:00:09 | loadavg: 2.13, 1.29, 0.83
cpu: 149.5% | rss: 624 MB (3.8%) | heap: 424 MB / 462 MB (91.8%) | ext: 145 MB | delay: 1310 ms | 00:00:10 | loadavg: 2.13, 1.29, 0.83
cpu: 167.9% | rss: 692 MB (4.2%) | heap: 495 MB / 526 MB (94.1%) | ext: 148 MB | delay: 1103 ms | 00:00:11 | loadavg: 2.13, 1.29, 0.83
cpu: 154.3% | rss: 1.2 GB (7.0%) | heap: 950 MB / 1.0 GB (93.7%) | ext: 145 MB | delay: 859 ms | 00:00:14 | loadavg: 1.95, 1.27, 0.82
cpu: 163.2% | rss: 1.8 GB (10.6%) | heap: 1.2 GB / 1.3 GB (89.3%) | ext: 441 MB | delay: 831 ms | 00:00:15 | loadavg: 1.95, 1.27, 0.82
cpu: 164.4% | rss: 1.7 GB (10.3%) | heap: 1.1 GB / 1.2 GB (90.8%) | ext: 515 MB | delay: 774 ms | 00:00:16 | loadavg: 2.04, 1.30, 0.83
cpu: 214.0% | rss: 1.7 GB (10.3%) | heap: 1.1 GB / 1.2 GB (90.4%) | ext: 515 MB | delay: 307 ms | 00:00:17 | loadavg: 2.04, 1.30, 0.83
cpu: 204.2% | rss: 1.7 GB (10.3%) | heap: 1.1 GB / 1.2 GB (90.6%) | ext: 515 MB | delay: 99 ms | 00:00:18 | loadavg: 2.04, 1.30, 0.83
cpu: 198.3% | rss: 1.7 GB (10.3%) | heap: 1.1 GB / 1.2 GB (91.2%) | ext: 515 MB | delay: 4 ms | 00:00:19 | loadavg: 2.04, 1.30, 0.83
cpu: 194.7% | rss: 1.7 GB (10.3%) | heap: 1.1 GB / 1.2 GB (91.8%) | ext: 515 MB | delay: 10 ms | 00:00:20 | loadavg: 2.04, 1.30, 0.83
cpu: 174.8% | rss: 2.7 GB (16.3%) | heap: 1.3 GB / 1.3 GB (94.8%) | ext: 1.4 GB | delay: 15 ms | 00:00:26 | loadavg: 1.96, 1.30, 0.83
cpu: 157.5% | rss: 2.7 GB (16.4%) | heap: 1.3 GB / 1.3 GB (95.9%) | ext: 1.4 GB | delay: 732 ms | 00:00:27 | loadavg: 1.96, 1.31, 0.84
cpu: 154.3% | rss: 2.8 GB (17.0%) | heap: 1.4 GB / 1.4 GB (96.4%) | ext: 1.4 GB | delay: 790 ms | 00:00:29 | loadavg: 1.96, 1.31, 0.84
cpu: 133.7% | rss: 2.9 GB (17.4%) | heap: 1.5 GB / 1.5 GB (97.2%) | ext: 1.4 GB | delay: 2000 ms | 00:00:30 | loadavg: 1.96, 1.31, 0.84
cpu: 167.1% | rss: 1.6 GB (9.4%) | heap: 349 MB / 608 MB (57.4%) | ext: 169 MB | delay: 744 ms | 00:00:31 | loadavg: 1.96, 1.31, 0.84
cpu: 167.7% | rss: 1.6 GB (9.4%) | heap: 486 MB / 613 MB (79.3%) | ext: 175 MB | delay: 800 ms | 00:00:32 | loadavg: 1.88, 1.30, 0.84
cpu: 156.3% | rss: 1.7 GB (10.1%) | heap: 681 MB / 727 MB (93.6%) | ext: 183 MB | delay: 791 ms | 00:00:35 | loadavg: 1.88, 1.30, 0.84
⠀
You can now view website in the browser.

(The "Building development bundle" step also takes less time as a result.)

That won't help at all with the build problem (mentioned by @nkuehn), but it is at least a start at addressing these problems.

pieh avatar Feb 06 '23 13:02 pieh

UPDATE: the build time issue turned out to be unrelated (MDX v2 plugin issue, found a workaround).

@pieh thank you for helping out! We do have a huge memory issue in the development mode, too. I just turned to debugging the build first because it was more reproducible.

Caveat: ~~I still cannot reliably say that my problem is the same. My current path of investigation is that at build time I am seeing very large page-specific JS bundles in .cache/page-ssr/routes/ that repeat the same dependent libraries over and over again whereas I think that dependencies that are not per-page should normally live only once in render-page.js (may point towards an issue our application structure). Roughly looking into a heap dump showed that all these page specific js bundles live in memory in the peak memory situations (webpack CachedSource objects), which feels bogus even if our implementation has no issues. I have not been able to heap dump a dev server though (OOM, dumping requires another doubling of memory). --> all done on gatsby 5.5.0 and latest MDX v2~~

I can now say the build and develop time issues are unrelated. But I can also confirm the develop time issue discussed here.

EDIT / UPDATE: (removed, turned out unrelated to this discussion)

nkuehn avatar Feb 06 '23 22:02 nkuehn

Hi, I commented before on this thread here and here. I can now say that our issues with Gatsby 5 are resolved. The steps we followed are below:

  1. There was a recompilation per MDX file. While tracing the bug we found that it's important to put your import x.css, and all other CSS-related imports, at the end of your imports.

  2. It's very important not to build your template pages (for example your MDX files) concurrently, because each of them is passed to the template as a context variable, which leaks memory when you have a considerable number of files to render.

  3. If you are using a styling system like gatsby-plugin-material-ui that raises warnings at compile time, it's very important to pay attention to those warnings.

    • For example, in our case gatsby-plugin-material-ui caused hydration errors in the dev server. After replacing it with its newer, verified alternative, gatsby-plugin-emotion, all the errors were gone.
  4. There is an important fix in Gatsby v5.3, related to ESM-based modules I think. If you haven't yet, please upgrade to the latest version.

  5. Last and most important: take a look at your gatsby-node. If you have different categories of files to compile, separate them. Make their graphql queries run independently, then create their related pages in an isolated context. This reduces your build time.

    • For example, if you have a blog and a knowledge base that can be built with different templates, separate their build scenarios.

Important note on running the development server: change your gatsby-node so that your MDX-related files are not rendered in development mode. There is an error when running the development server on MDX files: for our team, memory usage kept growing with every file change and recompile, which led to a heap out of memory error after a while.
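
As a rough illustration of that development-mode guard (a sketch, not our actual gatsby-node): `shouldCreatePage` and its `filePath` argument are hypothetical names, and it relies on `gatsby develop` running with `NODE_ENV` set to "development".

```javascript
// Sketch only: `filePath` is a hypothetical value holding the source file
// path of a page; adapt it to whatever your own query returns.
function shouldCreatePage(filePath, nodeEnv = process.env.NODE_ENV) {
  const isMdx = /\.mdx$/i.test(filePath);
  // `gatsby develop` runs with NODE_ENV set to "development".
  const isDevelop = nodeEnv === "development";
  // Skip MDX-backed pages in development, create everything else.
  return !(isMdx && isDevelop);
}
```

In createPages you would filter your query results with a check like this before calling createPage.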

@mjBayati I'm interested in several of the things you did here to mitigate your issues. This is the most unique advice I've seen on build performance with Gatsby, but some of the things you did are a little hard to determine how to replicate.

  1. You say you moved your css import, can you be more specific? What file is it in? Where was it importing from? Did the css change or did you just change the import order?
  2. How are you determining which templates are building concurrently? Did you go from sync to async node functions or did you determine some other form of control over concurrency?
  3. You say "separate" the queries. How do you mean? Are you writing multiple queries that need to be completed synchronously? I would love to see an example or reproduction. Even if your example only contains example queries with no actual data, I'm interested in how you are structuring your file to achieve separation of concerns.

Thanks!

Hi, thanks for digging into my comment. Here is some more explanation for your questions:

  1. It was only about reordering the imports and putting all css imports last.
  2. I realized that in the new version of MDX rendering, all content is passed to the template as children. This content is huge and increases memory usage. Our build had the structure below before:
Promise(fetchBlogsContentFromGraphql() && renderBlogs())
Promise(fetchKbContentsFromGraphql() && renderKbs())
Promise(fetchLandingsFromGraphql() && renderLandings())

Promise.all().then(doOtherWorks())

and the renderX() functions were as below:


async loop on each page content:
          renderPageByTemplate()

What we did was remove the concurrent builds and await each section in turn, and also make the renderPages loops sync.

** Also, it's important to separate your graphql query + build per section, like we did (separating blogs, kbs and landings).

With these changes, we were able to get our dev and production servers running!
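
To make the sequential version more concrete, here is a minimal sketch, not our real code. `buildSectionsSequentially` and the `{ name, fetch, render }` section objects are made-up names, and each `page` is assumed to be a simple string such as a slug.

```javascript
// Sketch only: `sections` is a hypothetical array of
// { name, fetch, render } objects and is not a Gatsby API.
async function buildSectionsSequentially(sections) {
  const rendered = [];
  for (const section of sections) {
    // Wait for this section's query before starting the next section,
    // so the sections' results are not all in flight at once.
    const pages = await section.fetch();
    for (const page of pages) {
      // One page at a time instead of starting every render at once.
      await section.render(page);
      rendered.push(`${section.name}:${page}`);
    }
  }
  return rendered;
}
```

In gatsby-node this would be called from createPages, with each fetch wrapping a graphql call and each render wrapping a createPage call for that section's template.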

mjBayati avatar Feb 07 '23 07:02 mjBayati

A hint for whoever is in a better position to investigate this: in development mode, our project (< 50 pages), which shows heap issues similar to those discussed here, creates an 11 GB ./.cache/webpack/stage-develop/0.pack file. Unfortunately I have no reasonable means to inspect it, and I also cannot create a heap dump due to the sheer amount of memory involved.

nkuehn avatar Feb 07 '23 20:02 nkuehn