Edge runtime - JavaScript heap out of memory

Open ValentinH opened this issue 1 year ago • 14 comments

Verify canary release

  • [X] I verified that the issue exists in the latest Next.js canary release

Provide environment information

Operating System:
  Platform: darwin
  Arch: arm64
  Version: Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000
Binaries:
  Node: 16.18.1
  npm: 8.19.2
  Yarn: 1.22.19
  pnpm: 7.26.1
Relevant packages:
  next: 13.4.6-canary.4
  eslint-config-next: N/A
  react: 18.2.0
  react-dom: 18.2.0
  typescript: 5.1.3

Which area(s) of Next.js are affected? (leave empty if unsure)

Middleware / Edge (API routes, runtime)

Link to the code that reproduces this issue or a replay of the bug

https://github.com/ValentinH/next-edge-build-issue

To Reproduce

  • clone the repository
  • yarn install
  • yarn build (or even NODE_OPTIONS='--max-old-space-size=4096' yarn build to make it crash sooner)

The memory consumption goes super high (more than 16GB on my machine) and the command ultimately fails with:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

<--- Last few GCs --->

[42616:0x158008000] 35153 ms: Scavenge 4086.3 (4135.9) -> 4084.0 (4136.7) MB, 4.1 / 0.0 ms (average mu = 0.407, current mu = 0.205) allocation failure
[42616:0x158008000] 35159 ms: Scavenge 4087.1 (4136.7) -> 4084.7 (4137.4) MB, 4.4 / 0.0 ms (average mu = 0.407, current mu = 0.205) allocation failure
[42616:0x158008000] 35549 ms: Scavenge 4088.0 (4137.7) -> 4085.5 (4146.4) MB, 386.1 / 0.0 ms (average mu = 0.407, current mu = 0.205) allocation failure

<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 0x1000f9c84 node::Abort() [/whatever/node]
 2: 0x1000f9e74 node::ModifyCodeGenerationFromStrings(v8::Local<v8::Context>, v8::Local<v8::Value>, bool) [/whatever/node]
 3: 0x10023e840 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/whatever/node]
 4: 0x10023e800 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/whatever/node]
 5: 0x1003c1d1c v8::internal::Heap::GarbageCollectionReasonToString(v8::internal::GarbageCollectionReason) [/whatever/node]
 6: 0x1003c083c v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/whatever/node]
 7: 0x1003cbb84 v8::internal::Heap::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/whatever/node]
 8: 0x1003cbc18 v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/whatever/node]
 9: 0x10039eaac v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/whatever/node]
10: 0x1006d6bd0 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/whatever/node]
11: 0x1009ea08c Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_NoBuiltinExit [/whatever/node]
12: 0x1055b6444
13: 0x1052a2b90
14: 0x105115d20
15: 0x1055d44b4
16: 0x1052a40f4
17: 0x1055d34b0
18: 0x10504f498
19: 0x1055ccdb8
20: 0x1055d9aa4
21: 0x10097dd18 Builtins_InterpreterEntryTrampoline [/whatever/node]
22: 0x10504e368
23: 0x1052a1ab4
24: 0x104f61404
25: 0x104fe058c
26: 0x104ff0250
27: 0x104fdfc84
28: 0x10522f278
29: 0x100a32178 Builtins_PromiseFulfillReactionJob [/whatever/node]
30: 0x10099f6f4 Builtins_RunMicrotasks [/whatever/node]
31: 0x10097b9e4 Builtins_JSRunMicrotasksEntry [/whatever/node]
32: 0x10034e4cc v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/whatever/node]
33: 0x10034e900 v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/whatever/node]
34: 0x10034e9ec v8::internal::Execution::TryRunMicrotasks(v8::internal::Isolate*, v8::internal::MicrotaskQueue*, v8::internal::MaybeHandle<v8::internal::Object>*) [/whatever/node]
35: 0x100371628 v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) [/whatever/node]
36: 0x100371ebc v8::internal::MicrotaskQueue::PerformCheckpoint(v8::Isolate*) [/whatever/node]
37: 0x100049c4c node::InternalCallbackScope::Close() [/whatever/node]
38: 0x10004977c node::CallbackScope::~CallbackScope() [/whatever/node]
39: 0x1000d1ae0 (anonymous namespace)::uvimpl::Work::AfterThreadPoolWork(int) [/whatever/node]
40: 0x10095c0c0 uv__work_done [/whatever/node]
41: 0x10095f85c uv__async_io [/whatever/node]
42: 0x1009715a8 uv__io_poll [/whatever/node]
43: 0x10095fcec uv_run [/whatever/node]
44: 0x10004a6d4 node::SpinEventLoop(node::Environment*) [/whatever/node]
45: 0x100133a90 node::NodeMainInstance::Run(int*, node::Environment*) [/whatever/node]
46: 0x100133770 node::NodeMainInstance::Run() [/whatever/node]
47: 0x1000cde38 node::Start(int, char**) [/whatever/node]
48: 0x19cd8fe50 start [/usr/lib/dyld]
error Command failed with signal "SIGABRT".

Describe the Bug

We are in the process of migrating all our API routes to the Edge runtime. So far we have migrated 43 of them, and for the past few days we have been getting build errors on Vercel:

ERROR  run failed: command  exited (129)
Error: Command "turbo run build" exited with 129
BUILD_UTILS_SPAWN_129: Command "turbo run build" exited with 129

The build also sometimes failed locally with the Reached heap limit Allocation failed - JavaScript heap out of memory error.

After digging through our codebase to understand what was going on, we managed to narrow the problem down to our Edge functions. These functions use our generated GraphQL client, which lives in a pretty large file (2.5MB) that contains all the possible operations.

I managed to create a reproduction in a greenfield Next.js project using the latest canary. However, to reach the same amount of memory, I had to create many more routes (200) to reproduce the crash. The main reason for this is that I'm not able to share our internal GraphQL client, which is generated from our private schema. Therefore, I created a smaller client (3 times smaller) from the GitLab GraphQL endpoint.

If I create the same scenario of 200 handlers but not using the Edge runtime, the build runs smoothly in under 10 seconds with no visible impact on my machine memory. To witness this, you can try the serverless branch on the shared repository.

Expected Behavior

Compiling many Edge runtime API routes should take about as much time and memory as compiling the same number of Serverless API routes.

Which browser are you using? (if relevant)

No response

How are you deploying your application? (if relevant)

Vercel

NEXT-1785

ValentinH avatar Jun 14 '23 15:06 ValentinH

Thanks to @elbalexandre, we discovered that setting swcMinify: false in the next.config.js "solves" the issue: the build is around 10x slower than the serverless version but it doesn't crash anymore.

On the serverless version, there is not much difference whether or not swcMinify is used.

ValentinH avatar Jun 29 '23 06:06 ValentinH

We are facing this issue as well: the build runs out of memory and we cannot publish the page, whether through GitHub Actions or through Cloudflare. It just fails with "yarn run build exited with code 129".

With the flag set to false we can build; with it set to true, the build goes OOM. I also just checked, and this flag will be deprecated in Next.js 15 and always enabled.

Ref: https://github.com/vercel/next.js/pull/57467/files

Eusebiotrigo avatar Nov 08 '23 12:11 Eusebiotrigo

Thanks for sharing, we haven't tried enabling the outputFileTracing flag.

ValentinH avatar Nov 08 '23 13:11 ValentinH

We moved some pages (3) from the pages folder to the app folder and we got the high memory error again in our Cloudflare build (and it already has --max-old-space set to 8GB).

So setting minify to false helped for a while, but now it is not working for us anymore.

Eusebiotrigo avatar Nov 10 '23 08:11 Eusebiotrigo

I am having the same issue. Very hard to figure out which page or lib is causing the issue.

izakfilmalter avatar Nov 13 '23 11:11 izakfilmalter

any update? same issue here

boredjoker avatar Nov 17 '23 15:11 boredjoker

It's something related to the source maps generated during SWC compilation; we're investigating now.

huozhi avatar Dec 07 '23 21:12 huozhi

@huozhi let me know if you need help testing the fix. I can give you temp access to our closed source repo. The build fails every time with the edge runtime, but succeeds on node.

izakfilmalter avatar Dec 08 '23 12:12 izakfilmalter

We landed a fix (#59393) in 14.0.5-canary.2; ideally it should somewhat reduce memory consumption during minification. Please test against it and let us know the result 🙏

The reproduction still fails because it has so many edge functions (not sure if it's too extreme); we'll see if we can keep improving on it.

huozhi avatar Dec 08 '23 23:12 huozhi

> The reproduction still fails because it has so many edge functions (not sure if it's too extreme); we'll see if we can keep improving on it.

In our application, we have a lot of API routes and ultimately we would like to be able to use the Edge Runtime in all of them. Therefore, I don't think it's too extreme. Would it be possible to minify them in batches rather than all at once?
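
To make the batching suggestion concrete, here is a minimal sketch of running an async job over a list in fixed-size batches. It is not how Next.js or SWC schedule minification; runInBatches and fakeMinify are hypothetical names used only for illustration, and whether batching would actually lower peak memory depends on what the real minifier keeps alive between files.

```javascript
// Hypothetical helper: run an async task over `items` in fixed-size batches,
// so at most `batchSize` tasks are in flight at any time.
async function runInBatches(items, batchSize, task) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const results = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // Wait for the whole batch to finish before starting the next one.
    const batchResults = await Promise.all(batch.map((item) => task(item)));
    results.push(...batchResults);
  }
  return results;
}

// Stand-in for a real minifier (assumption): it only strips whitespace.
async function fakeMinify(source) {
  return source.replace(/\s+/g, "");
}
```

Calling runInBatches(sources, 10, fakeMinify) would process ten sources at a time and resolve to the results in the original order.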

ValentinH avatar Dec 09 '23 05:12 ValentinH

Still, I'm looking forward to trying the fix from #59393 because our builds have recently been failing often on Vercel (not locally), even though we stopped using our huge codegen file in the Edge functions. Interestingly, redeploying without the build cache makes it work. I'm not sure why doing more work (without the cache) reduces memory consumption, but this might be a lead.

ValentinH avatar Dec 09 '23 05:12 ValentinH

@huozhi Tried 14.0.5-canary.5, still failed. https://vercel.com/steeple-inc/steeple-works/BHscDFNbaYJd4TVTJP9FK3CrszvP

izakfilmalter avatar Dec 12 '23 19:12 izakfilmalter

@izakfilmalter the deployment is 404 for me, do you have a testing app to share that we can keep looking into it?

huozhi avatar Dec 12 '23 21:12 huozhi

@kdy1 @huozhi I can see this has been merged: https://github.com/swc-project/swc/pull/8546

I'm happy to test once this lands on canary, just let me know! Thanks

oliversoar avatar Jan 30 '24 11:01 oliversoar

@kdy1 @huozhi any chance that the above-mentioned fix could be added to @next/swc? We have reached a state where our production application deployments keep failing on Vercel due to OOM. The only way we have to work around this for now is to "Redeploy". For some reason, redeploying without cache avoids the OOM. But I'm scared that this will stop working at some point.

ValentinH avatar Feb 07 '24 10:02 ValentinH

> We have reached a state where our production application deployments keep failing on Vercel due to OOM. The only way we have to work around this for now is to "Redeploy". For some reason, redeploying without cache avoids the OOM.

I think creating an environment variable named VERCEL_FORCE_NO_BUILD_CACHE with a value of 1 would be a better stopgap solution for you. The cache needs to be held in memory for it to be used and can affect memory usage.

ericmatthys avatar Feb 07 '24 11:02 ericmatthys

> > We have reached a state where our production application deployments keep failing on Vercel due to OOM. The only way we have to work around this for now is to "Redeploy". For some reason, redeploying without cache avoids the OOM.
>
> I think creating an environment variable named VERCEL_FORCE_NO_BUILD_CACHE with a value of 1 would be a better stopgap solution for you. The cache needs to be held in memory for it to be used and can affect memory usage.

This is what works for us as a temporary solution. Painful as it doubles the build time.

Keen to see this fixed so we can roll out the edge runtime to the rest of our app.

oliversoar avatar Feb 07 '24 11:02 oliversoar

We just reached the point where even without cache the app won't build anymore. We are therefore switching again to swcMinify: false which is slower but doesn't have these leaks.

ValentinH avatar Feb 09 '24 10:02 ValentinH

> We just reached the point where even without cache the app won't build anymore. We are therefore switching again to swcMinify: false which is slower but doesn't have these leaks.

I believe this issue will be resolved once swc_core is updated (see: https://github.com/vercel/next.js/pull/61662). Waiting on this as well though.

gwkline avatar Feb 09 '24 15:02 gwkline

Looking forward to it!

In the meantime, in case it helps someone else: we managed to reduce our Edge function bundles quite a lot by replacing an import of @sentry/nextjs with @sentry/core. We were ending up with react-dom and @sentry/replay in each Edge function bundle (approx. 1MB "Stat size") 🙈
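
One rough way to spot oversized output like this is to list the largest files a build emitted. The sketch below uses only Node built-ins; listFileSizes and largestFiles are hypothetical helper names, and it reports on-disk file sizes, which are not the same measure as the "Stat size" shown by a bundle analyzer.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recursively collect { file, bytes } for every regular file under `dir`.
function listFileSizes(dir) {
  const entries = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      entries.push(...listFileSizes(fullPath));
    } else if (entry.isFile()) {
      entries.push({ file: fullPath, bytes: fs.statSync(fullPath).size });
    }
  }
  return entries;
}

// Return the `count` largest files under `dir`, biggest first.
function largestFiles(dir, count = 10) {
  return listFileSizes(dir)
    .sort((a, b) => b.bytes - a.bytes)
    .slice(0, count);
}
```

After a build, something like console.table(largestFiles(".next/server", 10)) would print the ten biggest files, assuming the output directory is .next/server; pass a different path if the project uses another location.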

ValentinH avatar Feb 09 '24 15:02 ValentinH

Seems to be fixed as of 14.1.1-canary.52 for our project (without needing to use VERCEL_FORCE_NO_BUILD_CACHE or swcMinify: false)

gwkline avatar Feb 14 '24 21:02 gwkline

After upgrading from [email protected] to [email protected] I am getting the memory error again:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

This is reproducible for me locally, and I can get the build to pass if I set export NODE_OPTIONS=--max_old_space_size=8192. But there is no workaround for deploying on Vercel.
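
When experimenting with NODE_OPTIONS, it can help to confirm which heap limit a Node process actually ends up with. The sketch below reads it through the built-in v8 module; heapLimitMiB is a hypothetical helper name. It reports the limit of the process it runs in, so it should be started with the same NODE_OPTIONS as the build.

```javascript
const v8 = require("node:v8");

// Report the heap size limit (in MiB) that V8 is running with in this process.
function heapLimitMiB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024);
}

console.log(`V8 heap size limit: ${heapLimitMiB()} MiB`);
```

The reported value is usually somewhat above the old-space setting, because the limit also covers other heap spaces.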

Notably, the step: Creating an optimized production build ... has gone from taking 30 seconds to several minutes.

@ijjk, perhaps this is a side-effect of your split chunk handling update? FWIW I do see a dramatic reduction in edge-server-production file size when running [email protected] locally with the increased memory limits. https://github.com/vercel/next.js/pull/62205

steve-marmalade avatar Feb 19 '24 18:02 steve-marmalade

Hi @steve-marmalade, could you provide a reproduction? The chunk splitting PR you referenced fixed massive cache/memory usage in other cases we've seen with the edge runtime.

ijjk avatar Feb 19 '24 20:02 ijjk

I just tried both [email protected] and [email protected] on the original reproduction I shared and it is still crashing: https://github.com/ValentinH/next-edge-build-issue. However, I don't know how realistic this is: the number of functions is realistic IMO (we already have more than 100 edge functions in our app) but the size of the codegen is probably too much (even though this is what we used to have; we have since switched to graphql-codegen, which generates much smaller files).

ValentinH avatar Feb 20 '24 07:02 ValentinH

THANK YOU SO MUCH GUYS @ijjk @huozhi!!! ❤️❤️❤️

I just tested 14.1.1-canary.69 on https://github.com/ValentinH/next-edge-build-issue and it now builds within 10 seconds with no visible impact on memory.

I still have to test it on our production repo but this looks super good 🎉

ValentinH avatar Feb 22 '24 09:02 ValentinH

I confirmed that it fixes our issue on our prod repo: from more than 8GB for next build to around 1GB! 🎉🎉🎉 Congrats!

ValentinH avatar Feb 22 '24 09:02 ValentinH

However, the memory is still getting really high when browsing the app in dev mode: [image]

But this is a subject for another issue 🙈

ValentinH avatar Feb 22 '24 09:02 ValentinH

I can confirm 14.1.1-canary.69 – no more OOM errors when deploying to vercel/cloudflare. Great job!

xanderim avatar Feb 22 '24 13:02 xanderim

14.1.1-canary.69 has also fixed OOM exceptions in our build! Thanks everybody!

dislick avatar Feb 22 '24 13:02 dislick

Nice! For us too! Locally, it cuts the build time in half and there is no need to specify --max-old-space-size.

I wanted to ask, when is it planned to release the version 14.1.1? 😄

froblesmartin avatar Feb 22 '24 13:02 froblesmartin