JavaScript heap out of memory during compilation of a very large model.
Description:
I wrote a code generator that writes Stan code for a family of increasingly complex multinomial models.
At the explicit request of my employer, I am not authorized to share the actual model here. The size of the model in question is 2.3MB.
For reference, the biggest model I have been able to compile successfully so far has a size of 1.6MB. It took over 8 full days to compile.
Reproducible Steps:
library(rstan)
rstan_options(auto_write = TRUE)
stan_model("caseABCDEFGH_a1.stan")
Current Output:
adam@potwor:~/stan_models $ export NODE_OPTIONS=--max-old-space-size=100096
adam@potwor:~/stan_models $ R
R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> library(tictoc)
> library(rstan)
Loading required package: StanHeaders
rstan version 2.26.22 (Stan version 2.26.1)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)
For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
change `threads_per_chain` option:
rstan_options(threads_per_chain = 1)
> rstan_options(auto_write = TRUE)
> tic();stan_model("caseABCDEFGH_a1.stan");toc()
<--- Last few GCs --->
[850056:0x559f36aea3a0] 1087740225 ms: Mark-sweep (reduce) 1398.3 (1448.0) -> 1398.1 (1446.7) MB, 1339.8 / 0.0 ms (+ 12.9 ms in 4 steps since start of marking, biggest step 5.2 ms, walltime since start of marking 1357 ms) (average mu = 0.659, current mu
[850056:0x559f36aea3a0] 1087741584 ms: Mark-sweep 1398.7 (1447.0) -> 1398.6 (1448.7) MB, 1357.9 / 0.0 ms (average mu = 0.472, current mu = 0.001) allocation failure GC in old space requested
<--- JS stacktrace --->
V8 FATAL ERROR in Reached heap limit: Allocation failed - JavaScript heap out of memory
#
#
# Fatal error in , line 0
# API fatal error handler returned after process out of memory
#
#
#
#FailureMessage Object: 0x7fff08a842b0
==== C stack trace ===============================
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(v8::base::debug::StackTrace::StackTrace()+0x16) [0x7fc4699be786]
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(+0x49fa4b) [0x7fc4692c9a4b]
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(V8_Fatal(char const*, ...)+0x177) [0x7fc4699b3fc7]
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool)+0x35d) [0x7fc4692d147d]
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(+0x5a7ab5) [0x7fc4693d1ab5]
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(+0x5bd9c7) [0x7fc4693e79c7]
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(v8::internal::Heap::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment)+0x67) [0x7fc4693e99b7]
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment)+0x25) [0x7fc4693e9a35]
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin)+0x3a) [0x7fc4693afe5a]
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(v8::internal::Runtime_AllocateInOldGeneration(int, unsigned long*, v8::internal::Isolate*)+0x94) [0x7fc469afec84]
/home/adam/R/x86_64-pc-linux-gnu-library/4.3/V8/libs/V8.so(+0xab6a38) [0x7fc4698e0a38]
Trace/breakpoint trap (core dumped)
Operating System:
Ubuntu 22.04 64bit, AMD Ryzen Threadripper 1950X 16-Core Processor 128GB of RAM
I understand I am pushing the limits here, but I would at least like to learn a) why it ignores the export NODE_OPTIONS=--max-old-space-size=100096, and b) whether I can push the envelope even further. I still have some memory to spare compared to the maximum memory footprint of the last result.
Have you tried using CmdStan, CmdStanR, or CmdStanPy? They all use the OCaml compiler rather than cross-compiling in Javascript. It might scale a lot better.
Scaling our compiler isn't something we've put much, if any, thought into, as these large auto-generated models aren't part of our standard use case and we're stretched pretty thin on the dev side as is. Having said that, we would take contributions that improve how our compiler scales, either in base OCaml or in the JavaScript cross-compilation.
Thank you very much for the answer!
I tried Stan from within Python, but it failed to compile a much smaller model (it failed once the model size exceeded about 100kB). I will try the methods you mentioned and see the results.
I will also try setting the export NODE_OPTIONS=--max-old-space-size=100096 system-wide, in case the JavaScript cross-compiler was installed as a background task and its environment is not affected by the environment set by the command line that invoked the calculations.
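Before changing anything system-wide, the inheritance question can be checked on its own. The following is a minimal sketch, assuming a POSIX shell; the variable name child_value is only illustrative. It shows that an exported variable reaches any child process started from the same shell, which would include an R session launched there. It does not show whether the V8 engine embedded in R reads NODE_OPTIONS at all; that is a separate question.

```shell
# Export the option exactly as in the session above.
export NODE_OPTIONS=--max-old-space-size=100096

# A child process started from this shell inherits exported variables.
# child_value is a hypothetical name used only for this check.
child_value=$(sh -c 'printf "%s" "$NODE_OPTIONS"')
echo "child process sees: $child_value"

# If R is installed, the same check can be made from inside R itself.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'cat("R sees:", Sys.getenv("NODE_OPTIONS"), "\n")'
fi
```

If both lines print the value, the environment is being passed along correctly and the heap limit is being decided somewhere else.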
What do you mean by "tried Stan from within Python"? We have two official R interfaces, RStan and CmdStanR, and two official Python interfaces, PyStan and CmdStanPy.
CmdStanR and CmdStanPy invoke a separate process to compile a Stan program (from the makefile of CmdStan) and then a separate process to run the program, streaming results to a file. They use the direct output of compiling our OCaml transpiler. I was suggesting using one of those or using CmdStan directly to avoid Python and R altogether.
RStan uses the Javascript cross-compiler of the OCaml transpiler. I don't know what PyStan does.
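As a rough sketch of the CmdStan route, assuming CmdStan is already unpacked and built: the location in CMDSTAN_HOME is a hypothetical default, and the model path is taken from the session shown earlier in the thread. CmdStan's makefile takes the model path without the .stan extension as its target, runs the native stanc binary to produce C++, and then compiles the executable.

```shell
# Assumption: CmdStan lives at $CMDSTAN_HOME (hypothetical default: ~/cmdstan).
CMDSTAN_HOME="${CMDSTAN_HOME:-$HOME/cmdstan}"

# Model path from the session shown earlier in the thread.
model="$HOME/stan_models/caseABCDEFGH_a1.stan"

# The make target is the model path with the ".stan" suffix removed.
target="${model%.stan}"

if [ -f "$CMDSTAN_HOME/makefile" ] && [ -f "$model" ]; then
    # Native stanc translates Stan to C++, then the C++ compiler builds it.
    make -C "$CMDSTAN_HOME" "$target"
else
    echo "CmdStan or model not found; would run: make -C $CMDSTAN_HOME $target"
fi
```

No JavaScript engine is involved in this path, so the V8 heap limit does not apply; whether the native compiler copes with a 2.3MB model is something only a trial run can tell.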
Besides RStan, I used PyStan.
Thank you very much again for the explanation.