mCRL2 icon indicating copy to clipboard operation
mCRL2 copied to clipboard

The jitty rewriter requires a lot of stack memory for its recursive calls.

Open mlaveaux opened this issue 6 years ago • 0 comments

Just a ticket as reminder and to collect all investigation so far. The jitty rewriter was benchmarked by computing the factorial10.txt of 10 using Peano encoding. Exploring the state space of this specification requires about 3.5GB of stack memory. This can be seen from the following heap+stack memory graph, the second peak is related to issue #1488. The term itself, a nested function application of 3628800 successors, takes about 400MB. memory_usage_fac10_jitty

There are two important functions https://github.com/mCRL2org/mCRL2/blob/58f5129fd546a46e704f4ac6f2d0e4912f94ed88/libraries/data/source/detail/rewrite/jitty.cpp#L616 and https://github.com/mCRL2org/mCRL2/blob/58f5129fd546a46e704f4ac6f2d0e4912f94ed88/libraries/data/source/detail/rewrite/jitty.cpp#L711 that related to this issue. It was observed by Jan Friso and confirmed by using a debugger that each recursive call of rewrite_aux + rewrite_aux_function_symbol takes about 480 bytes of stack memory.

I will have to check the memory breakdown again after commit e928fa96df629be33875b48b1c63b2b8472b94dd.

A breakdown of memory used in the rewrite_aux_function_symbol.
We use a function called alloca to allocate memory directly on the stack. For a term of arity two it allocates two data expressions and two booleans to hold rewritten terms. A data expression takes 8 bytes and a boolean 1, so one would expect that this takes 16 + 2 (=18) bytes in total. However, the first observation is that alloca aligns to 16 bytes so this actually requires 32 bytes. Then there is an if-statement after which more stack allocations are performed. Here max_vars was equal to 4 which means that 4 unprotected variables and data_expressions, of size 8, and another 4 booleans are stack allocated. This accounts for 8 * 8 + 4 = 68, but it becomes 80 because of alignment.

The compiler generates additional stack allocations. At function entry 6 elements are pushed onto the stack, which accounts for about 48 bytes. So in total we can account for 160 bytes. However, the compiler also decides to allocate 216 bytes at function entry, possibly for all the local variables that have been used. Which results in a total of 376 bytes.

For the rewrite_aux function there are again 6 things pushed to stack, accounting for 48 bytes, and a stack allocation of 120 bytes, so in total 168. In the mean-time some things are also removed from the stack, which probably results in the total usage of 480 bytes. In total there are 336 bytes allocated (by the compiler using a subtraction of the rsp) for which their reason is not understood.

mlaveaux avatar Sep 21 '18 14:09 mlaveaux