jrsonnet
jrsonnet copied to clipboard
std.foldl over (not so) large list is causing stack overflow
What happens
std.foldl seems to be implemented internally with recursion, or otherwise causing stack overflow when iteration over large array.
Additionally, memory usage grows immensely with more foldl count
What is expected
std.foldl should not cause stack overflow even with infinite arrays
Verison
jrsonnet --version
jrsonnet 0.5.0-pre96
Context
We use "large" foldl to apply list of "mutations" onto list of kubernetes manifests.
The example below is simplified version of our use case. We run into stack overflow issues with ~50 mutations, but our code is more complex. So to reproduce here, the number is higher than our practical use case, but not that big still.
Or code works ok with go-jsonnet, but reaches stack overflow in jrsonnet. If we bump os-stack we get it work but memory consumption of our use case gets to >30GB.
How to reproduce
local createManifest = function(number) {
kind: 'Deployment',
metadata: {
name: 'test-deployment-' + number,
},
spec: {
template: {
nodeSelectors: {
'node-type': 'node',
'other-selector': 'foo',
},
},
},
};
local mutation1 = function(manifest) (
if manifest.kind == 'Deployment' then
manifest {
metadata+: {
labels+: {
test: 'foo',
},
},
}
else
manifest
);
local mutation2 = function(manifest) (
local removeSelectors = [
'beta.kubernetes.io/arch',
'kubernetes.io/arch',
'node.kubernetes.io/role',
'node-type',
];
if manifest.kind == 'Deployment' then
manifest {
spec+: {
template+: {
nodeSelectors: {
[field]: manifest.spec.template.nodeSelectors[field]
for field in std.objectFields(manifest.spec.template.nodeSelectors)
if std.member(removeSelectors, field)
},
},
},
}
else
manifest
);
local mutations = std.flatMap(
function(index) [
mutation1,
mutation2,
],
std.range(0, std.parseInt(std.extVar('fold-count')) - 1)
);
local manifests = std.map(
createManifest,
std.range(0, std.parseInt(std.extVar('map-count')) - 1)
);
std.map(
function(manifest) std.foldl(function(result_manifest, mutation) mutation(result_manifest), mutations, manifest),
manifests
)
Run:
for FOLD_COUNT in 1 10 100 500; do
echo ""
echo "==== FOLD_COUNT: $FOLD_COUNT ===="
/usr/bin/time -p -h -l jrsonnet results/jrsonnet-foldl-bug.jsonnet --ext-str fold-count=$FOLD_COUNT --ext-str map-count=100 > /dev/null
done
Result:
==== FOLD_COUNT: 1 ====
real 0.01
user 0.00
sys 0.00
4055040 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
1198 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
13 involuntary context switches
40967561 instructions retired
32092929 cycles elapsed
2818048 peak memory footprint
==== FOLD_COUNT: 10 ====
real 0.05
user 0.03
sys 0.01
27561984 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
6937 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
102 involuntary context switches
251001710 instructions retired
161304203 cycles elapsed
26324992 peak memory footprint
==== FOLD_COUNT: 100 ====
real 1.78
user 1.43
sys 0.34
1413394432 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
345276 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
291 involuntary context switches
9893883867 instructions retired
5985518894 cycles elapsed
1412157440 peak memory footprint
==== FOLD_COUNT: 500 ====
stack overflow, try to reduce recursion, or set --max-stack to bigger value
results/jrsonnet-foldl-bug.jsonnet:43:26-81: function <builtin_object_fields> call
results/jrsonnet-foldl-bug.jsonnet:43:66-80: field <nodeSelectors> access
argument <o> evaluation
... repeated many times ...
results/jrsonnet-foldl-bug.jsonnet:43:26-81: function <builtin_object_fields> call
results/jrsonnet-foldl-bug.jsonnet:43:66-80: field <nodeSelectors> access
argument <o> evaluation
results/jrsonnet-foldl-bug.jsonnet:43:26-81: function <builtin_object_fields> call
field <nodeSelectors> manifestification
field <template> manifestification
field <spec> manifestification
elem <0> manifestification
real 0.37
user 0.29
sys 0.07
261378048 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
64020 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
48 voluntary context switches
150 involuntary context switches
1843870396 instructions retired
1171073480 cycles elapsed
260128768 peak memory footprint
with --max-stack 4096
==== FOLD_COUNT: 1 ====
real 0.01
user 0.00
sys 0.00
2121728 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
726 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
13 involuntary context switches
17444472 instructions retired
16588490 cycles elapsed
884736 peak memory footprint
==== FOLD_COUNT: 10 ====
real 0.01
user 0.00
sys 0.00
4493312 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
1305 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
34 involuntary context switches
38361929 instructions retired
34536039 cycles elapsed
3256320 peak memory footprint
==== FOLD_COUNT: 100 ====
real 0.21
user 0.16
sys 0.04
143482880 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
35236 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
250 involuntary context switches
1001957115 instructions retired
664777830 cycles elapsed
142245888 peak memory footprint
==== FOLD_COUNT: 200 ====
real 0.73
user 0.57
sys 0.14
542973952 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
132770 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
224 involuntary context switches
3673679293 instructions retired
2352762219 cycles elapsed
541736960 peak memory footprint
==== FOLD_COUNT: 300 ====
real 1.61
user 1.25
sys 0.32
1201475584 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
293537 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
494 involuntary context switches
8038891961 instructions retired
5283522033 cycles elapsed
1200238592 peak memory footprint
==== FOLD_COUNT: 400 ====
real 2.90
user 2.27
sys 0.60
2116505600 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
516933 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
1343 involuntary context switches
14061168814 instructions retired
9470288606 cycles elapsed
2115268608 peak memory footprint
==== FOLD_COUNT: 500 ====
real 5.26
user 4.09
sys 1.08
3291742208 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
803856 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
0 signals received
0 voluntary context switches
6634 involuntary context switches
21968229954 instructions retired
16527925347 cycles elapsed
3290505216 peak memory footprint
For comparison here is go-jsonnet result:
==== FOLD_COUNT: 1 ====
real 0.01
user 0.00
sys 0.00
5984256 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
1656 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
1 signals received
0 voluntary context switches
105 involuntary context switches
26255260 instructions retired
27298520 cycles elapsed
3678208 peak memory footprint
==== FOLD_COUNT: 10 ====
real 0.02
user 0.01
sys 0.00
11087872 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
2903 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
10 signals received
5 voluntary context switches
174 involuntary context switches
92086600 instructions retired
69245289 cycles elapsed
7053312 peak memory footprint
==== FOLD_COUNT: 100 ====
real 0.49
user 0.66
sys 0.05
140193792 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
34462 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
83 signals received
16 voluntary context switches
797 involuntary context switches
4181923667 instructions retired
2352113122 cycles elapsed
136097792 peak memory footprint
==== FOLD_COUNT: 200 ====
real 2.39
user 3.08
sys 0.19
501780480 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
122925 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
286 signals received
62 voluntary context switches
2090 involuntary context switches
18943427191 instructions retired
11051043727 cycles elapsed
497668096 peak memory footprint
==== FOLD_COUNT: 300 ====
real 7.48
user 8.97
sys 0.52
1121615872 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
274565 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
609 signals received
113 voluntary context switches
9582 involuntary context switches
49595121172 instructions retired
31446608435 cycles elapsed
1117507584 peak memory footprint
==== FOLD_COUNT: 400 ====
real 19.14
user 19.98
sys 1.39
2028417024 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
496646 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
1376 signals received
194 voluntary context switches
70547 involuntary context switches
96566460731 instructions retired
69863591500 cycles elapsed
2024267776 peak memory footprint
==== FOLD_COUNT: 500 ====
real 29.76
user 32.34
sys 1.53
3204464640 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
784387 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
1898 signals received
200 voluntary context switches
34467 involuntary context switches
168218433001 instructions retired
113452304861 cycles elapsed
3200331776 peak memory footprint
Which version of jrsonnet do you use?
jrsonnet --version
jrsonnet 0.5.0-pre96
- std.foldl doesn't use recursion Your code, however, does, due to lazy evaluation
- os-stack does nothing here, only max-stack will help max-stack specifies how much recursion is allowed, os-stack - what OS stack to allocate for execution (in megabytes). For me, only max-stack is required here, and the code works with the default os stack size. It is ok that go-jsonnet and jrsonnet require a different max-stack value, as they use different metering techniques.
Nonetheless, I'll look what can be done here.
Making this call tailstrict
for field in std.objectFields(manifest.spec.template.nodeSelectors) tailstrict
greately reduces stack size usage, due to field nodeSelectors being calculated prior to getting it fields, instead of going 500 overlays deep due to lazy-evaluated field get, however level of stacking object overlays makes this thing still highly inefficient
I wonder if there should be forced (i.e not-lazy) object comprehension syntax for patterns like this
Thank you for looking into it.
I tried to extract the core issue from our actual code. But maybe it is not actually showing the real issue. We have quite complex jsonnet that works in go-jsonnet but was getting stack overflow in jrsonnet. I was trying to pinpoint which jsonnet code is doing that, but concluded it was just something around how many of those "mutations" we have in the list.
os-stack does nothing here, only max-stack will help
For me we had to actually bump os-stack on jrsonnet to make our code work. So maybe the issue is somewhere else. I was not really able to understand where is it happening since it only gives:
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
Is there a way to debug this better?
Oh, so it happens in the real code, not this one.
Yes, only os-stack will help here then.
Unfortunately, Rust/C++ are unable to recover from stack overflow, only golang is invulnerable to this due to dynamic stack size. Sometimes, when increasing --max-stack, you also need to increase --os-stack to make it fit. If you get fatal runtime error: stack overflow - then it's the time to increase os-stack
I am planning on finishing #121, which will solve many stack size issues.
EDIT: Turns out, there is a way to grow stack in rust safely: https://docs.rs/stacker/latest/stacker/ I'll try to add it in couple of places, it would be useful even with proper TCO, because my TCO implementation will have escape hatches for easier builtins implementation
Great. We will evaluate once next version is released. Thank you
I have implemented dynamic stack growth on master, so you should not experience need to use --os-stack anymore. However, max-stack is still there, and its limit is not changed. I will not do anything with it, because its current limit is reasonable, and its usage is just different from go-jsonnet.
While this is not a permanent solution, this issue is resolved for now, with better solution in development.