quick-lint-js
quick-lint-js copied to clipboard
Stack overflow issues with MSVC
Our stack limit is 150. Adding one source_code_span
variable in parser::parse_and_visit_class_or_interface_member
was the final straw and caused one of our stack depth tests to fail. 😕
I'm going to slower the stack limit for now, but we should find a better solution.
I did some investigation with a x64 Debug build.
-
parser::parse_and_visit_statement
consumes about 2 KiB of stack space. -
parser::parse_and_visit_class_or_interface_member
consumes about 1.5 KiB of stack space per call. -
It looks like there is significant padding between most stack variables. For example, in
parser::parse_and_visit_statement
:- [1012,1012]
bool
- [1013,1013]
bool
- [1014,1031] 18 bytes of padding
- [1032,1047]
parser::bool_guard
- [1048,1079] 32 bytes of padding
- [1080,1095]
parser::bool_guard
- [1096,1127] 32 bytes of padding
- [1128,1143]
parser::bool_guard
- [1144,1155] 12 bytes of padding
- [1156,1156]
bool
- [1157,1157]
bool
- [1012,1012]
-
Variables with nonoverlapping lifetimes are allocated on the stack separately. This is a big problem because we have a lot of
switch
cases.
I think I addressed the MSVC issues by refactoring parse_and_visit_class to use a helper class instead of a bunch of lambda functions. I'm testing with CI to see: 6eb7a7572908bd6d5830e7719aebb096f89e2a5b
MSVC stackoverflowed with my patch. =\
I think the main issue is that, in unoptimized builds, MSVC allocates all locals on the stack irrespective of liveness. So the following code allocates two booleans on the stack:
bool foo(int p) {
if (p > 0) {
bool ok = false;
return ok;
} else {
bool ok = true; // Note: does not share stack space with 'ok' above.
return ok;
}
}
I don't know why there's all that padding between certain variables. I tried disabling some security switches (like /RTC
) but they didn't affect stack allocation significantly.
I worked around this issue in a few commits between 246ddc65057db5a682c75f3cfcc5e196f88acfb3 and 7c04c4f21c65ebd3fe7a62a7a5a6182ec81bcd77 by reducing the number of local variables in some key functions. This let us raise the recursion limit back to 150 in commit 59c3ea4c7245ff50e836296ff5c075b1cb4fade6 like it was before I filed this issue.
I created a tool to analyze stack usage: https://github.com/strager/cppstacksize
It seems that calls to diag_reporter::report are expensive for three reasons:
- Diagnostic classes cannot fit in a register, thus are spilled onto the stack.
- Temporary memory is not reused in different branches. (This is the liveness issue I mentioned in my previous comment.)
- There are a lot of calls to report.
I tried to shrink source_code_span
to fit in a 64-bit register. This helped, but stack usage was still high.
I should do more digging. There's probably other stuff than report which bloats stack usage.
I think the main fix is to split functions like parser::parse_and_visit_expression_remainder so that stack frames of disparate branches are allocated separately.