quick-lint-js icon indicating copy to clipboard operation
quick-lint-js copied to clipboard

Stack overflow issues with MSVC

Open strager opened this issue 2 years ago • 5 comments

Our stack limit is 150. Adding one source_code_span variable in parser::parse_and_visit_class_or_interface_member was the final straw and caused one of our stack depth tests to fail. 😕

I'm going to slower the stack limit for now, but we should find a better solution.

strager avatar Jun 01 '22 23:06 strager

I did some investigation with a x64 Debug build.

  • parser::parse_and_visit_statement consumes about 2 KiB of stack space.

  • parser::parse_and_visit_class_or_interface_member consumes about 1.5 KiB of stack space per call.

  • It looks like there is significant padding between most stack variables. For example, in parser::parse_and_visit_statement:

    • [1012,1012] bool
    • [1013,1013] bool
    • [1014,1031] 18 bytes of padding
    • [1032,1047] parser::bool_guard
    • [1048,1079] 32 bytes of padding
    • [1080,1095] parser::bool_guard
    • [1096,1127] 32 bytes of padding
    • [1128,1143] parser::bool_guard
    • [1144,1155] 12 bytes of padding
    • [1156,1156] bool
    • [1157,1157] bool
  • Variables with nonoverlapping lifetimes are allocated on the stack separately. This is a big problem because we have a lot of switch cases.

strager avatar Jun 01 '22 23:06 strager

I think I addressed the MSVC issues by refactoring parse_and_visit_class to use a helper class instead of a bunch of lambda functions. I'm testing with CI to see: 6eb7a7572908bd6d5830e7719aebb096f89e2a5b

strager avatar Jan 28 '23 05:01 strager

MSVC stackoverflowed with my patch. =\

strager avatar Jan 28 '23 05:01 strager

I think the main issue is that, in unoptimized builds, MSVC allocates all locals on the stack irrespective of liveness. So the following code allocates two booleans on the stack:

bool foo(int p) {
  if (p > 0) {
    bool ok = false;
    return ok;
  } else {
    bool ok = true;  // Note: does not share stack space with 'ok' above.
    return ok;
  }
}

I don't know why there's all that padding between certain variables. I tried disabling some security switches (like /RTC) but they didn't affect stack allocation significantly.

I worked around this issue in a few commits between 246ddc65057db5a682c75f3cfcc5e196f88acfb3 and 7c04c4f21c65ebd3fe7a62a7a5a6182ec81bcd77 by reducing the number of local variables in some key functions. This let us raise the recursion limit back to 150 in commit 59c3ea4c7245ff50e836296ff5c075b1cb4fade6 like it was before I filed this issue.

strager avatar Jan 30 '23 10:01 strager

I created a tool to analyze stack usage: https://github.com/strager/cppstacksize

It seems that calls to diag_reporter::report are expensive for three reasons:

  • Diagnostic classes cannot fit in a register, thus are spilled onto the stack.
  • Temporary memory is not reused in different branches. (This is the liveness issue I mentioned in my previous comment.)
  • There are a lot of calls to report.

I tried to shrink source_code_span to fit in a 64-bit register. This helped, but stack usage was still high.

I should do more digging. There's probably other stuff than report which bloats stack usage.

I think the main fix is to split functions like parser::parse_and_visit_expression_remainder so that stack frames of disparate branches are allocated separately.

strager avatar Jun 30 '23 02:06 strager