appscope
appscope copied to clipboard
Go signals on aarch64
While working on #1170, we found an issue with signal handling on aarch64. We're creating this ticket to resolve this issue.
From that ticket:
On aarch64 machines, go processes crash intermittently with a "bad g". This has only been seen on aarch64. This was uncovered in go integration tests (go_20) in test cases for "signalHandlerStatic" and "signalHandlerStaticStripped". By adding a go routine to constantly write to the console, these test now fail within a small number of iterations (on this branch). To see them ASAP, I'd recommend commenting out the other test cases and running the go_20 signalHandler tests in a loop to see the "bad g" failure.
We may not have a perfect understanding, but we believe that when a signal happens while we are on the switched stack (for pcre2), go tries to retrieve the "g" from the stack. It's not going to find "g" on the stack when we're currently on our own stack executing pcre2 code, so go crashes. Without the new go routine that writes to the console, the "bad g" happens very infrequently, but importantly "bad g" in the signalHandlerStatic tests has also been observed on branches that do not have the stack pool implementation here. To the best of our knowledge, we could see this any time a signal handler interrupts us while we're running our c code.
If this description is correct, we think a possible solution is to mimic how go knows if c code is currently running. https://github.com/golang/go/blob/master/src/runtime/cgocall.go https://github.com/golang/go/blob/master/src/runtime/asm_arm64.s
I've created a branch bug/1469-go-signals-arm that contains the go routine that writes to the console. This branch would be a good place to continue this investigation.
While working on it, I can confirm that the test reintroduced in 24e17d5 will pass if we opt out from all the assembly part and just return -1 without calling pcre2_match_wrapper in com.c for the aarch64 variant. One promising way that can be investigated is to manipulate values with mcontext as described in the MATCH_LIMIT option in https://www.pcre.org/current/doc/html/pcre2perform.html#TOC1 to test if a low value allows to passing test and resolve the problem with bad g. Another approach as John mentioned is described here:
If this description is correct, we think a possible solution is to mimic how go knows if c code is currently running. https://github.com/golang/go/blob/master/src/runtime/cgocall.go https://github.com/golang/go/blob/master/src/runtime/asm_arm64.s