dmd icon indicating copy to clipboard operation
dmd copied to clipboard

Print stack trace for segfaults on linux

Open dkorpel opened this issue 2 years ago • 8 comments

https://forum.dlang.org/post/[email protected]

I don't know how to test this, because it requires the test to actually segfault.

dkorpel avatar Jun 17 '23 21:06 dkorpel

Thanks for your pull request and interest in making D better, @dkorpel! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#15331"

dlang-bot avatar Jun 17 '23 21:06 dlang-bot

Should be fine to test with a .sh

maxhaton avatar Jun 17 '23 23:06 maxhaton

I don't think it is a good idea to enable this by default. These signal handlers are generally iffy and this one in particular worries me in a few ways:

  1. What happens if there's multiple threads and one of them triggers the fault while the others are running? Signal handlers, generally speaking, can run in any random thread that didn't mask it out. Since it uses all gshared stuff my thought is it'd at least work as intended - the assert error is thrown from the fault site and that either terminates the thread (or is caught and recovers) and the other threads keep executing - but you'd want to confirm it.

  2. What happens if you run the program in a debugger, or on a distro with automatic crash reporting? This would likely cause a significant usability regression in that case, but this is of course mitigated by always using --DRT-trapExceptions=0 in debuggers anyway. We'd need a way to turn it back off (tho of couse setting the default signal handler back would do it, having a function in the lib to do that for you and documenting it would probably be good)

  3. What if the fault occurs inside a C function? I again expect it to work in practice but would like confirmation.

adamdruppe avatar Jun 17 '23 23:06 adamdruppe

Thanks, these are the critical questions I was looking for.

These signal handlers are generally iffy and this one in particular worries me in a few ways:

You obviously shouldn't get a segfault in the first place, but if one happens, how would this signal handler be worse than the default behavior?

What happens if there's multiple threads and one of them triggers the fault while the others are running?

I'll test this

or is caught and recovers

Hopefully not, segfaults shouldn't be recovered from

What happens if you run the program in a debugger

I tried with gdb, and it still catches the SIGSEGV itself

Program received signal SIGSEGV, Segmentation fault.
_Dmain () at ../test/test_.d:9
9           int a = *p; // segmentation fault: null pointer read/write operation

or on a distro with automatic crash reporting

I don't know how those work, but if they recognize crashes caused by range violations or other D assert trips, it will also work with this signal handler, and if they don't, then they don't work with D programs in general.

having a function in the lib to do that for you and documenting it would probably be good

Yes, I can add a deregisterMemoryErrorHandler2 function as well.

What if the fault occurs inside a C function? I again expect it to work in practice but would like confirmation.

I tried separately compiling an endlessly recursing c function with gcc, it works as expected.

dkorpel avatar Jun 18 '23 11:06 dkorpel

I've not enabled it by default anymore, and added a test.

I've tested the handler in a separate thread, and it works for null pointers, but stack overflow doesn't. I think the alt stack only works for the thread you call the register function on.

dkorpel avatar Mar 21 '24 20:03 dkorpel

Another lament against the memoryerror module is it's not portable across compiler vendors, let alone architectures.

ibuclaw avatar Mar 22 '24 06:03 ibuclaw

Another lament against the memoryerror module is it's not portable across compiler vendors, let alone architectures.

Can you not even import the module in gdc/ldc? The new handler doesn't use inline Assembly so it should be more portable, as long as you get the right stack pointer register from context.uc_mcontext.gregs

dkorpel avatar Mar 22 '24 08:03 dkorpel

Another lament against the memoryerror module is it's not portable across compiler vendors, let alone architectures.

Can you not even import the module in gdc/ldc? The new handler doesn't use inline Assembly so it should be more portable, as long as you get the right stack pointer register from context.uc_mcontext.gregs

Checked against your branch, and the answer is still no.

druntime/src/etc/linux/memoryerror.d:233:13: error: undefined identifier ‘naked’
  233 |             naked;
      |             ^
druntime/src/etc/linux/memoryerror.d:237:18: error: found ‘RBP’ when expecting ‘:’
  237 |             push RBP;
      |                  ^
druntime/src/etc/linux/memoryerror.d:238:17: error: found ‘RBP’ when expecting ‘:’
  238 |             mov RBP, RSP;
      |                 ^
druntime/src/etc/linux/memoryerror.d:238:20: error: expected constant string constraint for operand, not ‘,’
  238 |             mov RBP, RSP;
      |                    ^
druntime/src/etc/linux/memoryerror.d:240:17: error: found ‘sigsegvDataHandler’ when expecting ‘:’
  240 |             jmp sigsegvDataHandler;
      |                 ^

Just the tip of the wall of errors, but all asm-related.

ibuclaw avatar Mar 22 '24 22:03 ibuclaw