go icon indicating copy to clipboard operation
go copied to clipboard

runtime: support SEH stack unwinding on Windows

Open qmuntal opened this issue 2 years ago • 6 comments

Background

Go binaries can't be reliably debugged or profiled with native Windows tools, such as WinDbg or the Windows Performance Analyzer, because Go does not generate PE files which contains the necessary static data that Win32 functions like RtlVirtualUnwind and StackWalk use to unwind the stack.

Delve and go tool pprof are great tools for developing on Windows, but production environments that run on Windows tend to rely on language-agnostic tools provided by Microsoft for profiling and troubleshooting. Stack unwinding is such a fundamental thing for all these tools, and Go not supporting it is a major pain point in the Windows ecosystem, at least when running production workloads.

Proposal

The Go compiler and linker should emit the necessary SEH static data for each Go function to reliably unwind and walk the stack using the Windows stack unwind conventions for each architecture:

  • amd64: https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64#unwind-data-for-exception-handling-debugger-support
  • arm64: https://learn.microsoft.com/en-us/cpp/build/arm64-exception-handling#arm64-exception-handling-information
  • arm: https://learn.microsoft.com/en-us/cpp/build/arm-exception-handling#arm-exception-handling-1
  • 386: there is no official way to do stack unwinding on x86, tools use obscure heuristics sometimes helped by data in the PDBs. Let's exclude windows/386 from this proposal.

This new information will slightly increase the size of the final binary, around 20-30 bytes per non-leaf functions. It could be discarded by passing the -s flag to the linker, just as it is already happening with DWARF data.

Stack unwinding overview

Note: each architecture has slightly different design, the following explanation is based on x64.

Stack unwinding normally take place in these three cases:

  • When a hardware or software exception is raised, the SEH mechanism will chime in to find an appropriate exception handler for it. It will first look in the registered vector exceptions handlers, and if none trap the exception, the stack will be unwind looking for a function that has a registered SEH exception handler than traps the exception. This last part of the process internally calls RtlVirtualUnwind.
  • Debuggers and profiles directly call RtlVirtualUnwind to unwind the stack, i.e. #57404
  • When writing minidump files, i.e. with MinidumpWriteDump, which also calls RtlVirtualUnwind. Notice that Go will support creating minidumps on crash once #49471 is implemented.

RtlVirtualUnwind unwinds exactly one frame from the stack and has two important parameters: ControlPC and FunctionEntry. The former is the PC from where to start the unwinding, and the later is the frame information of the current function. This frame information is what comes from the static data in the PE files, more specifically from the .pdata and .xdata sections. It contains the following bits: function length, prolog length, frame pointer location (if used), where does the stack grow to accommodate local variable, how to restore non-volatile registers, and the exception handler address (if any). RtlVirtualUnwind will use this information to restore the context of the calling function without physically walking the stack. If this information is not present (current situation in Go binaries), it will naively take the return address from the last 4/8 bytes of the stack, which really only works for leaf functions, and for non-leaf functions it means that the return address points to whatever value the last local variable happens to contain.

There is one important outcome of this explanation: tools using RtlVirtualUnwind will unwind Go binaries even if no unwind information is present in the PE (current situation), this process will never work correctly unless unwinding a leaf function. So, whatever we do, even if not perfect, will be an improvement over the current situation.

Implementation

I would rather keep the implementation details out of this discussion, it is doable and there any many ways to implement it, from naively generating the info in the linker (see prototype CL 457455) to a detailed stack frame representation generated by the compiler and the linker.

If this proposal is accepted, I would suggest implementing it incrementally, starting by just enabling stack walking and finishing with an accurate representation of the non-volatile registries at every frame of the call stack.

@golang/windows @golang/compiler

qmuntal avatar Dec 14 '22 08:12 qmuntal

I don't see any API changes here, so I don't think this needs to go through the proposal process. I think the runtime team and the Windows team can make a decision here. Thanks.

ianlancetaylor avatar Dec 14 '22 17:12 ianlancetaylor

CC @golang/runtime @golang/windows

ianlancetaylor avatar Dec 14 '22 17:12 ianlancetaylor

I don't see any API changes here, so I don't think this needs to go through the proposal process. I think the runtime team and the Windows team can make a decision here. Thanks.

Makes sense.

Side note: I open to lead this implementation as part of my job at Microsoft, I don't expect nor ask the Go compiler/runtime team to jump into this by their own.

qmuntal avatar Dec 14 '22 20:12 qmuntal

I think we should accept these changes.

@qmuntal it would help if you create new issues demonstrating problems you intend to fix. You mention #49471 , but I suspect there are others. Broken WinDbg or Windows Performance Analyzer or other tools are OK for these issues. It would help Go team to make this decision once they see what they are getting for adding new code.

Thank you.

Alex

alexbrainman avatar Dec 20 '22 07:12 alexbrainman

@qmuntal it would help if you create new issues demonstrating problems you intend to fix. You mention #49471 , but I suspect there are others. Broken WinDbg or Windows Performance Analyzer or other tools are OK for these issues. It would help Go team to make this decision once they see what they are getting for adding new code.

That's a very good advice. Will do that!

qmuntal avatar Dec 20 '22 10:12 qmuntal

Change https://go.dev/cl/459395 mentions this issue: runtime: use explicit NOFRAME on windows/amd64

gopherbot avatar Dec 23 '22 12:12 gopherbot

Change https://go.dev/cl/461736 mentions this issue: cmd/internal/objabi,runtime: record NoFrame func flag

gopherbot avatar Jan 12 '23 09:01 gopherbot

Change https://go.dev/cl/461737 mentions this issue: cmd/link,cmd/internal/objabi: support ADDR32NB relocations on windows

gopherbot avatar Jan 12 '23 10:01 gopherbot

Change https://go.dev/cl/461738 mentions this issue: cmd/link: generate .pdata PE section

gopherbot avatar Jan 12 '23 15:01 gopherbot

Change https://go.dev/cl/461749 mentions this issue: cmd/internal/obj: generate SEH aux symbols for windows/amd64

gopherbot avatar Jan 16 '23 15:01 gopherbot

Change https://go.dev/cl/457455 mentions this issue: cmd/link: generate .xdata PE section

gopherbot avatar Jan 18 '23 14:01 gopherbot

@aclements @cherrymui @alexbrainman the work to add SEH unwinding for windows/amd64 is ready to review. Do you expect to have time to review it during the go1.21 development cycle?

FYI: I finally managed to add all the necessary unwind information to the PE binary with just a 0.7% increment of binary size. This increment is compensated by CL 462300 and CL 463845, which reduced the binary size by about 0.8%.

qmuntal avatar Feb 01 '23 13:02 qmuntal

Thanks for the contribution! It is nice that it only causes a small binary size increase. Yeah, I'll review during the 1.21 cycle (probably not very soon, though). Thanks.

cherrymui avatar Feb 01 '23 17:02 cherrymui

1 month till freeze, but just 2 CLs for review remaining. Could you take a look to CL 461738 and CL 457455 @cherrymui and @thanm? Thanks!

qmuntal avatar Apr 20 '23 12:04 qmuntal

Change https://go.dev/cl/499515 mentions this issue: doc: document WER and SEH changes in Go 1.21

gopherbot avatar May 31 '23 13:05 gopherbot

Thanks for all of this work @qmuntal ! Is there more to be done here, or can we close this issue at this point?

aclements avatar Jun 06 '23 15:06 aclements

Thanks for all of this work @qmuntal ! Is there more to be done here, or can we close this issue at this point?

I haven't close it yet because SEH is still not implemented on windows/arm64, but I'm fine closing this bigger issue and open a more specific follow up issue. What do you recommend?

qmuntal avatar Jun 06 '23 16:06 qmuntal

@qmuntal I think we just kick it to the next milestone and update the original post; possibly also the title.

mknyszek avatar Jun 06 '23 16:06 mknyszek

While investigating #62887 I found out that, on windows/amd64 using internal linking, the Go linker doesn't merge .pdata/.xdata sections from host object files generated by the C compiler. This means that the stack can't be unwind in C -> Go transitions. I'm preparing a fix for that.

qmuntal avatar Oct 11 '23 12:10 qmuntal

Change https://go.dev/cl/534555 mentions this issue: cmd/internal/link: merge .pdata and .xdata sections from host object files

gopherbot avatar Oct 11 '23 15:10 gopherbot

Hi @qmuntal , what is the status for this now? Is there more to do? Should we bump to Go 1.23? Thanks.

cherrymui avatar Nov 27 '23 22:11 cherrymui

@qmuntal Also, is there anything that worth mentioning in the Go 1.22 release notes? Thanks.

cherrymui avatar Nov 28 '23 20:11 cherrymui

Hi @qmuntal , what is the status for this now? Is there more to do? Should we bump to Go 1.23? Thanks.

windows/arm64 work still missing. I'll probably won't have bandwidth to implement it for Go 1.23, so we can move it to the backlog.

Also, is there anything that worth mentioning in the Go 1.22 release notes?

Yep, good point. CL 534555 and CL 525475, which should have linked this issue, deserve to be part of the release notes, as they can sublety alter program behavior (in a good way). I'll submit the corresponding release notes CL.

qmuntal avatar Nov 28 '23 21:11 qmuntal

Friendly ping, just checking to see if you have had a chance to look into writing a release notes CL. Thanks.

thanm avatar Dec 07 '23 15:12 thanm

Change https://go.dev/cl/548575 mentions this issue: doc: document SEH changes

gopherbot avatar Dec 09 '23 00:12 gopherbot