vscode-go
bug: debugger freezes when loading 1M goroutines
From https://github.com/microsoft/vscode-go/issues/2562 by @aarzilli:
Debug https://github.com/aarzilli/delve_client_testing/blob/master/20bigstuff/main.go.
I expected a slight slowdown, but the debugger should keep working. What happens instead is that the debugger freezes, slowing down the whole system. Possible causes include vscode-go trying to read the 100MB string, the 10 million stack frames, or the 1 million goroutines. Possibly all three.
@aarzilli: Can dlv on the command-line handle this, or is this issue specific to VS Code Go?
Yes, both the command line client and gdlv work with this. I haven't checked GoLand in a year and I don't remember if they fixed it. What this is doing is checking that reasonable limits are passed to ListGoroutines, Eval and Stacktrace. You will probably encounter problems with the fact that DAP doesn't have a way, in the protocol, to deal with too many goroutines.
I tried this with the dlv CLI. The debug session makes it to the breakpoint with no issues.
$ dlv debug issue129.go
Type 'help' for list of commands.
(dlv) c
about to stop
> main.done() ./issue129.go:20 (PC: 0x10c4d5f)
15: sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
16: sh.Data = slh.Data
17: sh.Len = slh.Len
18: fmt.Printf("about to stop\n")
19: runtime.Breakpoint()
=> 20: fmt.Printf("this\n")
21: fmt.Printf("should\n")
22: fmt.Printf("be\n")
23: fmt.Printf("fast\n")
24: }
25:
(dlv)
At that point, if we issue the same commands that VS Code would issue automatically, I see the following behavior:
(dlv) goroutines
This slowly prints 10K goroutines at a time with long pauses in-between. I killed the process after getting to 150000 in about 5 minutes.
(dlv) print buf
[]uint8 len: 100000000, cap: 100000000, [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...+99999936 more]
This returns immediately; only 64 bytes are loaded. Stack also prints quickly because it is truncated.
(dlv) stack
0 0x00000000010c4d5f in main.done
at ./issue129.go:20
1 0x00000000010c4e5c in main.makeStack
at ./issue129.go:28
2 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
3 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
4 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
5 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
6 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
7 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
8 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
9 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
10 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
11 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
12 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
13 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
14 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
15 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
16 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
17 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
18 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
19 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
20 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
21 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
22 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
23 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
24 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
25 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
26 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
27 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
28 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
29 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
30 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
31 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
32 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
33 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
34 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
35 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
36 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
37 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
38 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
39 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
40 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
41 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
42 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
43 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
44 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
45 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
46 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
47 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
48 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
49 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
50 0x00000000010c4e77 in main.makeStack
at ./issue129.go:31
(truncated)
(dlv)
When running the same program in vscode:
about to stop
2020-11-02T09:21:46-08:00 debug layer=rpc (async 3) -> rpc2.CommandOut{"State":{"Running":false,"Recording":false,"currentThread":{"id":8099903,"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false},"goroutineID":1,"ReturnValues":null},"currentGoroutine":{"id":1,"currentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"userCurrentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"goStatementLoc":{"pc":17178566,"file":"/usr/local/go/src/runtime/asm_amd64.s","line":220,"function":{"name":"runtime.rt0_go","value":17178256,"type":0,"goType":0,"optimized":true}},"startLoc":{"pc":16998528,"file":"/usr/local/go/src/runtime/proc.go","line":113,"function":{"name":"runtime.main","value":16998528,"type":0,"goType":0,"optimized":true}},"threadID":8099903,"unreadable":""},"Threads":[{"id":8099856,"pc":16861268,"file":"/usr/local/go/src/runtime/mbitmap.go","line":456,"function":{"name":"runtime.heapBits.next","value":16861216,"type":0,"goType":0,"optimized":true},"goroutineID":1000034,"ReturnValues":null},{"id":8099900,"pc":140735118227286,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099901,"pc":140735118227586,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099902,"pc":16908520,"file":"/usr/local/go/src/runtime/mgcmark.go","line":1260,"function":{"name":"runtime.scanobject","value":16908224,"type":0,"goType":0,"optimized":true},"goroutineID":1000018,"ReturnValues":null},{"id":8099903,"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false},"goroutineID":1,"ReturnValues":null},{"id":8099904,"pc":16862226,"file":"\u003cautogenerated\u003e","line":1,"function":{"name":"runtime.heapBits.bits","value":16862208,"type":0,"goType":0,"optimized":true},"goroutineID":1000049,"ReturnValues":null}],"NextInProgress":false,"exited":false,"exitStatus":0,"When":""}} error: ""
continue state {"Running":false,"Recording":false,"currentThread":{"id":8099903,"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false},"goroutineID":1,"ReturnValues":null},"currentGoroutine":{"id":1,"currentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"userCurrentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"goStatementLoc":{"pc":17178566,"file":"/usr/local/go/src/runtime/asm_amd64.s","line":220,"function":{"name":"runtime.rt0_go","value":17178256,"type":0,"goType":0,"optimized":true}},"startLoc":{"pc":16998528,"file":"/usr/local/go/src/runtime/proc.go","line":113,"function":{"name":"runtime.main","value":16998528,"type":0,"goType":0,"optimized":true}},"threadID":8099903,"unreadable":""},"Threads":[{"id":8099856,"pc":16861268,"file":"/usr/local/go/src/runtime/mbitmap.go","line":456,"function":{"name":"runtime.heapBits.next","value":16861216,"type":0,"goType":0,"optimized":true},"goroutineID":1000034,"ReturnValues":null},{"id":8099900,"pc":140735118227286,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099901,"pc":140735118227586,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099902,"pc":16908520,"[...]
handleReenterDebug(breakpoint).
2020-11-02T09:21:46-08:00 debug layer=rpc <- RPCServer.ListGoroutines(rpc2.ListGoroutinesIn{"Start":0,"Count":1})
2020-11-02T09:21:46-08:00 debug layer=rpc -> *rpc2.ListGoroutinesOut{"Goroutines":[{"id":1,"currentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"userCurrentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"goStatementLoc":{"pc":17178566,"file":"/usr/local/go/src/runtime/asm_amd64.s","line":220,"function":{"name":"runtime.rt0_go","value":17178256,"type":0,"goType":0,"optimized":true}},"startLoc":{"pc":16998528,"file":"/usr/local/go/src/runtime/proc.go","line":113,"function":{"name":"runtime.main","value":16998528,"type":0,"goType":0,"optimized":true}},"threadID":8099903,"unreadable":""}],"Nextg":1} error: ""
To client: {"seq":0,"type":"event","event":"stopped","body":{"reason":"breakpoint","threadId":1,"allThreadsStopped":true}}
StoppedEvent("breakpoint")
From client: threads(undefined)
2020-11-02T09:21:46-08:00 debug layer=rpc (async 6) <- RPCServer.State(rpc2.StateIn{"NonBlocking":true})
2020-11-02T09:21:46-08:00 debug layer=rpc (async 6) -> rpc2.StateOut{"State":{"Running":false,"Recording":false,"currentThread":{"id":8099903,"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false},"goroutineID":1,"ReturnValues":null},"currentGoroutine":{"id":1,"currentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"userCurrentLoc":{"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false}},"goStatementLoc":{"pc":17178566,"file":"/usr/local/go/src/runtime/asm_amd64.s","line":220,"function":{"name":"runtime.rt0_go","value":17178256,"type":0,"goType":0,"optimized":true}},"startLoc":{"pc":16998528,"file":"/usr/local/go/src/runtime/proc.go","line":113,"function":{"name":"runtime.main","value":16998528,"type":0,"goType":0,"optimized":true}},"threadID":8099903,"unreadable":""},"Threads":[{"id":8099904,"pc":16862226,"file":"\u003cautogenerated\u003e","line":1,"function":{"name":"runtime.heapBits.bits","value":16862208,"type":0,"goType":0,"optimized":true},"goroutineID":1000049,"ReturnValues":null},{"id":8099856,"pc":16861268,"file":"/usr/local/go/src/runtime/mbitmap.go","line":456,"function":{"name":"runtime.heapBits.next","value":16861216,"type":0,"goType":0,"optimized":true},"goroutineID":1000034,"ReturnValues":null},{"id":8099900,"pc":140735118227286,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099901,"pc":140735118227586,"file":"","line":0,"goroutineID":0,"ReturnValues":null},{"id":8099902,"pc":16908520,"file":"/usr/local/go/src/runtime/mgcmark.go","line":1260,"function":{"name":"runtime.scanobject","value":16908224,"type":0,"goType":0,"optimized":true},"goroutineID":1000018,"ReturnValues":null},{"id":8099903,"pc":17583455,"file":"/Users/polina/go/src/issue129/issue129.go","line":20,"function":{"name":"main.done","value":17583216,"type":0,"goType":0,"optimized":false},"goroutineID":1,"ReturnValues":null}],"NextInProgress":false,"exited":false,"exitStatus":0,"When":""}} error: ""
ThreadsRequest
To client: {"seq":0,"type":"event","event":"output","body":{"category":"stderr","output":"2020-11-02T09:21:46-08:00 debug layer=rpc <- RPCServer.ListGoroutines(rpc2.ListGoroutinesIn{\"Start\":0,\"Count\":0})\n"}}
2020-11-02T09:21:46-08:00 debug layer=rpc <- RPCServer.ListGoroutines(rpc2.ListGoroutinesIn{"Start":0,"Count":0})
Things get stuck when we request all of the goroutines. This happens before loading the stack trace or variables, so at least at this stage those requests do not contribute to the issue.
Unlike the StackTrace request, the Threads request does not offer any settings for partial retrieval, and there is no feature in the UI to retrieve more threads or a specific thread.
Realistically, a user won't be able to examine 1M goroutines by hand. So why don't we set a limit internally and retrieve only the selected goroutine plus N more?
When truncating the threads list, we should signal to the user that this happened by writing to the DEBUG CONSOLE with an OutputEvent. If paging is added to DAP (microsoft/debug-adapter-protocol#159) and then to the VS Code UI, we can revisit this design. Ideally the CALL STACK view would display something like "showing N threads out of M" with a button to load more and/or a search field to look up an individual thread.
When trimming the list of threads to return, we should favor threads that are stopped on breakpoints in user code over those parked in runtime.gopark, because those are the ones users will likely want to examine more closely, checking their stacks and variables. This is related to #130 and #118.
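Roughly, the trimming could look like the sketch below. This is only an illustration with made-up types that mirror the fields visible in the RPC log above (goroutine id, function at the current location); the cap and helper names are hypothetical, not the adapter's actual code.

```go
// Hypothetical sketch: trim the goroutine list to a cap, always keeping the
// selected goroutine and preferring goroutines that are not parked in
// runtime.gopark. Types and names are illustrative, not the adapter's code.
package sketch

type goroutine struct {
	ID       int64
	FuncName string // function at the goroutine's current location
}

func trimGoroutines(gs []goroutine, selectedID int64, max int) []goroutine {
	out := make([]goroutine, 0, max)
	// Always include the goroutine the debugger stopped on.
	for _, g := range gs {
		if g.ID == selectedID {
			out = append(out, g)
			break
		}
	}
	// Prefer goroutines that are not parked in runtime.gopark.
	for _, g := range gs {
		if len(out) >= max {
			return out
		}
		if g.ID != selectedID && g.FuncName != "runtime.gopark" {
			out = append(out, g)
		}
	}
	// Fill any remaining slots with parked goroutines.
	for _, g := range gs {
		if len(out) >= max {
			break
		}
		if g.ID != selectedID && g.FuncName == "runtime.gopark" {
			out = append(out, g)
		}
	}
	return out
}
```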
By the way, if we reduce the number of goroutines in the repro program to something small, we can check whether the 100MB string or the 10 million stack frames cause any additional issues. They don't.
Even though we don't have supportsDelayedStackTraceLoading explicitly set, we are essentially supporting stackTrace paging by reporting a totalFrames value larger than the number of stack frames included in the response. StackTrace requests come with 20 levels by default, and launch.json sets stackTraceDepth to 50 by default, which we pass to Delve to limit the number of stack frames we load. The UI then offers a "Load All Stack Frames" link that issues a subsequent request.
In the log:
From client: stackTrace({"threadId":1,"startFrame":0,"levels":20})
To client: {"seq":0,"type":"response","request_seq":12,"command":"stackTrace","success":true,"body":{"totalFrames":51, "stackFrames":[...includes 20 frames...]}}
From client: stackTrace({"threadId":1,"startFrame":20,"levels":31})
As for the large "string", the current adapter supports maxArrayValues (64 by default), so only that many elements are loaded.
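That limit comes from the load configuration passed to Delve. As a hedged illustration (the exact client API can differ between Delve versions, and the address and limits below are placeholders), bounded loading through Delve's JSON-RPC client looks roughly like this:

```go
// Hedged sketch, not the adapter's code: evaluate a variable with bounded
// loading via Delve's JSON-RPC client.
package main

import (
	"fmt"
	"log"

	"github.com/go-delve/delve/service/api"
	"github.com/go-delve/delve/service/rpc2"
)

func main() {
	// Assumes a headless server started separately, e.g.:
	//   dlv debug --headless --listen=127.0.0.1:2345 issue129.go
	client := rpc2.NewClient("127.0.0.1:2345")

	cfg := api.LoadConfig{
		FollowPointers:     true,
		MaxVariableRecurse: 1,
		MaxStringLen:       64, // load at most 64 bytes of any string
		MaxArrayValues:     64, // load at most 64 elements of any array/slice
		MaxStructFields:    -1,
	}
	v, err := client.EvalVariable(api.EvalScope{GoroutineID: -1}, "buf", cfg)
	if err != nil {
		log.Fatal(err)
	}
	// For the 100MB buffer this reports 64 loaded elements out of 100000000.
	fmt.Printf("loaded %d of %d elements\n", len(v.Children), v.Len)
}
```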

Suggestion from @hyangah: "Is it possible that in the threads request, DAP sends only the goroutine each underlying thread saw when it stopped, and doesn't send any long-standing, unscheduled goroutines?"
It looks like we can get this either by looping over the threads and picking those with a non-zero GoroutineID, or by looping over the goroutines and picking those with an associated thread that was executing them (non-zero ThreadID). I think this should be a superset of the goroutines stopped at breakpoints.
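A rough sketch of that selection, using hypothetical stand-in types that mirror the JSON fields in the log (goroutineID on threads, threadID on goroutines) rather than Delve's actual api package:

```go
// Hypothetical sketch: either loop yields the goroutines that were on an
// OS thread when the target stopped.
package sketch

type thread struct {
	ID          int
	GoroutineID int64 // 0 when no goroutine is running on this thread
}

type goroutineInfo struct {
	ID       int64
	ThreadID int // 0 when the goroutine is not scheduled on a thread
}

// From the thread side: pick threads with a non-zero GoroutineID.
func onThreadGoroutineIDs(threads []thread) []int64 {
	var ids []int64
	for _, t := range threads {
		if t.GoroutineID != 0 {
			ids = append(ids, t.GoroutineID)
		}
	}
	return ids
}

// From the goroutine side: pick goroutines with a non-zero ThreadID.
func scheduledGoroutines(gs []goroutineInfo) []goroutineInfo {
	var out []goroutineInfo
	for _, g := range gs {
		if g.ThreadID != 0 {
			out = append(out, g)
		}
	}
	return out
}
```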
Only if we have an alternative way to present grouped, filtered, or paged goroutines. :-( Otherwise, bugs like deadlocks, where the interesting goroutines are not on threads, will be impossible to debug. Let's see how the issue in the DAP repo moves.
It's true: displaying a subset could hide interesting goroutines. But at the same time, displaying 1M goroutines, even paged, won't be of much help either without additional searching or filtering features in the UI. It is not realistic to study them manually one by one, even if we support labels. And even if paging is added to DAP, we would still need to request UI changes in the editor.
How would you debug bugs like deadlocks with the dlv CLI? You would run into the same issues, wouldn't you?
I guess we could do this with expression evaluation, like we do with call, but that would be quite hacky. The backend function behind goroutines does support paging, but it doesn't support filtering by label or other criteria. The backend function behind goroutine only searches by ID. We would have to load everything and search ourselves, or update the backend to add advanced search features. The same functionality could then be reused if DAP and the UI expose searching/filtering. But then we open the Pandora's box of supporting a custom request language via the expression prompt and hacking return values to print long multi-line result strings, which is not really what it was meant for.
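For reference, the existing paging looks roughly like this through Delve's JSON-RPC client (the second return value corresponds to the "Nextg" field in the log above; exact signatures may vary between Delve versions, and the address is a placeholder):

```go
// Hedged sketch: page through goroutines in bounded chunks instead of
// requesting all of them at once.
package main

import (
	"fmt"
	"log"

	"github.com/go-delve/delve/service/rpc2"
)

func main() {
	client := rpc2.NewClient("127.0.0.1:2345") // placeholder headless server

	const pageSize = 100
	start, total := 0, 0
	for {
		// The second return value is the start index of the next page.
		gs, next, err := client.ListGoroutines(start, pageSize)
		if err != nil {
			log.Fatal(err)
		}
		total += len(gs)
		if len(gs) < pageSize {
			break
		}
		start = next
	}
	fmt.Println("goroutines seen:", total)
}
```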
As for "update the backend to add advanced search features": this isn't a bad idea, in general.
The editor no longer freezes, because we limit the number of goroutines and notify the user via the DEBUG CONSOLE.
We do make sure the selected goroutine is included in the list.
To get to other interesting goroutines (e.g. running ones), the user might need to use dlv config goroutineFilters (see the dlv documentation).
Currently there is no other way to alter the list returned by default to get to goroutines that did not make the cut.
We filed a paging issue in Nov 2020. That prompted interest in investigating an alternative protocol and related VS Code UI options, but there is no ETA for these improvements.
In the meantime, our options include:
- adding maxGoroutines to launch.json, with a warning to users to increase it at their own risk (#2041)
- adding dlv config maxGoroutines, also with a warning
- promoting dlv config goroutineFilters via detailed documentation in vscode-go (expanding upon the dlv documentation)
- printing "Use 'dlv config goroutineFilters'" to the DEBUG CONSOLE when stopped
- setting dlv config goroutineFilters to -w running by default
- adding more powerful filters via dlv config ourselves