delve Conditional breakpoints are slow

What version of Delve are you using (dlv version)?

Delve Debugger Version: 1.2.0 Build: ac3b1c7a786d681a5aefcdded9888090d69b3832

What version of Go are you using? (go version)?

go version go1.12.2 windows/amd64

What operating system and processor architecture are you using?

Windows 10 64bit AMD Ryzen 1800X

What did you do?

Put a conditional breakpoint inside an inner loop running millions of times. The condition is on the integer loop variable.

What did you expect to see?

I expected the loop to run at most a few times slower because of the conditional breakpoint.

What did you see instead?

The loop runs about 1000x slower, which makes the conditional breakpoint useless if the loop count is higher than a few thousand.

A fast workaround is to write the same condition as an if statement and put an unconditional breakpoint inside it, but this is inconvenient because it requires recompiling and re-running the binary.

I am using delve via GoLand as the GUI. JetBrains tested my code (see below) via delve (without GoLand) and it had the same performance issue. So this is not due to the GUI integration.

Binary inside GoLand: %AppData%\Local\JetBrains\Toolbox\apps\Goland\ch-0\191.6183.86\plugins\go\lib\dlv\windows>dlv

How to reproduce:

package main

import ( 
    "fmt" 
    "time" 
)

func main() { 
    sum := int64(0) 
    start := time.Now() 
    for value := int64(0); value < 10000000; value++ { 
        sum += value 
    } 
    elapsed := time.Since(start) 
    fmt.Printf("Sum: %d\nTook %s", sum, elapsed) 
}

Run without debugging:

Sum: 49999995000000 Took 6.0053ms

Run with debugger, without conditional breakpoint:

Sum: 49999995000000 Took 24.0209ms

Put breakpoint with condition value == -1 on code line sum += value.

Debug run with conditional breakpoint:

I paused it after a few minutes and checked the current value. It had only got through about 36k iterations.

So I cut the loop count by a factor of 1000, from 10M to 10000, and ran it again:

Sum: 49995000 Took 32.4708425s

Based on the above result, 10M iterations would have taken about 32471 seconds (9 hours). That is far more than a 1000x slowdown, caused solely by the presence of the conditional breakpoint.

This is why conditional breakpoints cannot be used conveniently inside inner loops right now: they are painfully slow.

Workaround is to add the condition to the code as an if statement and put a breakpoint inside it:

if value == -1 { 
    fmt.Print("Put breakpoint here") 
}

It is barely slower than running the original code in debug mode:

Sum: 49999995000000 Took 28.0243ms
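For reference, a minimal sketch of the whole program with the workaround applied. The sumUpTo function, its target parameter and the hits counter are hypothetical names introduced only for illustration; they are not part of the original reproduction. The counter gives the debugger a statement to stop on and makes the guard easy to check without a debugger.

```go
package main

import (
	"fmt"
	"time"
)

// sumUpTo adds the integers in [0, n) and counts how often the guard fires.
// target is a hypothetical stand-in for the breakpoint condition value.
func sumUpTo(n, target int64) (sum int64, hits int) {
	for value := int64(0); value < n; value++ {
		if value == target {
			// Put an unconditional breakpoint on the next line.
			hits++
		}
		sum += value
	}
	return sum, hits
}

func main() {
	start := time.Now()
	// -1 is never reached, matching the condition used in the report.
	sum, hits := sumUpTo(10000000, -1)
	elapsed := time.Since(start)
	fmt.Printf("Sum: %d\nGuard hits: %d\nTook %s\n", sum, hits, elapsed)
}
```

The comparison runs in the target process itself, so the debugger is only involved when the condition is actually true.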

There must be some extremely high overhead in checking those breakpoint conditions. My guess is that each hit calls out to the debugger, which extracts the variable values from the stack or heap and then interprets the condition there.

Would it be possible to apply the above workaround automatically at least if such a conditional breakpoint is set before starting the debugged application? It would help a lot with catching corner cases in inner loops and analyzing them in their context.

Maybe there is a way to add a call between each pair of instructions, allowing the debugger to inject conditional checks there while the code is already running. Or some "NOP" instructions that can be replaced with a debug trap later at runtime. There must be a solution; with C/C++, conditional breakpoints had good performance.

I understand if this is not possible. In that case I will just stick with my workaround.

May 13 '19 23:05 viktor-ferenczi

Experiencing this too. Is there any known workaround that does not require re-compilation?

Oct 28 '19 19:10 redlus

A couple of considerations after looking into this a bit. The process of:

1. stopping at a breakpoint
2. evaluating a simple condition
3. resuming the target process

currently takes us 2.2ms (all measurements taken on my laptop running Linux). This is consistent with the observations in this issue (at 2.2ms per stop, 10000 stops take about 22s, in the same range as the 32s reported). Using a toy debugger to do the same thing takes 0.08ms, but that is unrealistic because it doesn't have to evaluate an expression and doesn't deal with multiple threads properly. A fairer comparison is gdb, which (experimentally) takes 0.18ms to do it, a little over 10x faster than delve.

I have a series of patches that takes our latency down from 2.2ms to 0.6ms, and I think there is probably still some room for improvement. Doing this for Go is harder than doing it for C, so a goal of 0.22ms latency is probably realistic.

Note that the original goal, with 10M iterations, would still take 30 minutes even with gdb.
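The arithmetic behind these estimates can be sketched as follows. The per-stop latencies are the ones quoted in this comment; totalOverhead and the labels are hypothetical names used only for illustration, and the estimate ignores the cost of the loop body itself.

```go
package main

import (
	"fmt"
	"time"
)

// totalOverhead estimates the time spent on breakpoint stops alone:
// per-stop latency multiplied by the number of times the breakpoint is hit.
func totalOverhead(perStop time.Duration, hits int64) time.Duration {
	return perStop * time.Duration(hits)
}

func main() {
	// Latencies as quoted in the comment above.
	cases := []struct {
		label   string
		perStop time.Duration
	}{
		{"delve (measured)", 2200 * time.Microsecond},
		{"delve (with patches)", 600 * time.Microsecond},
		{"gdb", 180 * time.Microsecond},
	}
	for _, c := range cases {
		fmt.Printf("%s: 10k hits take %v, 10M hits take %v\n",
			c.label,
			totalOverhead(c.perStop, 10000),
			totalOverhead(c.perStop, 10000000))
	}
}
```

At 0.18ms per stop, 10M hits come to 1800 seconds, which is where the 30 minute figure for gdb comes from.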

Jan 29 '20 08:01 aarzilli

Does Go (delve) allow runtime code changes while stopped at a breakpoint?

(Restricted code modification without changing the code structure, functions, structs or variables. There is a similar feature for C# and I remember using such a feature of Visual C++ more than 10 years ago.)

Such a feature could be used to inject an if statement with the condition and a debugger stop statement at runtime. It would have the highest performance possible due to the very minimal overhead.

Injecting an if statement would also be possible before starting the debug execution, even if runtime code changes are not supported. Conditional breakpoints defined this way would have minimal impact on performance, so at least we would have an option.

Jan 29 '20 14:01 viktor-ferenczi

Well, "allow" is a strong word, but in theory something like this could be possible.

(Note I haven't thought this all the way through)

I suppose it could be possible to try and allocate memory in the target process and write some generated instructions there which evaluate some condition and hit a breakpoint if true. Then instead of writing a breakpoint we insert a trampoline to the area of memory we just wrote the eval code to. This would avoid a context switch if the condition was false. Not totally unheard of (RR does this with certain syscalls where it has pre-generated stubs to avoid a trap context switch).

Jan 29 '20 15:01 derekparker

There's actually a paper describing that idea. There are two problems with it. The first is that, starting with Go 1.14, we can't inject code like that anymore. The second is that a 64bit absolute jump on amd64 takes up a massive 15 bytes, so it won't fit over most instructions, which creates a lot of problems.

Jan 29 '20 15:01 aarzilli

Any news on this issue?

Feb 26 '22 00:02 arthurlopes

All the reasonable optimizations that could be done for this were done at the time. We may revisit this issue in the future to assess whether there has been any slippage. The remaining slowness is either inherent to the mechanism used to implement conditional breakpoints (ptrace etc.) or caused by #21827 (see #49848 for a longer explanation).

Mar 09 '22 10:03 aarzilli
