service/dap: fix closing closed channel bug
Our previous implementation of waiting on resumeRequestLoop was not safe, because the channel could be closed more than once. This change fixes the bug by instead using two channels to track the two conditions we are managing: whether the command has started running and whether the command has completed (on error or otherwise).
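For illustration, here is a minimal standalone sketch of the two-channel idea (hypothetical names, not the actual delve code): each condition gets its own channel, and each channel is closed by exactly one goroutine.

```go
package main

import (
	"fmt"
	"time"
)

// runCommand sketches the two-channel approach: one channel per condition,
// each closed by exactly one goroutine.
func runCommand(doCommand func(notifyRunning func()) error) {
	commandRunning := make(chan struct{}) // closed when the command signals it has started
	commandDone := make(chan struct{})    // closed when the command returns (on error or otherwise)

	go func() {
		defer close(commandDone) // only this goroutine closes commandDone
		_ = doCommand(func() {
			close(commandRunning) // only the notify callback closes commandRunning
		})
	}()

	// The request loop waits on whichever condition it needs; neither channel
	// can be closed twice, unlike a single shared channel.
	select {
	case <-commandRunning:
		fmt.Println("command is running")
	case <-commandDone:
		fmt.Println("command returned before signaling that it started")
	}
	<-commandDone // always wait for completion before returning
}

func main() {
	runCommand(func(notifyRunning func()) error {
		notifyRunning()                   // signal that setup completed
		time.Sleep(10 * time.Millisecond) // pretend to run the command
		return nil
	})
}
```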
Fixes go-delve/delve#2757
So what changed? The auto-resume logic added signal redirection with an additional goroutine, so now the new goroutine could close the channel when Command signaled setup completion, while the per-request goroutine closed the same channel at the same time because Command returned quickly? Do I have that right?
Yes, this does happen because of the auto-resume logic. Since we add a separate goroutine to wait on resumeNotify, when we actually call s.debugger.Command, the original goroutine and that new goroutine could both try to close the channel at the same time.
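A self-contained repro of that failure mode (just the shape of the race, not delve's code): two goroutines racing to close the same channel will panic with "close of closed channel".

```go
package main

import "sync"

func main() {
	allowNextStateChange := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(2)

	// Stands in for the auto-resume goroutine waiting on resumeNotify.
	go func() {
		defer wg.Done()
		close(allowNextStateChange)
	}()

	// Stands in for the per-request goroutine unblocking the request loop
	// because Command returned quickly.
	go func() {
		defer wg.Done()
		close(allowNextStateChange)
	}()

	// Whichever close runs second panics: "close of closed channel".
	wg.Wait()
}
```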
The channel that is closed from two different goroutines is allowNextStateChange. At one point we discussed that, with the introduction of changeStateMu, we might not need to block the request loop on allowNextStateChange at all. Have you considered that approach instead of adding another channel?
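For what it's worth, here is a generic sketch (not delve's changeStateMu logic, and not necessarily what this comment proposes) of how a mutex-guarded flag can make closing a single shared channel idempotent, so a second goroutine reaching the close is harmless:

```go
package main

import (
	"fmt"
	"sync"
)

// stateGate sketches a mutex-guarded, close-once signal.
type stateGate struct {
	mu     sync.Mutex
	closed bool
	ch     chan struct{}
}

func newStateGate() *stateGate {
	return &stateGate{ch: make(chan struct{})}
}

// signal closes the channel at most once, however many goroutines call it.
func (g *stateGate) signal() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.closed {
		g.closed = true
		close(g.ch)
	}
}

func main() {
	g := newStateGate()
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.signal() // safe even if both goroutines get here
		}()
	}
	wg.Wait()
	<-g.ch
	fmt.Println("state change allowed exactly once")
}
```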
Just curious about the status of this, @polinasok @suzmue.
Should this PR still be considered for the next release?
@derekparker I haven't looked at this in a bit, so I will go ahead and rebase it.
I'm not sure it is a high priority that we need to get in ASAP. I don't think we have seen any reports of users actually encountering this bug, so I think it is OK to push it to the next release so @polinasok can come back and review it.
@suzmue this still needs a rebase. Would we still like to try targeting this for 1.8.4?
@derekparker Thanks for bumping this. I rebased the change, but need to take a little bit of time to get the context back on this change to address the remaining comments. Will do this by the end of the week.
We just hit this again. I will take a look shortly.
@suzmue @polinasok is this change still active or should I close it out?