New deadlock bug introduced by the fix for the previous deadlock bug
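The fix for the previous deadlock bug (#131) introduced a new deadlock. Router callbacks run concurrently, and a callback may both close the connection and call SendMsg. If the Close happens before the SendMsg, and the SendMsg happens before finalize, the following occurs: SendMsg acquires the lock and tries to send a message to msgChan. Because the context has already been cancelled by cancel(), the StartWriter goroutine has exited and no longer consumes from msgChan, so SendMsg blocks forever. The finalizer goroutine is then blocked as well, because the lock is still held by SendMsg, so it cannot release msgChan and the other resources. The result is a deadlock.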
The test case below reproduces the bug with high probability:
// CloseConnectionBeforeSendMsgRouter stops the connection and then sends on it.
type CloseConnectionBeforeSendMsgRouter struct {
	BaseRouter
}

// DemoPacket wraps DataPack with a slow Pack to widen the race window.
type DemoPacket struct {
	DataPack
}

func (d *DemoPacket) Pack(msg ziface.IMessage) ([]byte, error) {
	time.Sleep(time.Second * 1)
	return d.DataPack.Pack(msg)
}

func (br *CloseConnectionBeforeSendMsgRouter) Handle(req ziface.IRequest) {
	connection := req.GetConnection()
	dp := &DemoPacket{}
	msg := "Zinx server response message for CloseConnectionBeforeSendMsgRouter"
	pack, _ := dp.Pack(NewMsgPackage(0, []byte(msg)))
	// Close first, then send: this is the ordering that triggers the deadlock.
	connection.Stop()
	_ = connection.SendMsg(1, pack)
	fmt.Println("send: ", msg)
}

func TestCloseConnectionBeforeSendMsg(t *testing.T) {
	s := NewServer()
	s.AddRouter(1, &CloseConnectionBeforeSendMsgRouter{})
	s.Start()
	// Give the server time to start listening.
	time.Sleep(time.Second * 1)

	wg := sync.WaitGroup{}
	wg.Add(1)
	go func() {
		conn, _ := net.Dial("tcp", "127.0.0.1:8999")
		dp := NewDataPack()
		msg := "Zinx client request message for CloseConnectionBeforeSendMsgRouter"
		pack, _ := dp.Pack(NewMsgPackage(1, []byte(msg)))
		_, _ = conn.Write(pack)
		fmt.Println("send: ", msg)
		buffer := make([]byte, 1024)
		read, _ := conn.Read(buffer)
		fmt.Println("receive: ", string(buffer[:read]))
		wg.Done()
	}()
	wg.Wait()
	s.Stop()
}
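There are two possible fixes:
- Narrow the lock in SendMsg so that it only guards the isClosed check. The drawback is that msgChan may already be closed by the time SendMsg tries to write to it, which causes a panic, so a recover is needed.
- Remove msgChan and write directly with the writer (#84). msgBuffChan can stay: it is buffered, so writes do not block, and as long as the buffer is large enough a deadlock is very unlikely.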
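The first option could look roughly like the sketch below. It is not the change that was merged. The `conn` type and the `sendMsg`, `finalize`, and `demo` functions are hypothetical and only model the idea: the lock covers the isClosed check alone, and a deferred recover turns a send on a closed msgChan into an error. Because finalize no longer waits for a blocked sender, it can close msgChan, which in Go wakes the blocked sender with a panic that the recover then catches.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// conn is a hypothetical model of the connection, not the Zinx type.
type conn struct {
	mu       sync.RWMutex
	isClosed bool
	msgChan  chan []byte
}

// sendMsg locks only around the isClosed check and recovers if msgChan
// is closed while (or before) the send is in progress.
func (c *conn) sendMsg(data []byte) (err error) {
	c.mu.RLock()
	closed := c.isClosed
	c.mu.RUnlock()
	if closed {
		return errors.New("connection closed when sending msg")
	}

	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("send on closed msgChan: %v", r)
		}
	}()
	c.msgChan <- data // may panic if finalize closes the channel
	return nil
}

// finalize marks the connection closed and releases msgChan.
func (c *conn) finalize() {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.isClosed {
		return
	}
	c.isClosed = true
	close(c.msgChan)
}

// demo starts a sender with no reader, finalizes, and returns the error seen
// by that sender plus the error from a send after close.
func demo() (blockedErr, lateErr error) {
	c := &conn{msgChan: make(chan []byte)}
	res := make(chan error, 1)
	go func() { res <- c.sendMsg([]byte("a")) }()
	time.Sleep(50 * time.Millisecond) // let the sender block on msgChan

	c.finalize() // not blocked: the sender does not hold the lock
	blockedErr = <-res
	lateErr = c.sendMsg([]byte("b"))
	return blockedErr, lateErr
}

func main() {
	blockedErr, lateErr := demo()
	fmt.Println("blocked sender:", blockedErr)
	fmt.Println("late sender:", lateErr)
}
```

Both calls return a non-nil error instead of hanging. The cost of this approach is that it relies on recover for a normal control path, which is part of the trade-off raised in the list above.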
Let's discuss which approach is better. I can open a PR over the weekend if I have time.
PR merged: https://github.com/aceld/zinx/pull/135