
New deadlock bug introduced by the fix for the previous deadlock bug

Open graydovee opened this issue 3 years ago • 1 comments

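The fix for the previous deadlock bug (#131) introduced a new deadlock. Router callbacks run concurrently, and a callback may both close the connection and call SendMsg. If the close happens before SendMsg, and SendMsg happens before the finalizer runs, the following occurs: SendMsg acquires the lock and tries to send a message to msgChan. Because the context has already been cancelled with cancel(), the StartWriter goroutine has exited and no longer consumes msgChan, so SendMsg blocks forever. The finalizer goroutine is then blocked as well, because the lock is held by SendMsg, so it cannot release msgChan and the other resources. The result is a deadlock.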

The following test case reproduces the bug with high probability:


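// Assumed to live in zinx's znet package, because BaseRouter, DataPack,
// NewMsgPackage and NewServer are used unqualified.
package znet

import (
	"fmt"
	"net"
	"sync"
	"testing"
	"time"

	"github.com/aceld/zinx/ziface"
)

// CloseConnectionBeforeSendMsgRouter stops the connection before replying.
type CloseConnectionBeforeSendMsgRouter struct {
	BaseRouter
}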

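// DemoPacket wraps DataPack and delays Pack by one second.
type DemoPacket struct {
	DataPack
}

// Pack sleeps for one second, then delegates to DataPack.Pack.
func (d *DemoPacket) Pack(msg ziface.IMessage) ([]byte, error) {
	time.Sleep(time.Second * 1)
	return d.DataPack.Pack(msg)
}

// Handle stops the connection and then calls SendMsg on it.
func (br *CloseConnectionBeforeSendMsgRouter) Handle(req ziface.IRequest) {
	connection := req.GetConnection()

	dp := &DemoPacket{}
	msg := "Zinx server response message for CloseConnectionBeforeSendMsgRouter"
	pack, _ := dp.Pack(NewMsgPackage(0, []byte(msg)))
	// Stop first: the context is cancelled and the writer goroutine exits.
	connection.Stop()
	// SendMsg then takes the lock and blocks on msgChan, which nobody reads.
	_ = connection.SendMsg(1, pack)
	fmt.Println("send: ", msg)
}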

func TestCloseConnectionBeforeSendMsg(t *testing.T) {
	s := NewServer()
	s.AddRouter(1, &CloseConnectionBeforeSendMsgRouter{})

	s.Start()
	time.Sleep(time.Second * 1)

	wg := sync.WaitGroup{}
	wg.Add(1)
	go func() {
		conn, _ := net.Dial("tcp", "127.0.0.1:8999")
		dp := NewDataPack()
		msg := "Zinx client request message for CloseConnectionBeforeSendMsgRouter"
		pack, _ := dp.Pack(NewMsgPackage(1, []byte(msg)))
		_, _ = conn.Write(pack)
		fmt.Println("send: ", msg)
		buffer := make([]byte, 1024)
		read, _ := conn.Read(buffer)
		fmt.Println("receive: ", string(buffer[:read]))
		wg.Done()
	}()
	wg.Wait()
	s.Stop()
}

There are two possible fixes:

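  1. Narrow the lock in SendMsg so that it only guards the isClosed check. The drawback is that msgChan may already be closed by the time SendMsg writes to it, which panics, so a recover is needed.
  2. Remove msgChan and write directly through the writer (#84). msgBuffChan can stay: it is buffered, so writes do not block, and as long as the buffer is large enough a deadlock is very unlikely.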

Let's discuss which approach is better. I can open a PR over the weekend if I have time.

graydovee, Feb 17 '22 03:02

Merged in PR https://github.com/aceld/zinx/pull/135

aceld, Feb 23 '22 07:02