truechain-consensus-core
Fix issues where some variables are used without lock protection
This PR fixes the following issues:
- Some variables were read and written without lock protection.
- In some places an error occurs but the function does not return where it should.
- It is recommended to log through a single, unified method instead of mixing fmt.Printf with the custom MyPrint method (there may be other such places).
Some suggestions:
- The Node struct has several member fields but only one lock; could each member field get its own lock to improve efficiency?
- Any modification of a member field that needs lock protection should be wrapped in a function, kept as short as possible. Inside each such function, consistently call mu.Lock() and defer mu.Unlock(); otherwise deadlocks are very easy to introduce, especially in node.go, where some functions are quite long (see the sketch below).
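A minimal sketch of that suggestion, assuming a hypothetical Node with a commCount map guarded by a single mutex mu (the field and method names here are placeholders, not the project's actual ones): every access to shared state goes through a short helper that takes the lock and releases it with defer.

package main

import "sync"

type Node struct {
    mu        sync.Mutex
    commCount map[string]int // hypothetical shared field guarded by mu
}

// incCommCount keeps the critical section short: lock, mutate, unlock via defer.
func (nd *Node) incCommCount(dig string) {
    nd.mu.Lock()
    defer nd.mu.Unlock()
    nd.commCount[dig]++
}

// getCommCount reads the shared field under the same lock.
func (nd *Node) getCommCount(dig string) int {
    nd.mu.Lock()
    defer nd.mu.Unlock()
    return nd.commCount[dig]
}

func main() {
    nd := &Node{commCount: make(map[string]int)}
    nd.incCommCount("digest-1")
    _ = nd.getCommCount("digest-1")
}

Note that callers must not already hold mu when calling such helpers, since Go's sync.Mutex is not reentrant.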
let me take a look. Could you please use English for future communication? thanks.
ok.
Oh, just checked: about once in every 5 runs, for i in {1..5}; do go build engine.go && ./engine ; done would still error out with:
...
<snip>
...
goroutine 1535 [semacquire]:
sync.runtime_SemacquireMutex(0xc42021447c, 0xc420374500)
/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc420214478)
/usr/local/go/src/sync/mutex.go:134 +0x108
_/workspace/truechain-consensus-core/pbft-core.(*Node).incCommDict(0xc420214400, 0xc420899380, 0x80)
/workspace/truechain-consensus-core/pbft-core/node.go:690 +0x3a
_/workspace/truechain-consensus-core/pbft-core.(*Node).processCommit(0xc420214400, 0x1, 0x1b, 0x0, 0x2, 0xc420899380, 0x80, 0xffffffffffffffff, 0xc420899400, 0x80, ...)
/workspace/truechain-consensus-core/pbft-core/node.go:838 +0xa3
_/workspace/truechain-consensus-core/pbft-core.(*Node).ProxyProcessCommit(0xc420214400, 0x1, 0x1b, 0x0, 0x2, 0xc420899380, 0x80, 0xffffffffffffffff, 0xc420899400, 0x80, ...)
/workspace/truechain-consensus-core/pbft-core/node.go:456 +0x156
reflect.Value.call(0xc42012cc60, 0xc420248010, 0x13, 0x138a9ed, 0x4, 0xc420239f18, 0x3, 0x3, 0xc4202f3e00, 0x0, ...)
/usr/local/go/src/reflect/value.go:447 +0x969
reflect.Value.Call(0xc42012cc60, 0xc420248010, 0x13, 0xc42077af18, 0x3, 0x3, 0x0, 0x0, 0x0)
/usr/local/go/src/reflect/value.go:308 +0xa4
net/rpc.(*service).call(0xc420242040, 0xc420244000, 0xc4200182e0, 0xc4200182f0, 0xc42024a100, 0xc4202b1b60, 0x1334280, 0xc4208803c0, 0x199, 0x12f6400, ...)
/usr/local/go/src/net/rpc/server.go:384 +0x14e
created by net/rpc.(*Server).ServeCodec
/usr/local/go/src/net/rpc/server.go:480 +0x43a
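For context, a minimal, self-contained sketch (assumed names and shapes, not the project's actual code) of one way a blocked Lock like the one in this trace can arise: a method takes the lock and then calls a helper that tries to take the same non-reentrant sync.Mutex again.

package main

import (
    "fmt"
    "sync"
)

type node struct {
    mu   sync.Mutex
    dict map[string]int
}

// incDict takes the lock itself before mutating the map.
func (n *node) incDict(k string) {
    n.mu.Lock()
    defer n.mu.Unlock()
    n.dict[k]++
}

// process also takes the lock and then calls incDict; the second Lock on the
// same mutex blocks forever because sync.Mutex is not reentrant in Go.
func (n *node) process(k string) {
    n.mu.Lock()
    defer n.mu.Unlock()
    n.incDict(k) // deadlocks here
    fmt.Println("never reached")
}

func main() {
    n := &node{dict: make(map[string]int)}
    n.process("digest")
}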
I'll take a stab at it! https://github.com/truechain/truechain-consensus-core/blob/devel/pbft-core/node.go#L504-L507
Which branch is it? I cannot match the code. In my env, I just run go build engine.go && ./engine and it works fine.
@iamyh yea but did you run for i in {1..5}; do go build engine.go && ./engine ; done and see if it fails even once? I've checked out your PR locally and ran this.
included in https://github.com/truechain/truechain-consensus-core/commit/50cfb07f880c04dd42d1e06146e9ed7a5870380b while merging devel.
Please, let's rebase and continue the discussion.
@arcolife Hey, when I run go build engine.go and for i in {1..5}; do ./engine >> log 2>&1 ; done on the master branch, it also errors out with a 'goroutine stack error'.
When I search the log file for the keyword goroutine, it shows 'concurrent map writes'. That means there are quite a few bugs of this kind in node.go, such as:
func (nd *Node) newClientRequest(req Request, clientId int) { // TODO: change to single arg and single reply
    if v, ok := nd.active[req.Dig]; ok {
should be locked like:
    var v ActiveItem
    var ok bool
    nd.mu.Lock()
    v, ok = nd.active[req.Dig]
    nd.mu.Unlock()
There is a lot of code without lock protection; please check it carefully. Because Node has only one lock, mu, serializing everything through it hurts performance, which is why I advise using multiple locks and read-write locks (a sketch follows below).
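A minimal sketch of that advice, assuming a hypothetical Node whose active map gets its own sync.RWMutex (field and type names here are placeholders, not the project's actual ones): readers take RLock so they do not block each other, while writers take the exclusive Lock.

package main

import "sync"

type ActiveItem struct{ Seq int } // placeholder, stands in for the real value type

type Node struct {
    activeMu sync.RWMutex // dedicated lock for the active map
    active   map[string]ActiveItem
}

// lookupActive is a read-only accessor; RLock allows many concurrent readers.
func (nd *Node) lookupActive(dig string) (ActiveItem, bool) {
    nd.activeMu.RLock()
    defer nd.activeMu.RUnlock()
    v, ok := nd.active[dig]
    return v, ok
}

// setActive is the write path and takes the exclusive lock.
func (nd *Node) setActive(dig string, item ActiveItem) {
    nd.activeMu.Lock()
    defer nd.activeMu.Unlock()
    nd.active[dig] = item
}

func main() {
    nd := &Node{active: make(map[string]ActiveItem)}
    nd.setActive("digest", ActiveItem{Seq: 1})
    _, _ = nd.lookupActive("digest")
}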
@iamyh, hi, could you please try sync.Map to replace the nd.active map? https://golang.org/pkg/sync
Please make sure your Go version is 1.9 or above (a sketch follows below).
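A minimal sketch of the sync.Map suggestion (Go 1.9+), assuming the real nd.active keys are request digests; the ActiveItem type here is a placeholder, not the project's actual type. sync.Map is safe for concurrent use without an external mutex, at the cost of a type assertion on each Load.

package main

import (
    "fmt"
    "sync"
)

type ActiveItem struct{ Seq int } // placeholder value type

func main() {
    var active sync.Map // replaces map[string]ActiveItem plus its mutex

    // Store and Load are safe for concurrent use without any external lock.
    active.Store("digest-1", ActiveItem{Seq: 42})

    if v, ok := active.Load("digest-1"); ok {
        item := v.(ActiveItem) // values come back as interface{}, so assert
        fmt.Println("found item with seq", item.Seq)
    }
}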
@hixichen My Go version is go1.9.2 darwin/amd64.
@iamyh Hi, could you please join via the invite at https://gitter.im/truechain-net/engg-foss-global? Thanks.