go-ethereum
go-ethereum copied to clipboard
p2p: race condition in dialer scheduler
WARNING: DATA RACE
Read at 0x00c03c44d2a0 by goroutine 152:
github.com/ethereum/go-ethereum/p2p.(*dialScheduler).updateStaticPool()
github.com/ethereum/go-ethereum/p2p/dial.go:433 +0x2e4
github.com/ethereum/go-ethereum/p2p.(*dialScheduler).expireHistory.func1()
github.com/ethereum/go-ethereum/p2p/dial.go:378 +0xcf
github.com/ethereum/go-ethereum/p2p.(*expHeap).expire()
github.com/ethereum/go-ethereum/p2p/util.go:59 +0x10a
github.com/ethereum/go-ethereum/p2p.(*dialScheduler).expireHistory()
github.com/ethereum/go-ethereum/p2p/dial.go:375 +0x124
github.com/ethereum/go-ethereum/p2p.(*dialScheduler).loop()
github.com/ethereum/go-ethereum/p2p/dial.go:308 +0x239a
github.com/ethereum/go-ethereum/p2p.newDialScheduler.func2()
github.com/ethereum/go-ethereum/p2p/dial.go:186 +0x53
github.com/panjf2000/ants/v2.(*goWorker).run.func1()
github.com/panjf2000/ants/[email protected]/worker.go:70 +0xd9
Previous write at 0x00c03c44d2a0 by goroutine 1038:
github.com/ethereum/go-ethereum/p2p.(*dialTask).resolve()
github.com/ethereum/go-ethereum/p2p/dial.go:537 +0x204
github.com/ethereum/go-ethereum/p2p.(*dialTask).run()
github.com/ethereum/go-ethereum/p2p/dial.go:498 +0xf9
github.com/ethereum/go-ethereum/p2p.(*dialScheduler).startDial.func1()
github.com/ethereum/go-ethereum/p2p/dial.go:465 +0x58
github.com/panjf2000/ants/v2.(*goWorker).run.func1()
github.com/panjf2000/ants/[email protected]/worker.go:70 +0xd9
off of latest master -
Hmm that is not latest master afaict. Could you link the commit you're using?
I fixed some races here: https://github.com/ethereum/go-ethereum/pull/23434 Don't think it's the one you found though
The race in this issue is on t.dest. When resolve() runs, it may update t.dest. Meanwhile, on the dialer loop, we run updateStaticPool when a peer is removed, and this reads t.dest.
My bad , was latest master off bsc and should have written the description @fjl wrote , yes that’s the problem -
I'm actually not sure how to fix this race correctly. The check on t.dest in updateStaticPool seems off to me, maybe that should just be removed.
Agree - it wasn’t clear to me how to fix it immediately either , and don’t want locks for just this - if node being removed anyway then .dest being updated shouldn’t be needed then
This issue has become quite stale now. I can't reproduce the race condition anymore. Feel free to reopen if this is still an issue with the current master.
I'm sure this race is not fixed. It's just hard to reproduce because it requires resolving the node, which takes variable time.
@fjl how about solving it by just making it an atomic.Pointer? Like this: https://github.com/holiman/go-ethereum/commit/79b76150f0bbef7faa4e45b299ada41bb792b5d8
Yeah I guess this would be a possible fix. Though I think there must be a better way to fix it.
The main issue is really just that the main loop wants access to the node ID, and it reads that through .dest.ID(). So another fix would be to store a separate, immutable reference to the original non-resolved *enode.Node and use that to read the ID.