
beacon: Flaky ProtocolDriver.Close - might panic

Open fasmat opened this issue 1 year ago • 1 comments

(*ProtocolDriver) Close() closes the internal results channel. It appears that this can happen before all running goroutines have finished writing to it, which makes a late send panic. See this CI run as an example: https://github.com/spacemeshos/go-spacemesh/actions/runs/6536347132/job/17747858362

=== FAIL: node TestSpacemeshApp_NodeService (unknown)
panic: send on closed channel

goroutine 1864 [running]:
github.com/spacemeshos/go-spacemesh/beacon.(*ProtocolDriver).onResult(0xc00001a5a0, 0x2, {0xe3, 0xe3, 0x38, 0x9e})
	/Users/runner/work/go-spacemesh/go-spacemesh/beacon/beacon.go:262 +0x8a
github.com/spacemeshos/go-spacemesh/beacon.(*ProtocolDriver).UpdateBeacon(0xc00001a5a0, 0x1b57c?, {0xe3, 0xe3, 0x38, 0x9e})
	/Users/runner/work/go-spacemesh/go-spacemesh/beacon/beacon.go:243 +0x36e
github.com/spacemeshos/go-spacemesh/node.(*App).startServices.(*App).listenToUpdates.func7()
	/Users/runner/work/go-spacemesh/go-spacemesh/node/node.go:1026 +0x405
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/Users/runner/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x77
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1691
	/Users/runner/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:72 +0x125
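
To illustrate the failure mode in the trace above, here is a minimal sketch of one common way to make a close safe against late senders. It is not the code from go-spacemesh: the `driver` type, the `newDriver` helper, the `int` payload and the drop-on-full policy are all assumptions made for this example. It only mirrors the shape of the problem (an `onResult` that sends on a channel that `Close` closes).

```go
package main

import (
	"fmt"
	"sync"
)

// driver is a hypothetical stand-in for the beacon ProtocolDriver.
// It is NOT the real type; it only has the fields needed for the sketch.
type driver struct {
	mu      sync.RWMutex
	closed  bool
	results chan int
}

// newDriver is a hypothetical constructor; buf is the channel buffer size.
func newDriver(buf int) *driver {
	return &driver{results: make(chan int, buf)}
}

// onResult tries to publish v and reports whether it was delivered.
// The read lock guarantees Close cannot close the channel mid-send.
// The send is non-blocking so a full buffer never holds the lock and
// never stalls Close (assumed policy: drop the value when full).
func (d *driver) onResult(v int) bool {
	d.mu.RLock()
	defer d.mu.RUnlock()
	if d.closed {
		return false
	}
	select {
	case d.results <- v:
		return true
	default:
		return false
	}
}

// Close marks the driver closed and closes the channel exactly once.
// It takes the write lock, so it waits for in-flight onResult calls.
func (d *driver) Close() {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.closed {
		return
	}
	d.closed = true
	close(d.results)
}

func main() {
	d := newDriver(16)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			d.onResult(v) // may run before or after Close
		}(i)
	}
	d.Close() // deliberately races with the senders
	wg.Wait()
	fmt.Println("closed without panic")
}
```

Without the lock and the `closed` flag, any sender scheduled after `Close` would hit `panic: send on closed channel`, as in the trace. An alternative with the same effect is to never close the channel and signal shutdown through a context instead.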

fasmat avatar Oct 17 '23 13:10 fasmat

Happened again here: https://github.com/spacemeshos/go-spacemesh/actions/runs/6981542100/job/18998899931

fasmat avatar Nov 24 '23 14:11 fasmat

First time I have seen it happening to a user: https://discord.com/channels/623195163510046732/691261331382337586/1217895028894990386

fasmat avatar Mar 14 '24 18:03 fasmat

Yes, it's caused by the following combination:

  • You run two nodes on the same system, both smeshing.
  • You DO NOT change grpc-post-listener to non-conflicting values.

Then, when you start the node, it loads the beacon and, at the same time, shuts down because of a communication issue with post-service.

pigmej avatar Mar 15 '24 09:03 pigmej

This was fixed in #5707

fasmat avatar Apr 02 '24 07:04 fasmat