rolling-shutter

Fix flaky `p2p.TestStartNetworkNodeIntegration` test

Open · ezdac opened this issue 3 years ago · 1 comment

Sometimes `TestStartNetworkNodeIntegration` fails with a "slice bounds out of range" panic in the gossipsub router:

=== FAIL: p2p TestStartNetworkNodeIntegration (unknown)
     INF [          p2p.go:146] dropping message, not subscribed to topic topic=testTopic1
     INF [          p2p.go:169] created libp2p host address=/ip4/127.0.0.1/tcp/2001/p2p/12D3KooWFhf64KBUDXZozUmX5yyzexGLqcfgSdUX37GiWEkh9LjW
     INF [          p2p.go:169] created libp2p host address=/ip4/127.0.0.1/tcp/2000/p2p/12D3KooWFG7sWvyzsovbUbqPySMAR7UoohJZSMwwX6oMkwXHkNam
     INF [          p2p.go:169] created libp2p host address=/ip4/127.0.0.1/tcp/2002/p2p/12D3KooWMr6tcH2mFL3GgGUmF2296GiYRabTK49Pk8B6n8vqwN6W
     INF [          p2p.go:169] created libp2p host address=/ip4/127.0.0.1/tcp/2003/p2p/12D3KooWDPtr6wENaiD2cxJ9wnDDX9Be79oj5JST2bdu4r6Vw3bf
     ERR [     bootstrap.go:43] couldn't connect to boostrap node error="failed to find peers: failed to find any peer in table" peer="{12D3KooWMr6tcH2mFL3GgGUmF2296GiYRabTK49Pk8B6n8vqwN6W: [/ip4/127.0.0.1/tcp/2002]}"
     ERR [     bootstrap.go:43] couldn't connect to boostrap node error="failed to find peers: failed to find any peer in table" peer="{12D3KooWDPtr6wENaiD2cxJ9wnDDX9Be79oj5JST2bdu4r6Vw3bf: [/ip4/127.0.0.1/tcp/2003]}"
     DBG [     bootstrap.go:88] called retriable function error="could not connect to any bootstrap node" count=1 duration=27.093565 funcName=]
     ERR [     bootstrap.go:43] couldn't connect to boostrap node error="failed to find peers: failed to find any peer in table" peer="{12D3KooWDPtr6wENaiD2cxJ9wnDDX9Be79oj5JST2bdu4r6Vw3bf: [/ip4/127.0.0.1/tcp/2003]}"
     DBG [     bootstrap.go:77] called retriable function error="could not connect to any bootstrap node" count=1 duration=22.686801 funcName=]
     ERR [     bootstrap.go:43] couldn't connect to boostrap node error="failed to find peers: failed to find any peer in table" peer="{12D3KooWDPtr6wENaiD2cxJ9wnDDX9Be79oj5JST2bdu4r6Vw3bf: [/ip4/127.0.0.1/tcp/2003]}"
panic: runtime error: slice bounds out of range [4:1]

goroutine 17365 [running]:
github.com/libp2p/go-libp2p-pubsub.(*GossipSubRouter).heartbeat(0xc0031e4960)
	/Users/ezdac/.asdf/installs/golang/1.20.1/packages/pkg/mod/github.com/libp2p/[email protected]/gossipsub.go:1441 +0x2c0a
github.com/libp2p/go-libp2p-pubsub.(*PubSub).processLoop(0xc004dc7440, {0x101b645b8, 0xc003ac96d0})
	/Users/ezdac/.asdf/installs/golang/1.20.1/packages/pkg/mod/github.com/libp2p/[email protected]/pubsub.go:651 +0x113b
created by github.com/libp2p/go-libp2p-pubsub.NewPubSub
	/Users/ezdac/.asdf/installs/golang/1.20.1/packages/pkg/mod/github.com/libp2p/[email protected]/pubsub.go:334 +0x1bce

The slice involved holds the topic's mesh peers, sorted by score, and the panic is raised by this line:

// We keep the first D_score peers by score and the remaining up to D randomly
// under the constraint that we keep D_out peers in the mesh (if we have that many)
shufflePeers(plst[gs.params.Dscore:])
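In Go, `s[low:]` requires `low <= len(s)`, so `plst[gs.params.Dscore:]` panics whenever `plst` holds fewer than `Dscore` peers; the message `[4:1]` reads as `Dscore` = 4 on a slice of length 1. The following is a minimal, stdlib-only sketch that reproduces that panic on a plain `[]string` and shows one possible guard. `shuffleTail`, `unguardedSlice` and the string peer IDs are hypothetical stand-ins, not the go-libp2p-pubsub API, and clamping the index is just one option, not a description of the upstream fix.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffleTail is a hypothetical stand-in for shufflePeers(plst[dscore:]).
// It shuffles, in place, the peers after the first dscore entries,
// clamping dscore to len(plst) so the slice expression cannot go out of range.
func shuffleTail(plst []string, dscore int) {
	if dscore > len(plst) {
		dscore = len(plst) // fewer peers than Dscore: nothing left to shuffle
	}
	tail := plst[dscore:]
	rand.Shuffle(len(tail), func(i, j int) {
		tail[i], tail[j] = tail[j], tail[i]
	})
}

// unguardedSlice evaluates the original, unclamped expression and returns
// the runtime panic message (or "" if no panic occurred).
func unguardedSlice(plst []string, dscore int) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	_ = plst[dscore:] // bounds check still runs for the blank assignment
	return ""
}

func main() {
	plst := []string{"peerA"} // a single peer, as in the failing run
	// prints: runtime error: slice bounds out of range [4:1]
	fmt.Println(unguardedSlice(plst, 4))
	shuffleTail(plst, 4) // no panic: the tail is simply empty
	fmt.Println(plst)
}
```

Clamping keeps the "first `Dscore` by score" intent when there are enough peers and degrades to a no-op otherwise; whether that is the right behavior depends on why `plst` was that short in the first place.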

This may be related to the bootstrap nodes, since the log shows repeated connection failures just before the panic.

Investigate whether this is:

  • a bug in the integration test setup
  • a gossipsub bug
  • a misconfiguration of the p2p network unrelated to the test, for example in peer bootstrapping
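For the misconfiguration case, one cheap check is whether the mesh parameters can reach the panic at all. Assuming, as in upstream gossipsub's heartbeat, that the quoted line only runs when a topic's mesh holds more than `Dhi` peers, the smallest slice it can see has `Dhi+1` entries, so it is safe exactly when `Dscore <= Dhi+1`; under that assumption a `[4:1]` panic would point to `Dhi` being 0 or less. The sketch below checks this; `meshParams` and `checkMeshParams` are hypothetical, and `meshParams` only mirrors the field names of go-libp2p-pubsub's `GossipSubParams` rather than using the library type.

```go
package main

import "fmt"

// meshParams is a hypothetical local mirror of the mesh-size fields of
// go-libp2p-pubsub's GossipSubParams; it is not the library type.
type meshParams struct {
	D, Dlo, Dhi, Dscore int
}

// checkMeshParams reports parameters that could make plst[Dscore:] go out
// of range, assuming pruning only runs when the mesh has more than Dhi peers.
func checkMeshParams(p meshParams) error {
	// Basic ordering of the mesh-size bounds.
	if !(p.Dlo <= p.D && p.D <= p.Dhi) {
		return fmt.Errorf("want Dlo <= D <= Dhi, got Dlo=%d D=%d Dhi=%d", p.Dlo, p.D, p.Dhi)
	}
	// The smallest mesh that triggers pruning has Dhi+1 peers.
	if p.Dscore > p.Dhi+1 {
		return fmt.Errorf("Dscore=%d exceeds Dhi+1=%d, so plst[Dscore:] can go out of range", p.Dscore, p.Dhi+1)
	}
	return nil
}

func main() {
	// Upstream default values (D=6, Dlo=5, Dhi=12, Dscore=4) pass.
	fmt.Println(checkMeshParams(meshParams{D: 6, Dlo: 5, Dhi: 12, Dscore: 4}))
	// Hypothetical shrunken test config: Dhi=0 with the default Dscore=4.
	fmt.Println(checkMeshParams(meshParams{D: 0, Dlo: 0, Dhi: 0, Dscore: 4}))
}
```

Running such a check on whatever parameters the test actually passes to the router would separate the "test setup" and "misconfiguration" hypotheses from a genuine gossipsub bug.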

ezdac · Mar 22 '23 15:03

Commit 469d405262dda02d6cc4896ce803c71279 prevents the panic. I'm not 100% sure it is a proper fix, so feel free to take a look.

schmir · Apr 04 '23 08:04

This seems to be fixed.

jannikluhn · May 29 '24 07:05