go-libp2p
swarm: flaky TestDialSimultaneousJoin
=== RUN TestDialSimultaneousJoin
dial_test.go:578: third dial succedded; conn: <swarm.Conn[*tcp.TcpTransport] /ip4/127.0.0.1/tcp/50014 (12D3KooWRuYVGEsecrJJhZsSoKf1UNdBVYKFCmFLNj9ucZiSQCYj) <-> /ip4/127.0.0.1/tcp/50015 (12D3KooWGEcD5sW5osB6LajkHGqiGc3W8eKfYwnJVVqfujkpLWX2)>
dial_test.go:560: second dial succedded; conn: <swarm.Conn[*tcp.TcpTransport] /ip4/127.0.0.1/tcp/50014 (12D3KooWRuYVGEsecrJJhZsSoKf1UNdBVYKFCmFLNj9ucZiSQCYj) <-> /ip4/127.0.0.1/tcp/50015 (12D3KooWGEcD5sW5osB6LajkHGqiGc3W8eKfYwnJVVqfujkpLWX2)>
dial_test.go:[588](https://github.com/libp2p/go-libp2p/runs/6129949662?check_suite_focus=true#step:7:588):
Error Trace: dial_test.go:588
Error: Received unexpected error:
failed to dial 12D3KooWGEcD5sW5osB6LajkHGqiGc3W8eKfYwnJVVqfujkpLWX2:
* [/ip4/127.0.0.1/tcp/50016] failed to negotiate security protocol: context deadline exceeded
Test: TestDialSimultaneousJoin
--- FAIL: TestDialSimultaneousJoin (0.26s)
Assigning myself.
@vyzo Looking at the code related to TestDialSimultaneousJoin, is it correct that the line we're trying to trigger is:
https://github.com/libp2p/go-libp2p/blob/5eaa48fbab3bf4c669f747437ace19d0311b4c8e/p2p/net/swarm/dial_worker.go#L256-L258
I don't recall targeting a specific line, just making sure we have a test for joined dials.
@vyzo OK, could you point me to 'joined dials' in the code so I can better understand what we are trying to test, please?
In particular, how are we enforcing (or approaching) the "simultaneous" part of the test?
It's the invariant that two concurrent dials to the same addresses are joined.
Ok, but how do you define concurrent in practice?
What I'm seeing here is the first dial timing out before the second one has a chance to hit, and I'm trying to figure out how to better guarantee that simultaneity.
> It's the invariant that two concurrent dials to the same addresses are joined.

This extends to 'same peer' as well, right? (This might be implicit in what you just stated; just double-checking because I'm new to libp2p.)
> This extends to 'same peer' as well, right? (This might be implicit in what you just stated; just double-checking because I'm new to libp2p.)

yes, of course -- the dials are peer specific.
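To make the invariant concrete, here is a minimal sketch of per-peer dial joining. It is not the swarm's actual implementation: `dialJoiner`, `pendingDial`, `Waiters`, and `runDemo` are hypothetical names, peers and connections are plain strings, and the `dial` callback stands in for a real transport dial. The only point is that a second `Dial` to a peer with a dial already in flight waits for that dial's result instead of starting another one.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// pendingDial is one in-flight dial that later callers can join.
type pendingDial struct {
	done    chan struct{} // closed once conn and err are set
	conn    string        // stand-in for a real connection
	err     error
	waiters int // callers that joined; guarded by dialJoiner.mu
}

// dialJoiner is a hypothetical helper: at most one dial per peer runs at a time.
type dialJoiner struct {
	mu      sync.Mutex
	pending map[string]*pendingDial
	dial    func(peer string) (string, error)
}

func newDialJoiner(dial func(string) (string, error)) *dialJoiner {
	return &dialJoiner{pending: make(map[string]*pendingDial), dial: dial}
}

// Dial joins the in-flight dial to peer if there is one, otherwise starts it.
func (j *dialJoiner) Dial(peer string) (string, error) {
	j.mu.Lock()
	if p, ok := j.pending[peer]; ok {
		p.waiters++
		j.mu.Unlock()
		<-p.done // wait for the dial that is already running
		return p.conn, p.err
	}
	p := &pendingDial{done: make(chan struct{})}
	j.pending[peer] = p
	j.mu.Unlock()

	p.conn, p.err = j.dial(peer)

	j.mu.Lock()
	delete(j.pending, peer) // later dials start fresh
	j.mu.Unlock()
	close(p.done) // publishes conn and err to the joined callers
	return p.conn, p.err
}

// Waiters reports how many callers have joined the in-flight dial to peer.
func (j *dialJoiner) Waiters(peer string) int {
	j.mu.Lock()
	defer j.mu.Unlock()
	if p, ok := j.pending[peer]; ok {
		return p.waiters
	}
	return 0
}

// runDemo issues two overlapping dials to the same peer and returns how many
// times the underlying dial ran, plus what each caller received.
func runDemo() (calls int32, first, second string) {
	release := make(chan struct{})
	started := make(chan struct{})
	var n int32
	j := newDialJoiner(func(peer string) (string, error) {
		atomic.AddInt32(&n, 1)
		close(started)
		<-release // keep the dial in flight
		return "conn-to-" + peer, nil
	})
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); first, _ = j.Dial("peerB") }()
	<-started
	go func() { defer wg.Done(); second, _ = j.Dial("peerB") }()
	for j.Waiters("peerB") == 0 {
		time.Sleep(time.Millisecond) // wait until the second caller has joined
	}
	close(release)
	wg.Wait()
	return atomic.LoadInt32(&n), first, second
}

func main() {
	calls, first, second := runDemo()
	fmt.Println(calls, first, second) // prints "1 conn-to-peerB conn-to-peerB"
}
```

Note that the demo only stays deterministic because it keeps the first dial open until the second caller is known to have joined. Without that, the second `Dial` can arrive after the first has finished and start a dial of its own.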
> What I'm seeing here is the first dial timing out before the second one has a chance to hit, and I'm trying to figure out how to better guarantee that simultaneity.

Uhm, maybe somehow delay the first dial until the second one happens (with a channel probably). Might need to add some test scaffolding in the code.
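
A rough sketch of what that scaffolding could look like, under stated assumptions: `joiner` is a hypothetical single-peer stand-in for the code under test, and `onJoin` is an invented test-only hook, not an existing swarm API. The fake dial blocks on a `joined` channel that the hook closes, so the first dial cannot complete (or time out) before the second one has joined it. A timeout keeps the test from hanging if joining is broken.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// pending is the dial currently in flight.
type pending struct {
	done chan struct{} // closed once conn is set
	conn string        // stand-in for a real connection
}

// joiner is a hypothetical stand-in for the code under test.
type joiner struct {
	mu     sync.Mutex
	cur    *pending
	dial   func() string
	onJoin func() // test-only hook, called when a caller joins an in-flight dial
}

// Dial joins the in-flight dial if there is one, otherwise starts it.
func (j *joiner) Dial() string {
	j.mu.Lock()
	if p := j.cur; p != nil {
		j.mu.Unlock()
		if j.onJoin != nil {
			j.onJoin() // tell the test that a second caller has joined
		}
		<-p.done
		return p.conn
	}
	p := &pending{done: make(chan struct{})}
	j.cur = p
	j.mu.Unlock()

	p.conn = j.dial()

	j.mu.Lock()
	j.cur = nil
	j.mu.Unlock()
	close(p.done)
	return p.conn
}

// runGatedDials holds the first dial open until the second one has joined,
// then returns how often the underlying dial ran and what each caller got.
func runGatedDials() (calls int32, conns [2]string) {
	joined := make(chan struct{})
	started := make(chan struct{}, 2)
	var n int32
	j := &joiner{onJoin: func() { close(joined) }}
	j.dial = func() string {
		atomic.AddInt32(&n, 1)
		started <- struct{}{}
		select {
		case <-joined: // gate: released only once the second dial has joined
		case <-time.After(2 * time.Second): // do not hang if joining is broken
		}
		return "conn"
	}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); conns[0] = j.Dial() }()
	<-started // the first dial is now in flight
	go func() { defer wg.Done(); conns[1] = j.Dial() }()
	wg.Wait()
	return atomic.LoadInt32(&n), conns
}

func main() {
	calls, conns := runGatedDials()
	fmt.Println(calls, conns) // prints "1 [conn conn]"
}
```

The trade-off is that the hook lives in non-test code. Keeping it nil by default means production callers pay only for a nil check.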