go-textile icon indicating copy to clipboard operation
go-textile copied to clipboard

Flaky dials

Open sanderpick opened this issue 5 years ago • 0 comments

Describe the bug Dialing a remote peer often fails at relatively steady intervals. The remote peer stops accepting connections for ~1-2 min, after which everything starts working again. At least I can confirm that the remote is actually not responding.

During this downtime,

curl -v telnet://104.210.43.77:4001

will hang. Once it starts responding again (with /multistream/1.0.0), our service layer calls succeed.

To Reproduce Run a desktop peer in debug mode and register a remote cafe. After awhile, you should notice that dials start to fail:

16:31:04.149 DEBUG tex-servic: sending CAFE_CHECK_MESSAGES to 12D3KooWSsM117bNw6yu1auMfNqeu59578Bct5V4S9fWxavogrsw main.go:102
16:31:04.149  INFO tex-servic: error writing message, trying again:  stream reset sender.go:165
16:31:21.089 DEBUG tex-servic: received pubsub CAFE_PUBSUB_QUERY from 12D3KooWRWXWWWuUKP6u8Fdq9FvyHKz6ksWys5cPoGVwhMMGUp1f main.go:502
16:31:34.151 ERROR   tex-core: error checking messages: context deadline exceeded main.go:713
16:31:34.151 DEBUG   tex-core: flushing downloads block_downloads.go:53
16:31:34.151 DEBUG   tex-core: handling 0 downloads block_downloads.go:61
16:31:34.582 DEBUG tex-servic: received pubsub CAFE_PUBSUB_QUERY from 12D3KooWGN8VAsPHsHeJtoTbbzsGjs2LTmQZ6wFKvuPich1TYmYY main.go:502
16:32:04.150 DEBUG   tex-core: flushing cafe outbox cafe_outbox.go:172
16:32:04.151 DEBUG   tex-core: handling 0 cafe requests cafe_service.go:1454
16:32:04.152 DEBUG tex-servic: sending CAFE_CHECK_MESSAGES to 12D3KooWSsM117bNw6yu1auMfNqeu59578Bct5V4S9fWxavogrsw main.go:102
16:32:09.156 ERROR   tex-core: error checking messages: failed to dial : all dials failed
  * [/ip4/127.0.0.1/tcp/8081/ws] dial tcp 127.0.0.1:8081: connect: connection refused
  * [/ip4/127.0.0.1/tcp/4001] dial tcp4 127.0.0.1:4001: connect: connection refused
  * [/ip4/104.210.43.77/tcp/4001] dial tcp4 0.0.0.0:9113->104.210.43.77:4001: i/o timeout main.go:713
16:32:09.156 DEBUG   tex-core: flushing downloads block_downloads.go:53
16:32:09.156 DEBUG   tex-core: handling 0 downloads block_downloads.go:61
16:32:21.117 DEBUG tex-servic: received pubsub CAFE_PUBSUB_QUERY from 12D3KooWRWXWWWuUKP6u8Fdq9FvyHKz6ksWys5cPoGVwhMMGUp1f main.go:502
16:32:32.606 DEBUG tex-servic: received pubsub CAFE_PUBSUB_QUERY from 12D3KooWLnUv9MWuRM6uHirRPBM4NwRj54n4gNNnBtiFiwPiv3Up main.go:502
16:32:48.116 DEBUG tex-servic: received pubsub CAFE_PUBSUB_QUERY from 12D3KooWNjRYvJza7dFx4YTf9kUYBUcbefBDexhBJ3KYHGrC1dtG main.go:502
16:33:04.152 DEBUG   tex-core: flushing cafe outbox cafe_outbox.go:172
16:33:04.152 DEBUG   tex-core: handling 0 cafe requests cafe_service.go:1454
16:33:04.154 DEBUG tex-servic: sending CAFE_CHECK_MESSAGES to 12D3KooWSsM117bNw6yu1auMfNqeu59578Bct5V4S9fWxavogrsw main.go:102
16:33:04.704 DEBUG tex-servic: received CAFE_MESSAGES response from 12D3KooWSsM117bNw6yu1auMfNqeu59578Bct5V4S9fWxavogrsw main.go:130

Expected behavior All dials should succeed.

sanderpick avatar Jul 23 '19 23:07 sanderpick