
Problem: upgrade integration test times out mysteriously in CI

Open yihuang opened this issue 3 years ago • 3 comments

upgrade-debug.zip

The data directory for a failed run is attached.

Run locally: nix-shell integration_tests/shell.nix --run 'pytest -s -vv integration_tests/test_upgrade_gravity.py'

yihuang avatar Aug 18 '22 08:08 yihuang

@yihuang which git revision was this? Are you able to reproduce it locally? I tried tag v0.8.0-gravity-alpha1 and the latest main locally, and both pass (never failed).

Does the current CI still sometimes fail on this test?

JayT106 avatar Aug 19 '22 16:08 JayT106

https://github.com/crypto-org-chain/cronos/actions/runs/2873674246 I think it's this one; it just fails occasionally.
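
Since the failure is intermittent, a single passing local run does not say much. One way to try reproducing it is to repeat the test command until it fails. A minimal sketch: `run_until_failure` is a hypothetical helper, not part of the Cronos repo, and the command in the comment is the one from the first post.

```python
import subprocess


def run_until_failure(cmd, max_runs=20):
    """Run cmd up to max_runs times.

    Returns the 1-based index of the first run that exits non-zero,
    or None if every run succeeded.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return attempt
    return None


# Example invocation (assumes nix-shell is installed and the working
# directory is the repository root):
#
# run_until_failure([
#     "nix-shell", "integration_tests/shell.nix", "--run",
#     "pytest -s -vv integration_tests/test_upgrade_gravity.py",
# ])
```

The data directory of the failing run should be copied aside before the next attempt, in case the test setup overwrites it.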

yihuang avatar Aug 21 '22 07:08 yihuang

From the logs in https://github.com/crypto-org-chain/cronos/files/9371792/upgrade-debug.zip

The logs show node0 failing to connect to node1, so my guess is that the TCP connection was somehow not closed completely during the previous shutdown (which uses panic) in the node upgrade.

I will look at tendermint to see what happens to the TCP connection state during the panic shutdown. I'm not sure whether the socket connection issue is related to the CI environment.

7:37AM INF service stop impl={"Logger":{}} module=p2p msg={} peer={"id":"96202b106a4910020f6ca570f65f28b3c908c960","ip":"::1","port":26110} server=node
7:37AM ERR Stopping peer for error err="read tcp [::1]:48424->[::1]:26110: read: connection reset by peer" module=p2p peer={"Data":{},"Logger":{}} server=node
7:37AM INF service stop impl={"Data":{},"Logger":{}} module=p2p msg={} peer={"id":"96202b106a4910020f6ca570f65f28b3c908c960","ip":"::1","port":26110} server=node
7:37AM INF Reconnecting to peer addr={"id":"96202b106a4910020f6ca570f65f28b3c908c960","ip":"::1","port":26110} module=p2p server=node
7:37AM INF Dialing peer address={"id":"96202b106a4910020f6ca570f65f28b3c908c960","ip":"::1","port":26110} module=p2p server=node
7:37AM INF Error reconnecting to peer. Trying again addr={"id":"96202b106a4910020f6ca570f65f28b3c908c960","ip":"::1","port":26110} err="dial tcp [::1]:26110: connect: connection refused" module=p2p server=node tries=0
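
To tell "node1's listener never came back" apart from "it came back late", it may help to probe the P2P port directly after the restart. A minimal sketch using only the Python standard library: `wait_for_port` is a hypothetical helper, not part of tendermint or the Cronos test suite, and the address `[::1]:26110` is taken from the log above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True once a connection is accepted, or False if the
    deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The connection is closed as soon as it is established.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers "connection refused" and connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example: check node1's P2P port as seen in the log.
# wait_for_port("::1", 26110, timeout=60.0)
```

If the probe keeps failing long after node1 was restarted, that would suggest node1 is not listening at all, rather than node0 holding a stale connection.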

JayT106 avatar Aug 23 '22 20:08 JayT106