celo-blockchain

TestSendCelo fails due to missing signer

Open: piersy opened this issue 2 years ago • 4 comments

Description

Occurred on master at commit fb462b6a037d8ffdf20171afbdbc0df41c4ada5d in this build.

=== RUN   TestSendCelo
Checking getExchangeSpenders. spenders = []
Checking medianRate. numerator = 1000000000000000000000000  denominator = 1000000000000000000000000 
Checking gas price minimum. cusdValue = 100000000
    e2e_test.go:54: 
        	Error Trace:	e2e_test.go:54
        	Error:      	Received unexpected error:
        	            	failed to build node for network: signer missing: unknown account
        	Test:       	TestSendCelo
--- FAIL: TestSendCelo (1.82s)

I've not seen this before, and it doesn't look like a timeout issue, since the test ran for only 1.82s.

It failed at network start, here: https://github.com/celo-org/celo-blockchain/blob/fb462b6a037d8ffdf20171afbdbc0df41c4ada5d/e2e_test/e2e_test.go#L54

piersy, Oct 07 '22

The test usually succeeds, but according to CircleCI it is flaky, with the most recent failure 7 days ago.

If we can trust CircleCI's flakiness detection, we have a high level of flakiness, with 76 tests considered flaky. Maybe we have an underlying issue that is causing many tests to become flaky?

karlb, Mar 07 '23

@karlb That failure you linked to is a timeout, so I think it was caused by something different from this failure.

My suspicion is that the timeouts are caused by the announce protocol. When a node starts validating, it drops all its validator peers (see Backend.RefreshValPeers). There is a race condition between that and the use of Network.GossipEnodeCertificatge: I think that sometimes nodes refresh their validator peers after the enode certificates have been gossiped, and there is then a 5-minute delay until the announce protocol gossips the enode certificates again. So I think the timeout problem could be solved by increasing the timeout for this test to about 7 minutes. A better solution would be to rework the announce protocol.

piersy, Mar 07 '23

@karlb I haven't seen this ticket's failure since, so it could be that some code change has inadvertently fixed it.

piersy, Mar 07 '23

I was able to reproduce the same error locally (after running the test many times):

--- FAIL: TestSendCelo (0.56s)
    e2e_test.go:42: 
        	Error Trace:	e2e_test.go:42
        	Error:      	Received unexpected error:
        	            	failed to build node for network: signer missing: unknown account
        	Test:       	TestSendCelo

So the issue still exists.

karlb, Mar 13 '23