kiali icon indicating copy to clipboard operation
kiali copied to clipboard

unexpected ci error in TestClientExpiry

Open Xunzhuo opened this issue 2 years ago • 6 comments

See: https://github.com/kiali/kiali/runs/5680434347?check_suite_focus=true

the changes of the PR should not cause this CI failed.

Xunzhuo avatar Mar 25 '22 00:03 Xunzhuo

I've seen this failure before - we never found out why it happens. There might even already be a github issue on this.

jmazzitelli avatar Mar 25 '22 01:03 jmazzitelli

https://github.com/kiali/kiali/pull/2968#pullrequestreview-443706509

jmazzitelli avatar Mar 25 '22 02:03 jmazzitelli

Ops, I think I know what is tricky here: the mutex is a global lock: https://github.com/kiali/kiali/blob/master/kubernetes/client_factory.go#L19 In TestClientExpiry: https://github.com/kiali/kiali/blob/master/kubernetes/client_factory_test.go#L14 After executing getClientFactory, goroutines start to run watchClients and they are rapidly using the global mutexto lock and unlock. https://github.com/kiali/kiali/blob/master/kubernetes/client_factory.go#L181. But in test case, it is also using the mutex global lock, some time, it will lead chaos, like lock failed or conflicts, which will make the delete expired client failed: https://github.com/kiali/kiali/blob/master/kubernetes/client_factory.go#L184 That is why we see:

    client_factory_test.go:57: 
        	Error Trace:	client_factory_test.go:57
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 2
        	Test:       	TestClientExpiry
    client_factory_test.go:59: 
        	Error Trace:	client_factory_test.go:59
        	Error:      	Should be false
        	Test:       	TestClientExpiry

Because the first test expired foo client is not properly delete from clientEntries.

Xunzhuo avatar Mar 25 '22 06:03 Xunzhuo

In an elegant way, we should use channel to tell goroutine to recycle this client, not for sleep and loop to read the cliententries.

Xunzhuo avatar Mar 25 '22 10:03 Xunzhuo

But in test case, it is also using the mutex global lock, some time, it will lead chaos, like lock failed or conflicts, which will make the delete expired client failed

I've commented in the PR, perhaps the strategy for this work could be to improve the test, but moving a global lock to a channel because a test shouldn't be the first option considered.

(Global locks are perfectly correct resources in Golang and other languages to deal with concurrency scenarios).

lucasponce avatar Mar 28 '22 12:03 lucasponce

Hi @lucasponce thanks for commenting, I have replied in the Pr.

Xunzhuo avatar Mar 28 '22 14:03 Xunzhuo

This continues to plague CI and should be added to the backlog.

nrfox avatar Aug 29 '22 14:08 nrfox

You'll hit this much more often locally when running the tests with the -race flag: make test-race.

nrfox avatar Aug 29 '22 14:08 nrfox