cmd/scheduler unit tests are flaky
Which jobs are flaking:
cmd/scheduler/app
Which test(s) are flaking:
TestServeHealthzAndMetrics
Reason for failure:
E0611 19:08:35.345640 77644 scheduler.go:284] Failed to serve healthz on 127.0.0.1:8082: listen tcp 127.0.0.1:8082: bind: address already in use
E0611 19:08:35.345723 77644 scheduler.go:284] Failed to serve metrics on 127.0.0.1:8083: listen tcp 127.0.0.1:8083: bind: address already in use
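The "address already in use" error in the log above is what the kernel returns when a second listener binds a TCP address that another socket still holds. A minimal sketch of that failure mode, not taken from the Karmada code base: `bindTwice` and `isAddrInUse` are hypothetical names, and the `EADDRINUSE` check assumes a Linux or other Unix-like host.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// bindTwice (hypothetical helper) listens on an OS-assigned loopback port,
// then tries to bind the same address again while the first listener is
// still open. It returns the error from the second attempt.
func bindTwice() error {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("first listen failed unexpectedly: %w", err)
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		// Not expected: the address should still be held by `first`.
		second.Close()
		return nil
	}
	return err
}

// isAddrInUse reports whether err wraps EADDRINUSE (assumes a Unix-like OS).
func isAddrInUse(err error) bool {
	return errors.Is(err, syscall.EADDRINUSE)
}

func main() {
	err := bindTwice()
	fmt.Println("second bind error:", err)
	fmt.Println("address already in use:", isAddrInUse(err))
}
```

Two test binaries that hard-code the same port hit this error whenever their listeners overlap in time.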
Anything else we need to know:
The unit tests for cmd/scheduler and cmd/descheduler both bind ports 8082 and 8083, so they appear to interfere with each other when they run at the same time.
Scheduler: https://github.com/karmada-io/karmada/blob/bd5692f08a259c1a959dc3fd1944071be9fda729/cmd/scheduler/app/scheduler_test.go#L69-L70
Descheduler: https://github.com/karmada-io/karmada/blob/bd5692f08a259c1a959dc3fd1944071be9fda729/cmd/descheduler/app/descheduler_test.go#L69-L70
In fact, after I changed either test to use ports 8084 and 8085, the unit tests pass in my environment. I don't have a reliable way to reproduce this issue: the unit tests currently pass on my laptop but fail on my desktop unless I change the ports used in those tests.
In order to make unit tests more robust, I suggest using different ports in scheduler and descheduler unit tests. I'm happy to make this simple change if it looks good to you.
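Besides assigning distinct fixed ports, another option would be to let the OS pick a free port for each test. A minimal sketch, assuming the tests can accept an arbitrary bind address (I have not checked this against the actual test code); `freeLoopbackAddr` is a hypothetical helper, not an existing Karmada function.

```go
package main

import (
	"fmt"
	"net"
)

// freeLoopbackAddr (hypothetical helper) asks the kernel for an unused TCP
// port on the loopback interface by listening on port 0, then releases the
// listener and returns the "host:port" address it was given.
// Caveat: another process could grab the port between Close and reuse, so
// this narrows the collision window rather than eliminating it.
func freeLoopbackAddr() (string, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer l.Close()
	return l.Addr().String(), nil
}

func main() {
	healthzAddr, err := freeLoopbackAddr()
	if err != nil {
		fmt.Println("could not find a free port:", err)
		return
	}
	metricsAddr, err := freeLoopbackAddr()
	if err != nil {
		fmt.Println("could not find a free port:", err)
		return
	}
	fmt.Println("healthz bind address:", healthzAddr)
	fmt.Println("metrics bind address:", metricsAddr)
}
```

Distinct fixed ports are the smaller change. The port-0 approach avoids having to coordinate port numbers across packages.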
Thank you!
Hi @zclyne, thanks for your response.
I tried to reproduce it locally, but I couldn't. From your log, it seems that these two test cases are being executed in parallel. Could you share which command you used to run the tests and which version of Go you are using?
Thanks @XiShanYongYe-Chang for getting back. I agree that the problem is probably specific to my desktop environment. I was running make test in Ubuntu on WSL2, and I tried Go 1.23.8 (the toolchain version) and 1.24.2. The tests failed with both. However, on my laptop, which runs Go 1.24.4, the tests pass normally. I have also run git status to make sure that my local codebase is exactly the same as the remote one.
Please feel free to let me know if you need any other information from me. Thank you!
Thanks @zclyne ~
My local Go version is also 1.23.8, so the failure seems unrelated to the Go version. Our tests currently do not support parallel execution. It's strange that they fail only under WSL2.
I think we can set up different ports for these two tests. Would you like to help contribute?
Also, if you make further progress on the root cause of this failure, feel free to post it on this issue, as other team members might encounter it as well.
Thanks again @_@
Thank you very much @XiShanYongYe-Chang. I'm happy to make a PR to change the ports. I will keep the community posted if I figure out the reason for the failure.