cluster-api-provider-azure
                                
                                 cluster-api-provider-azure copied to clipboard
                                
                                    cluster-api-provider-azure copied to clipboard
                            
                            
                            
                        Enable race detector for unit tests
What type of PR is this?
/kind failing-test
What this PR does / why we need it:
Enables the Go race detector in unit tests, because that seems like a best practice. However, it does more than double the time:
% git co ill-race-you && time TEST_ARGS="-race -count=1" make go-test
...
TEST_ARGS="-race -count=1" make go-test  518.22s user 31.92s system 447% cpu 2:02.92 total
% git co main && time TEST_ARGS="-count=1" make go-test
...
TEST_ARGS="-count=1" make go-test  144.72s user 20.40s system 343% cpu 48.067 total
Maybe we could leave -race enabled for CI but users can disable it locally via TEST_ARGS? I don't want to make tests take longer, but this flag already found two data races.
Which issue(s) this PR fixes:
N/A
Special notes for your reviewer:
If your tests have incomplete coverage, you may find more races by running a binary built with -race under a realistic workload.
This sentence in the race detector docs is very intriguing to me. I wonder if we could build a debug binary with -race that we used in CI; I have a strong intuition it would find something.
Or maybe add RACE_DETECTOR as a toggle for Tilt binaries? Edit: see #2727.
TODOs:
- [x] squashed commits
- [ ] includes documentation
- [ ] adds unit tests
Release note:
NONE
/retitle [WIP] Enable race detector for unit tests
This is cool! Which part is still [WIP]?
Which part is still [WIP]?
Just that I'm wary of more than doubling the time for make go-test. Increasing that part of the developer loop or a CI job from 3 to 8 minutes (for example) is painful.
Users can always disable race checking with TEST_ARGS="". If we're ok with that option and with the default for unit tests taking longer, I think we should merge this.
/retitle Enable race detector for unit tests
Just that I'm wary of more than doubling the time for make go-test. Increasing that part of the developer loop or a CI job from 3 to 8 minutes (for example) is painful.
That's not great. What if we add it to the make test target but not make go-test? That way devs can still use make go-test locally to run tests quickly but the PR job will run this with generate and lint?
My vote is a new periodic job
What if we add it to the
make testtarget but notmake go-test?
I like this idea: we should run it in CI but not make developers wait longer. I made those changes by adding a go-test-race target. @jackfrancis wdyt?
@mboersma are we sure this is working? the most test run for this PR was within the expected time duration range of previous runs (withough the -race option)
indeed
make test TEST_ARGS=" -coverprofile=coverage.out"
Looks like test-cover is setting its own args
/lgtm cancel
Oh, I see we actually use make test-cover in CI. We can set the args for that similarly.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from cecilerobertmichon by writing /assign @cecilerobertmichon in a comment. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@mboersma: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command | 
|---|---|---|---|---|
| pull-cluster-api-provider-azure-e2e-exp | 17b8fdba9993035733750d76da4dc449d3a7cea1 | link | false | /test pull-cluster-api-provider-azure-e2e-exp | 
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
IDK why the e2e-exp job ran, but it failed to create an AKS cluster. I'm ignoring that flake since it's not a required check, and this only affects the unit tests.
IDK why the e2e-exp job ran
it runs whenever we change the Makefile and other common CI files
that last run was suspiciously fast and I don't see anything in the output of the test run that indicates it ran the race detector, how do we know it's working?
I added temp commits to log the go test command line and to reintroduce the race conditions. I see the -race flag being passed through and indeed the unit tests failed:
--- FAIL: TestReconcileInboundNATRule (0.00s)
    --- FAIL: TestReconcileInboundNATRule/fail_to_create_NAT_rule (0.00s)
        testing.go:1319: race detected during execution of test
    --- FAIL: TestReconcileInboundNATRule/NAT_rule_successfully_created_with_with_no_existing_rules (0.00s)
        testing.go:1319: race detected during execution of test
    --- FAIL: TestReconcileInboundNATRule/No_LB,_Nat_rule_reconciliation_is_skipped (0.00s)
        testing.go:1319: race detected during execution of test
It does run relatively quickly in CI: 6 or 7 minutes. But I'll take it since it's working as intended.
@CecileRobertMichon @jackfrancis this is ready for review.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jackfrancis
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [jackfrancis]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment