ginkgo
ginkgo copied to clipboard
Data race when adding reports from multiple goroutines
What I'm doing now, which may be completely misguided:
- For long-running tests, I want to periodically poll some system information and add it to the spec reports.
- I do this by spawning a goroutine in a
BeforeEach
, and closing it in theAfterEach
func ReportPeriodically(interval time.Duration) { mu := sync.Mutex{} cancelMap := map[string]func(){} ginkgo.BeforeEach(func() { report := ginkgo.CurrentSpecReport() name := report.FullText() ctx, cancel := context.WithCancel(context.Background()) mu.Lock() defer mu.Unlock() cancelMap[name] = cancel ticker := time.NewTicker(interval) go func() { for { // this can run twice since there is no priority selection, but should not matter select { case now := <-ticker.C: reportName := fmt.Sprintf("%s-periodic-%d", name, now.Unix()) ginkgo.AddReportEntry(reportName, "some system info") case <-ctx.Done(): ticker.Stop() return } } }() }) ginkgo.AfterEach(func() { report := ginkgo.CurrentSpecReport() name := report.FullText() mu.Lock() defer mu.Unlock() if cancel, ok := cancelMap[name]; ok { cancel() } delete(cancelMap, name) }) }
- I run the suite sequentially
- Data Race:
WARNING: DATA RACE Read at 0x00c0005824d0 by goroutine 18: github.com/onsi/ginkgo/v2/internal.(*Suite).AddReportEntry() /home/runner/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/internal/suite.go:226 +0x386 github.com/onsi/ginkgo/v2.AddReportEntry() /home/runner/go/pkg/mod/github.com/onsi/ginkgo/[email protected]/reporting_dsl.go:69 +0x275
It seems the suite doesn't lock the entries array for reads/writes: https://github.com/onsi/ginkgo/blob/aae4480d35116455d7115e71120566e8be55b4a4/internal/suite.go#L226
I'm wondering if there's a better way to accomplish this, or if you'd accept a PR that protects access to the ReportEntries
with a sync.RWMutex
?
hey @austince sorry for the delay. This is a cool use case. I believe the lack of a lock is an oversight on my part. If you'd be still be up for a PR that would be great. If not, I'll work on it.
One quick comment - I don't think you need the cancelMap
. Only one spec ever runs at a time in a given Ginkgo process so you should be able to do:
func ReportPeriodically(interval time.Duration) {
ginkgo.BeforeEach(func() {
name := ginkgo.CurrentSpecReport().FullText()
ctx, cancel := context.WithCancel(context.Background())
ticker := time.NewTicker(interval)
go func() {
for {
select {
case now := <-ticker.C:
reportName := fmt.Sprintf("%s-periodic-%d", name, now.Unix())
ginkgo.AddReportEntry(reportName, "some system info")
case <-ctx.Done():
ticker.Stop()
return
}
}
}()
DeferCleanup(cancel)
})
}
That DeferCleanup
will call cancel
for you after the spec completes (or fails)
Ah, DeferCleanup
, genius. Thanks for that. I'll put in a PR for the lock asap :)
I'm actually not sure if I have the knowledge to finish the PR — it seems we need to lock more than just the append to the report entries, though I'm not sure why.
hey thanks so much for taking the time to work on it. i really appreciate it. i'll take a look soon and circle back.
hey @austince - it took me a bit to get my head back in the game but I think there are a couple of things going on here. One is an issue in Ginkgo (that isn't actually at play in this scenario - but is relevant as I'll explain below). The other is a subtle issue in your code. I'll tackle the latter first.
I was able to reproduce the race you're seeing quite easily - thanks for the clear example. At first I went down the road of locking access to the report in the two user-facing functions: CurrentSpecReport
and AddReportEntry
...
...and it didn't help. Then I noticed that the race was between the goroutine launched by ReportPeriodically
calling AddReportEntry
and Ginkgo's default reporter. This confused me... the reporter should be only running after your test has finished cleaning up (i.e. after the AfterEach
or DeferCleanup
runs). This ordering is protected by channels within Ginkgo so it's a strict ordering - the body of your AfterEach
/DeferCleanup
is guaranteed to finish before the report is passed to Ginkgo's reporter.
So why was the race?
And then I realized that the issue is that the call to cancel()
in the AfterEach
/DeferCleanup
exits immediately and doesn't wait till the underlying goroutine finishes up. That means the groutine could still be running when the AfterEach
/DeferCleanup
exits and while the reporter is doing its thing. Moreover, it actually means that the grouting could still be running when the next test starts. Which means you'd end up with a report entry from one test making its way into the reports slice of another test.
So, in this case, the race detector is actually telling us that something is actually going on with the code that needs to be fixed. If you do this instead:
func ReportPeriodically(interval time.Duration) {
ginkgo.BeforeEach(func() {
name := ginkgo.CurrentSpecReport().FullText()
ctx, cancel := context.WithCancel(context.Background())
ticker := time.NewTicker(interval)
done := make(chan interface{})
go func() {
for {
select {
case now := <-ticker.C:
reportName := fmt.Sprintf("%s-periodic-%d", name, now.Unix())
ginkgo.AddReportEntry(reportName, "some system info")
case <-ctx.Done():
ticker.Stop()
close(done)
return
}
}
}()
DeferCleanup(func() {
cancel()
<-done
})
})
}
you should see the race disappear, even when using the same version of Ginkgo.
But. This isn't the only issue at play. I realized that your perfectly valid usecase implies a class of races in Ginkgo that need to be resolved. Consider the following:
var _ = Context("reporting races", func() {
ReportPeriodically(100 * time.Millisecond)
It("shouldn't race... but does", func() {
i := 0
for i < 10 {
ginkgo.AddReportEntry("boom")
time.Sleep(50 * time.Millisecond)
i++
}
})
})
Now, because the current version of Ginkgo don't account for concurrent calls to AddReportEntry
, ginkgo -race
will fail since the goroutine launched by ReportPeriodically
will race with the call to AddReportEntry
in the It
. To fix this I need to add a lock in the implementations of AddReportEntry
and CurrentSpecReport
. Since both of these are the sole user-facing access points that can be hit concurrently, they should be the only places I need a lock.
I'm working on that now and will push it out tomorrow/Monday once I add some meaningful tests to ensure no races occur. But, you should be able to be unblocked with the current released version of Ginkgo by adding that done
channel to ReportPeriodically
.
I've updated master and made AddReportEntry
and CurrentSpecReport
thread safe. I'll be cutting a release soon. Feel free to close this if you think we've resolved this @austince - or let me know if things don't make sense/aren't working.
Ah, the done channel makes sense when you break it down. I've looked at the fix and it makes sense too, once everything else is synchronized. Thanks for digging in and fixing this, will close now!