goresilience
[race bug-fix] - fix reader-writer race check in metrics middleware
Description
Usually, we create a runner chain once in a service and execute that same chain with different execution logics. These executions mostly run in parallel, which can lead to a race condition in the metrics middleware.
Current code (with the race condition, also detected by go -race):
The metrics/runner.go -> NewMiddleware() function executes in parallel. The two lines below share the next variable across all parallel calls, which leads to a classic reader-writer problem on next and makes the go -race check fail.
```go
next = goresilience.SanitizeRunner(next) // write at next
err = next.Run(ctx, f)                   // read at next
```
Solution approach
I realised that we don't need to both read and write the next variable on every call of the metrics layer's runner. Instead, we can write to next once, during the assignment next = goresilience.SanitizeRunner(next), outside the returned runner func; the read of next can stay exactly as it is. That way only read operations happen during parallel execution, so there is no need to synchronise reads and writes on next.
To verify the race condition
- copy the script below
- download its dependencies
- run the script with the -race flag, i.e. go run -race script.go
```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	"github.com/slok/goresilience"
	"github.com/slok/goresilience/circuitbreaker"
	"github.com/slok/goresilience/metrics"
	"github.com/slok/goresilience/retry"
	"github.com/slok/goresilience/timeout"
)

const (
	myRunner = "MY_RUNNER"
)

// race check of the metrics middleware of the slok/goresilience package
func initRunner() (goresilience.Runner, error) {
	runner := goresilience.RunnerChain(
		metrics.NewMiddleware(myRunner, nil),
		circuitbreaker.NewMiddleware(circuitbreaker.Config{}),
		retry.NewMiddleware(retry.Config{}),
		timeout.NewMiddleware(timeout.Config{}),
	)
	return runner, nil
}

func main() {
	runner, err := initRunner()
	if err != nil {
		panic(err)
	}

	iterations := 1000
	var wg sync.WaitGroup
	wg.Add(iterations)
	for i := 0; i < iterations; i++ {
		go func() {
			// Done is deferred here (not inside f) so the WaitGroup is
			// released even if the chain rejects the call before running f.
			defer wg.Done()
			err := runner.Run(context.Background(), func(_ context.Context) error {
				time.Sleep(200 * time.Millisecond) // simulate network i/o
				return nil
			})
			if err != nil {
				fmt.Println(err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("done")
}
```