goresilience icon indicating copy to clipboard operation
goresilience copied to clipboard

[race bug-fix] - fix reader-writer race check in metrics middleware

Open harssRajput opened this issue 8 months ago • 0 comments

description

usually, we create a runnerChain once in a service and execute the same runnerChain by providing different execution logics. they mostly runs parallely which can leads to race condition in metrics middleware.

current code: (with race condition, detected in go -race also) the metrics/runner.go -> newMiddleware() function executes parallely. in this, the two lines, given below, sharing the next variable in all parallel calls that can leads to classic reader-writer problem over next variable resource causing go -race check fails. next = goresilience.SanitizeRunner(next) -> write at next err = next.Run(ctx, f) -> read at next

solution approach

I realise that we don't need to read and write over next variable in every metrics layer's runner call. instead, we can write once at next during assignment of next = goresilience.SanitizeRunner(next) variable outside the returned runnerFunc and read line at next variable can be there as it is. so, there will be only read operations exists during parallel execution. hence, no need to sync the read and write over next variable.

to verify race condition

  1. copy below script
  2. download dependencies
  3. run script with -race flag i.e. go run -race script.go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	"github.com/slok/goresilience"
	"github.com/slok/goresilience/circuitbreaker"
	"github.com/slok/goresilience/metrics"
	"github.com/slok/goresilience/retry"
	"github.com/slok/goresilience/timeout"
)

const (
	myRunner = "MY_RUNNER"
)

// race check of metrics middleware of slok/goresilience package
func initRunner() (*goresilience.Runner, error) {
	var runner goresilience.Runner = goresilience.RunnerChain(
		metrics.NewMiddleware(myRunner, nil),
		circuitbreaker.NewMiddleware(circuitbreaker.Config{}),
		retry.NewMiddleware(retry.Config{}),
		timeout.NewMiddleware(timeout.Config{}),
	)

	return &runner, nil
}

func main() {
	runner, err := initRunner()
	if err != nil {
		panic(err)
	}

	iterations := 1000
	var wg sync.WaitGroup
	wg.Add(iterations)
	for i := 0; i < iterations; i++ {
		go func() {
			err := (*runner).Run(context.Background(), func(_ context.Context) error {
				defer wg.Done()
				time.Sleep(200 * time.Millisecond) //to simulate network i/o
				return nil
			})
			if err != nil {
				fmt.Println(err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("done")
}```

harssRajput avatar Oct 13 '23 08:10 harssRajput