apm-server icon indicating copy to clipboard operation
apm-server copied to clipboard

test: enable race detector

Open kruskall opened this issue 1 year ago • 2 comments

Motivation/summary

Unfortunately we can only enable the race detector for ci and unit tests. system-tests are building the apm-server binary (CGO disabled by default) so we can't enable the race detector as it requires cgo.

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

Related issues

kruskall avatar Aug 02 '24 23:08 kruskall

This pull request does not have a backport label. Could you fix it @kruskall? 🙏 To fixup this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.

NOTE: backport-skip has been added to this pull request.

mergify[bot] avatar Aug 02 '24 23:08 mergify[bot]

There's a test that should probably be fixed, seems like a sync problem.

marclop avatar Aug 12 '24 03:08 marclop

This uncovered a race condition in elastic-agent-client lib: https://github.com/elastic/elastic-agent-client/pull/125

PR blocked until that's fixed

kruskall avatar Nov 26 '24 20:11 kruskall

Are we planning to enable this? I was running unit tests on APM server with race detector and found this:

=== RUN   TestWrapServerAPMInstrumentationTimeout
==================
WARNING: DATA RACE
Read at 0x000108655e50 by goroutine 1198:
  github.com/elastic/apm-server/internal/beater/otlp.NewHTTPHandlers()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/otlp/http.go:60 +0x120
  github.com/elastic/apm-server/internal/beater/api.NewMux()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/api/mux.go:120 +0x6c4
  github.com/elastic/apm-server/internal/beater.newServer()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/server.go:173 +0x1e8
  github.com/elastic/apm-server/internal/beater.(*Runner).Run.newBaseRunServer.func15()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/server.go:146 +0x6c
  github.com/elastic/apm-server/internal/beater.(*Runner).Run.func11()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/beater.go:497 +0xb4
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /Users/ericyap/.gvm/pkgsets/go1.23.6/global/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78 +0x7c

Previous write at 0x000108655e50 by goroutine 1175:
  github.com/elastic/apm-server/internal/beater/otlp.NewHTTPHandlers()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/otlp/http.go:63 +0x294
  github.com/elastic/apm-server/internal/beater/api.NewMux()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/api/mux.go:120 +0x6c4
  github.com/elastic/apm-server/internal/beater.newTracerServer()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/tracing.go:47 +0xd4
  github.com/elastic/apm-server/internal/beater.(*Runner).Run()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/beater.go:500 +0x28f0
  github.com/elastic/apm-server/internal/beater/beatertest.(*Server).Start.func1()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/beatertest/server.go:145 +0xb8
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /Users/ericyap/.gvm/pkgsets/go1.23.6/global/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78 +0x7c

Goroutine 1198 (running) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /Users/ericyap/.gvm/pkgsets/go1.23.6/global/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x10c
  github.com/elastic/apm-server/internal/beater.(*Runner).Run()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/beater.go:496 +0x2850
  github.com/elastic/apm-server/internal/beater/beatertest.(*Server).Start.func1()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/beatertest/server.go:145 +0xb8
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /Users/ericyap/.gvm/pkgsets/go1.23.6/global/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78 +0x7c

Goroutine 1175 (running) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /Users/ericyap/.gvm/pkgsets/go1.23.6/global/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x10c
  github.com/elastic/apm-server/internal/beater/beatertest.(*Server).Start()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/beatertest/server.go:143 +0xf0
  github.com/elastic/apm-server/internal/beater/beatertest.NewServer()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/beatertest/server.go:69 +0x50
  github.com/elastic/apm-server/internal/beater_test.TestWrapServerAPMInstrumentationTimeout()
      /Users/ericyap/Repos/elastic/apm/apm-server/internal/beater/server_test.go:570 +0x408
  testing.tRunner()
      /Users/ericyap/.gvm/gos/go1.23.6/src/testing/testing.go:1690 +0x184
  testing.(*T).Run.gowrap1()
      /Users/ericyap/.gvm/gos/go1.23.6/src/testing/testing.go:1743 +0x40
==================
    testing.go:1399: race detected during execution of test
--- FAIL: TestWrapServerAPMInstrumentationTimeout (0.03s)

ericywl avatar Feb 27 '25 07:02 ericywl

Are we planning to enable this?

Yes please! It seems that this PR fell through the cracks while waiting for the blocking PR to be merged. @ericywl do you have capacity to bring this over the finish line? Including the investigation for the currently failing test.

simitt avatar Feb 27 '25 12:02 simitt

@simitt Ok sure, I will take a look at it.

ericywl avatar Feb 28 '25 02:02 ericywl

Created a follow up PR here since I'm unsure of how to commit to @kruskall fork: https://github.com/elastic/apm-server/pull/15932. Closing this for now.

ericywl avatar Feb 28 '25 03:02 ericywl