go-metrics icon indicating copy to clipboard operation
go-metrics copied to clipboard

TestDisplayMetrics flaky

Open elboulangero opened this issue 2 years ago • 2 comments

Hi!

During automated builds of go-metrics 0.3.4 by the Debian CI infrastructure, we noticed a test failure on one of our armhf builder (arm high-float). The issue disappeared after a retry.

All the logs are at https://ci.debian.net/packages/g/golang-github-armon-go-metrics/testing/armhf/, and the two logs of interest here are:

  • failure at 2021-09-11 21:15:02 UTC: https://ci.debian.net/data/autopkgtest/testing/armhf/g/golang-github-armon-go-metrics/15197663/log.gz
  • success at 2021-09-12 14:55:43 UTC: https://ci.debian.net/data/autopkgtest/testing/armhf/g/golang-github-armon-go-metrics/15198902/log.gz

The lines of interest in the logs:

=== RUN   TestDisplayMetrics
    inmem_endpoint_test.go:29: bad: [0xd28240 0xd28400]
--- FAIL: TestDisplayMetrics (0.00s)

Looking deeper in the various logs, I noticed that this issue appeared also on the x86_64 architecture:

  • 2021-07-25 08:26:55 UTC: https://ci.debian.net/data/autopkgtest/testing/amd64/g/golang-github-armon-go-metrics/13923270/log.gz

elboulangero avatar Sep 12 '21 17:09 elboulangero

Thank you for the bug report!

Looking at this test, I believe what's happening is that we use https://pkg.go.dev/time#Time.Truncate to split time into intervals. If some of the test data is added before the interval boundary, and some of it is added after the boundary, we'll end up with two intervals for this test, and the test expects only one.

One way to fix this might be to increase the interval in that test slightly from 10ms to 20ms (to give us more buffer). Then when we get the first interval, wait for it to finish (the done chan was added recently in #125), and add all the metrics immediately after it finishes:

interval := 20 * time.Millisecond
inm := NewInmemSink(interval, 50*time.Millisecond)
<- inm.getInterval().done
// add all the sample metrics

data := inm.Data()
// now we expect 2 intervals in data instead of one.
...

As long as all the metrics can be added in 20ms and we call DisplayMetrics before the next interval, the test should not longer flake.

Edit: Ah but done is only closed when something calls getInterval, so we'll also need a goroutine to call that constantly to get the signal. Maybe using a time.After to wait on the interval boundary would be easier than using the done channel.

There are probably other options for fixing it as well.

dnephin avatar Sep 12 '21 18:09 dnephin

Edit: Ah but done is only closed when something calls getInterval, so we'll also need a goroutine to call that constantly to get the signal. Maybe using a time.After to wait on the interval boundary would be easier than using the done channel.

I can confirm that. I tried the change you suggested above, and the test gets stuck, and finally times out after 10 minutes.

Logs for reference:

=== RUN   TestDisplayMetrics
panic: test timed out after 10m0s

goroutine 7 [running]:
testing.(*M).startAlarm.func1()
	/usr/lib/go-1.16/src/testing/testing.go:1700 +0xe5
created by time.goFunc
	/usr/lib/go-1.16/src/time/sleep.go:180 +0x45

goroutine 1 [chan receive]:
testing.(*T).Run(0xc000001b00, 0x66fce6, 0x12, 0x6809f8, 0x48f0a6)
	/usr/lib/go-1.16/src/testing/testing.go:1239 +0x2da
testing.runTests.func1(0xc000001980)
	/usr/lib/go-1.16/src/testing/testing.go:1511 +0x78
testing.tRunner(0xc000001980, 0xc00010dde0)
	/usr/lib/go-1.16/src/testing/testing.go:1193 +0xef
testing.runTests(0xc00000e0d8, 0x7f0040, 0x2a, 0x2a, 0xc047e14ce3db0d3d, 0x8bb2e16fb0, 0x7f54c0, 0x66e426)
	/usr/lib/go-1.16/src/testing/testing.go:1509 +0x2fe
testing.(*M).Run(0xc000154000, 0x0)
	/usr/lib/go-1.16/src/testing/testing.go:1417 +0x1eb
main.main()
	_testmain.go:127 +0x138

goroutine 6 [chan receive]:
github.com/armon/go-metrics.TestDisplayMetrics(0xc000001b00)
	/<<PKGBUILDDIR>>/obj-x86_64-linux-gnu/src/github.com/armon/go-metrics/inmem_endpoint_test.go:18 +0xe5
testing.tRunner(0xc000001b00, 0x6809f8)
	/usr/lib/go-1.16/src/testing/testing.go:1193 +0xef
created by testing.(*T).Run
	/usr/lib/go-1.16/src/testing/testing.go:1238 +0x2b3

elboulangero avatar Sep 13 '21 08:09 elboulangero