opentelemetry-collector-contrib
opentelemetry-collector-contrib copied to clipboard
[processor/remotetap] TestConsume test failure
Component(s)
processor/remotetap
Describe the issue you're reporting
Failure output:
Running target 'test' in module 'processor/remotetapprocessor' as part of group 'processor-1'
make --no-print-directory -C processor/remotetapprocessor test
/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/.tools/gotestsum --rerun-fails=1 --packages="./..." -- -race -timeout 600s -parallel 4 --tags=""
✓ internal/metadata (1.085s)
✖ . (636ms)
✓ internal/metadata (1.069s)
DONE 31 tests, 6 failures in 35.164s
✖ . (151ms)
✖ . (123ms)
✖ . (139ms)
make[2]: *** [../../Makefile.Common:126: test] Error 1
=== Failed
make[1]: *** [Makefile:176: processor/remotetapprocessor] Error 2
=== FAIL: . TestConsumeMetrics/limit_50 (0.03s)
make[1]: *** Waiting for unfinished jobs....
processor_test.go:62:
Error Trace: /home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:62
Error: Not equal:
expected: 51
actual : 50
Test: TestConsumeMetrics/limit_50
=== FAIL: . TestConsumeMetrics (0.04s)
=== FAIL: . TestConsumeLogs/limit_50 (0.04s)
processor_test.go:111: 51
processor_test.go:112:
Error Trace: /home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:112
Error: Not equal:
expected: 51
actual : 50
Test: TestConsumeLogs/limit_50
=== FAIL: . TestConsumeLogs (0.05s)
=== FAIL: . TestConsumeTraces/limit_50 (0.07s)
processor_test.go:160:
Error Trace: /home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:160
Error: Not equal:
expected: 53
actual : 50
Test: TestConsumeTraces/limit_50
=== FAIL: . TestConsumeTraces (0.08s)
=== FAIL: . TestConsumeMetrics/limit_50 (re-run 1) (0.03s)
processor_test.go:62:
Error Trace: /home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:62
Error: Not equal:
expected: 51
actual : 50
Test: TestConsumeMetrics/limit_50
=== FAIL: . TestConsumeMetrics (re-run 1) (0.03s)
=== FAIL: . TestConsumeLogs/limit_50 (re-run 1) (0.04s)
processor_test.go:111: 51
processor_test.go:112:
Error Trace: /home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:112
Error: Not equal:
expected: 51
actual : 50
Test: TestConsumeLogs/limit_50
=== FAIL: . TestConsumeLogs (re-run 1) (0.04s)
=== FAIL: . TestConsumeTraces/limit_50 (re-run 1) (0.05s)
processor_test.go:160:
Error Trace: /home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:160
Error: Not equal:
expected: 52
actual : 50
Test: TestConsumeTraces/limit_50
=== FAIL: . TestConsumeTraces (re-run 1) (0.05s)
DONE 2 runs, 37 tests, 12 failures in 47.463s
✓ . (58.398s)
DONE 64 tests in 212.584s
make[1]: Leaving directory '/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib'
make: *** [Makefile:1[24](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9020470488/job/24785698419#step:7:25): gotest] Error 2
Error: Process completed with exit code 2.
Pinging code owners:
- processor/remotetap: @atoulme
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Note that the failure was on arm, not sure if it's related or not.
+2 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9019486388/job/24782474775#step:7:106 https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9019411342/job/24782226693#step:7:109
Looks like this is a result of https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32481
Looks like for the bigger limit there may be some timing issue going on with arm. The limit is based on events per second, so theoretically if the tests run longer than a second the test is invalid as it would be a new interval for the given limit. I'm seeing all tests take about 0.1 of a second though.
The limiter we're using may be checking the second itself from time.Now for interval enforcement though, which would lead to potential flakiness going across second boundaries, even though the test is running for less that a second. When I added a time.Sleep call to the test to make it take longer on darwin/amd64 I'm getting the same error. One thing I don't understand though is why it would be barely over the limit every time, instead of much higher if it's splitting the second boundaries.
Either way, I believe this is just a flaky test at this point, the limit is getting enforced "close" to its set value.
Increasing the limit can make this functionality much more obvious: Limit: 1,000 -> received 1,029 Limit: 100,000 -> received 200,000 (the total number of metrics being sent. The test took 4.5 seconds, so this was well within the limit for each second.)
Seeing these failures here https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9022795143/job/24793402106?pr=32948