opentelemetry-collector-contrib icon indicating copy to clipboard operation
opentelemetry-collector-contrib copied to clipboard

[processor/remotetap] TestConsume test failure

Open crobert-1 opened this issue 1 year ago • 7 comments

Component(s)

processor/remotetap

Describe the issue you're reporting

Failing CI/CD link

Failure output:

Running target 'test' in module 'processor/remotetapprocessor' as part of group 'processor-1'
make --no-print-directory -C processor/remotetapprocessor test
/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/.tools/gotestsum --rerun-fails=1 --packages="./..." -- -race -timeout 600s -parallel 4 --tags=""
✓  internal/metadata (1.085s)
✖  . (636ms)
✓  internal/metadata (1.069s)

DONE 31 tests, 6 failures in 35.164s

✖  . (151ms)
✖  . (123ms)
✖  . (139ms)

make[2]: *** [../../Makefile.Common:126: test] Error 1
=== Failed
make[1]: *** [Makefile:176: processor/remotetapprocessor] Error 2
=== FAIL: . TestConsumeMetrics/limit_50 (0.03s)
make[1]: *** Waiting for unfinished jobs....
    processor_test.go:62: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:62
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeMetrics/limit_50

=== FAIL: . TestConsumeMetrics (0.04s)

=== FAIL: . TestConsumeLogs/limit_50 (0.04s)
    processor_test.go:111: 51
    processor_test.go:112: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:112
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeLogs/limit_50

=== FAIL: . TestConsumeLogs (0.05s)

=== FAIL: . TestConsumeTraces/limit_50 (0.07s)
    processor_test.go:160: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:160
        	Error:      	Not equal: 
        	            	expected: 53
        	            	actual  : 50
        	Test:       	TestConsumeTraces/limit_50

=== FAIL: . TestConsumeTraces (0.08s)

=== FAIL: . TestConsumeMetrics/limit_50 (re-run 1) (0.03s)
    processor_test.go:62: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:62
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeMetrics/limit_50

=== FAIL: . TestConsumeMetrics (re-run 1) (0.03s)

=== FAIL: . TestConsumeLogs/limit_50 (re-run 1) (0.04s)
    processor_test.go:111: 51
    processor_test.go:112: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:112
        	Error:      	Not equal: 
        	            	expected: 51
        	            	actual  : 50
        	Test:       	TestConsumeLogs/limit_50

=== FAIL: . TestConsumeLogs (re-run 1) (0.04s)

=== FAIL: . TestConsumeTraces/limit_50 (re-run 1) (0.05s)
    processor_test.go:160: 
        	Error Trace:	/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/processor/remotetapprocessor/processor_test.go:160
        	Error:      	Not equal: 
        	            	expected: 52
        	            	actual  : 50
        	Test:       	TestConsumeTraces/limit_50

=== FAIL: . TestConsumeTraces (re-run 1) (0.05s)

DONE 2 runs, 37 tests, 12 failures in 47.463s
✓  . (58.398s)

DONE 64 tests in 212.584s
make[1]: Leaving directory '/home/runner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib'
make: *** [Makefile:1[24](https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9020470488/job/24785698419#step:7:25): gotest] Error 2
Error: Process completed with exit code 2.

crobert-1 avatar May 09 '24 16:05 crobert-1

Pinging code owners:

  • processor/remotetap: @atoulme

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar May 09 '24 16:05 github-actions[bot]

Note that the failure was on arm, not sure if it's related or not.

crobert-1 avatar May 09 '24 16:05 crobert-1

+2 freq: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9019486388/job/24782474775#step:7:106 https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9019411342/job/24782226693#step:7:109

crobert-1 avatar May 09 '24 16:05 crobert-1

Looks like this is a result of https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32481

crobert-1 avatar May 09 '24 17:05 crobert-1

Looks like for the bigger limit there may be some timing issue going on with arm. The limit is based on events per second, so theoretically if the tests run longer than a second the test is invalid as it would be a new interval for the given limit. I'm seeing all tests take about 0.1 of a second though.

The limiter we're using may be checking the second itself from time.Now for interval enforcement though, which would lead to potential flakiness going across second boundaries, even though the test is running for less that a second. When I added a time.Sleep call to the test to make it take longer on darwin/amd64 I'm getting the same error. One thing I don't understand though is why it would be barely over the limit every time, instead of much higher if it's splitting the second boundaries.

Either way, I believe this is just a flaky test at this point, the limit is getting enforced "close" to its set value.

crobert-1 avatar May 09 '24 19:05 crobert-1

Increasing the limit can make this functionality much more obvious: Limit: 1,000 -> received 1,029 Limit: 100,000 -> received 200,000 (the total number of metrics being sent. The test took 4.5 seconds, so this was well within the limit for each second.)

crobert-1 avatar May 09 '24 19:05 crobert-1

Seeing these failures here https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9022795143/job/24793402106?pr=32948

asreehari-splunk avatar May 09 '24 21:05 asreehari-splunk