zipkin-finagle icon indicating copy to clipboard operation
zipkin-finagle copied to clipboard

Travis build failed in KafkaZipkinTracerTest.whenKafkaIsDown

Open andystanton opened this issue 4 years ago • 2 comments

Issue Description

The Travis build for #80 failed on the test KafkaZipkinTracerTest.whenKafkaIsDown: https://travis-ci.org/github/openzipkin/zipkin-finagle/builds/659274432.

However, the PR did not change any Kafka code or dependencies and the build passed when it was run a second time, so it's possible this is an unreliable test. The error message is:

[INFO] Running zipkin2.finagle.kafka.KafkaZipkinTracerTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.007 s - in zipkin2.finagle.kafka.KafkaZipkinTracerTest
[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   KafkaZipkinTracerIntegrationTest.whenKafkaIsDown:98 
Expecting:
  <{WrappedArray(span_bytes)=185L, List(messages_dropped, java.lang.IllegalStateException)=1L, WrappedArray(spans)=1L, WrappedArray(spans_dropped)=1L, WrappedArray(message_bytes)=187L, WrappedArray(messages)=1L, List(messages_dropped)=2L, List(messages_dropped, org.apache.kafka.common.errors.TimeoutException)=1L}>
to contain only:
  <[MapEntry[key=Buffer(spans), value=1L],
    MapEntry[key=Buffer(span_bytes), value=185L],
    MapEntry[key=Buffer(spans_dropped), value=1L],
    MapEntry[key=Buffer(messages), value=1L],
    MapEntry[key=Buffer(message_bytes), value=187L],
    MapEntry[key=Buffer(messages_dropped), value=1L],
    MapEntry[key=Buffer(messages_dropped, org.apache.kafka.common.errors.TimeoutException), value=1L]]>
elements not found:
  <[MapEntry[key=Buffer(messages_dropped), value=1L]]>
and elements not expected:
  <[MapEntry[key=List(messages_dropped, java.lang.IllegalStateException), value=1L],
    MapEntry[key=List(messages_dropped), value=2L]]>

The kafka module failed to build before the http module started, so it's unlikely that some state teardown was missed in the new code:

[INFO] ------------------------------------------------------------------------
2653[INFO] Reactor Summary for Zipkin Finagle (Parent) 2.1.6-SNAPSHOT:
2654[INFO] 
2655[INFO] Zipkin Finagle (Parent) ............................ SUCCESS [  4.159 s]
2656[INFO] Zipkin Finagle: Core ............................... SUCCESS [  8.622 s]
2657[INFO] Zipkin Finagle: Kafka .............................. FAILURE [ 32.253 s]
2658[INFO] Zipkin Finagle: Http ............................... SKIPPED
2659[INFO] Zipkin Finagle: Scribe ............................. SKIPPED

Steps to recreate

The steps to recreate the issue are to run the build via Maven as per the Travis configuration:

./mvnw install -DskipTests=true -Dmaven.javadoc.skip=true -B -V -Dlicense.skip=true

However, I ran this on a loop locally and did not see the same issue after tens of runs.

andystanton avatar Mar 11 '20 17:03 andystanton

there's some flake risk in the test... https://github.com/openzipkin/zipkin-finagle/blob/master/kafka/src/test/java/zipkin2/finagle/kafka/KafkaZipkinTracerIntegrationTest.java#L96

but the double-drop count is interesting.. I think failsafe isn't setup correct. Probably we should rename things *IntegrationTest to IT* and fix the parent pom to include

Right now, the failsafe config seems to be there, but unused! Mind giving a go?

        <executions>
          <execution>
            <id>integration-test</id>
            <phase>verify</phase>
            <goals>
              <goal>integration-test</goal>
              <goal>verify</goal>
            </goals>
          </execution>
        </executions>

codefromthecrypt avatar Mar 12 '20 00:03 codefromthecrypt

So two parts to your request - one is sorting out Failsafe and the other is getting to the root cause of the flakey test. I'm happy to take a look at both.

Do you think this blocks the release of #80 in the meantime, given the functionality is unrelated?

andystanton avatar May 14 '20 08:05 andystanton