DataflowTemplates

Find test flakiness when building the project.

Open IntHelloWorld opened this issue 3 years ago • 3 comments

Dear developers, when I built the project, I got failures from these tests:

  1. com.google.cloud.teleport.splunk.SplunkIOTest#successfulSplunkIOMultiBatchParallelismTest
  2. com.google.cloud.teleport.bigtable.CassandraKeyUtilsTest#testSimplePrimaryKeyOrder
  3. com.google.cloud.teleport.splunk.SplunkEventWriterTest#successfulSplunkWriteSingleBatchTest
  4. com.google.cloud.teleport.splunk.SplunkEventWriterTest#eventWriterInvalidURL
  5. com.google.cloud.teleport.splunk.SplunkEventWriterTest#failedSplunkWriteSingleBatchTest
  6. com.google.cloud.teleport.bigtable.CassandraRowMapperFnTest#testTimestampColumn
  7. com.google.cloud.teleport.bigtable.CassandraRowMapperFnTest#testType4UUIDColumn
  8. com.google.cloud.teleport.bigtable.CassandraRowMapperFnTest#testType1UUIDColumn
  9. com.google.cloud.teleport.splunk.HttpEventPublisherTest#unrecognizedSelfSignedCertificateTest

But when I rebuilt the project, these failures disappeared. Are these tests flaky, or is this kind of flakiness expected? The test reports are in the attached file: flaky test reports.txt. Could you please have a look at this problem? Thanks a lot.

IntHelloWorld avatar Jan 10 '22 14:01 IntHelloWorld

Hi! Thanks for sharing the report file!

I haven't personally hit flakes, but it looks like some internal metric is marking these as 5-10% flaky. Based on the file you shared, the two things I'm noticing are:

  1. Errors related to something not happening in X amount of time.
  2. Attempts to bind to already-bound ports.

Depending on what's causing the second, both might be performance related. I'll see if someone can look into this.
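For the second symptom, one common mitigation (not something this thread confirms the tests need) is to let the OS pick an unused port instead of hard-coding one. A minimal sketch using only the JDK; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of DataflowTemplates.
public class FreePortFinder {

    /**
     * Binds to port 0 so the OS assigns an unused ephemeral port,
     * then releases it and returns the port number.
     * Note: another process could grab the port between the close
     * and the later bind, so this narrows the race but does not remove it.
     */
    public static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Free port: " + findFreePort());
    }
}
```

A test server would then be started on the returned port rather than a fixed one, assuming the server under test accepts a port argument.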

zhoufek avatar Jan 12 '22 22:01 zhoufek

A couple of us have tried to reproduce it with almost no success. In my case, the longest-running test took about 5-6s, but the tests with a clear timeout are all set to 20s.

For any test using org.mockserver.integration.ClientAndServer (example), we can probably increase the configured timeout, though I think that this should be handled by someone who can reproduce the flakes. That will solve at least some of the issues. We'll likely want a more permanent solution if possible, though.
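The thread does not show how the timeout is configured, so the following is only a generic sketch of making a test timeout overridable, so that someone who can reproduce the flakes can raise it without editing code. The property name `test.timeoutSeconds` and the class name are assumptions; the 20-second default mirrors the value mentioned above.

```java
import java.time.Duration;

// Hypothetical helper, not part of DataflowTemplates or MockServer.
public class TestTimeouts {

    // Assumed property name, chosen for illustration only.
    static final String PROPERTY = "test.timeoutSeconds";

    // Default taken from the 20s timeout mentioned in the thread.
    static final long DEFAULT_SECONDS = 20;

    /** Returns the timeout from the system property, or the default if unset or invalid. */
    public static Duration configuredTimeout() {
        // Long.getLong falls back to the default when the property is missing or unparsable.
        long seconds = Long.getLong(PROPERTY, DEFAULT_SECONDS);
        if (seconds <= 0) {
            seconds = DEFAULT_SECONDS;
        }
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        System.out.println("Timeout: " + configuredTimeout().getSeconds() + "s");
    }
}
```

With a helper like this, a slower machine could pass a larger value as a JVM system property, provided the build forwards it to the test JVM.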

As a temporary solution, if these are causing issues for development, they can be skipped from the command line using something like:

mvn test -Dtest=\!SplunkIOTest

One issue I was able to kind of reproduce was the HttpEventPublisherTest failure, though it was in a separate test. Someone is looking into a potential cause there.

zhoufek avatar Jan 13 '22 19:01 zhoufek

Thanks for your response! It's really useful.

IntHelloWorld avatar Jan 14 '22 06:01 IntHelloWorld

Thanks for reporting.

I have tried to run the mentioned classes and didn't notice any failure.

I'll assume they are fixed or improved and close this issue for now.

Please reopen if there are any new failures or details.

bvolpato avatar Feb 06 '23 04:02 bvolpato