TEZ-4565. TestAnalyzer subtest testInternalPreemption is flaky

Open jteagles opened this issue 1 year ago • 7 comments

Occasionally, the attempt comes back as 000001_2 and doesn't match the expected pattern.
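
To illustrate the mismatch, here is a minimal sketch. It assumes the expected step strings are regular expressions matched against the whole attempt name; the `[01]` character classes quoted later in this thread suggest this, but the matching code itself is not shown here. `AttemptPatternCheck` and `stepMatches` are hypothetical names, not part of TestAnalyzer.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not code from TestAnalyzer.
class AttemptPatternCheck {

    // Assumption: an expected step is a regex that must match the full attempt name.
    static boolean stepMatches(String expectedRegex, String actualAttempt) {
        return Pattern.matches(expectedRegex, actualAttempt);
    }

    public static void main(String[] args) {
        // Expected pattern for a v2 step, as quoted later in this thread.
        String expected = "v2 : 00000[01]_1";

        // Usual case: task 000001, attempt number 1.
        System.out.println(stepMatches(expected, "v2 : 000001_1")); // true

        // Flaky case: the attempt number is 2, which the trailing "_1" rejects.
        System.out.println(stepMatches(expected, "v2 : 000001_2")); // false
    }
}
```

Under that assumption, any run in which the attempt number ends up as 2 fails the comparison, even if the rest of the critical path is as expected.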

jteagles avatar May 16 '24 20:05 jteagles

This is just a test patch to trigger the flaky test failure.

jteagles avatar May 16 '24 21:05 jteagles

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 :ok: | reexec | 22m 21s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 18m 12s | master passed |
| +1 :green_heart: | compile | 0m 34s | master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | compile | 0m 33s | master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 :green_heart: | checkstyle | 1m 19s | master passed |
| +1 :green_heart: | javadoc | 0m 37s | master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | javadoc | 0m 24s | master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +0 :ok: | spotbugs | 1m 18s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 16s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 21s | the patch passed |
| +1 :green_heart: | compile | 0m 19s | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | javac | 0m 19s | the patch passed |
| +1 :green_heart: | compile | 0m 17s | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 :green_heart: | javac | 0m 17s | the patch passed |
| +1 :green_heart: | checkstyle | 0m 10s | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 9s | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | javadoc | 0m 10s | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 :green_heart: | findbugs | 0m 39s | the patch passed |
| | | | _ Other Tests _ |
| -1 :x: | unit | 2m 57s | job-analyzer in the patch failed. |
| +1 :green_heart: | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 51m 37s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | tez.analyzer.TestAnalyzer |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-357/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/357 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux e9caf1ceae87 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / a1fcddb8b |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| unit | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-357/1/artifact/out/patch-unit-tez-tools_analyzers_job-analyzer.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-357/1/testReport/ |
| Max. process+thread count | 913 (vs. ulimit of 5500) |
| modules | C: tez-tools/analyzers/job-analyzer U: tez-tools/analyzers/job-analyzer |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-357/1/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar May 16 '24 21:05 tez-yetus

The second commit is an attempt to fix the flaky test.

jteagles avatar May 16 '24 23:05 jteagles

:confetti_ball: +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 :ok: | reexec | 23m 11s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 15m 2s | master passed |
| +1 :green_heart: | compile | 0m 27s | master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | compile | 0m 26s | master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 :green_heart: | checkstyle | 1m 12s | master passed |
| +1 :green_heart: | javadoc | 0m 33s | master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | javadoc | 0m 18s | master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +0 :ok: | spotbugs | 1m 27s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 26s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 29s | the patch passed |
| +1 :green_heart: | compile | 0m 20s | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | javac | 0m 20s | the patch passed |
| +1 :green_heart: | compile | 0m 20s | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 :green_heart: | javac | 0m 20s | the patch passed |
| +1 :green_heart: | checkstyle | 0m 11s | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 8s | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 :green_heart: | javadoc | 0m 10s | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 :green_heart: | findbugs | 1m 12s | the patch passed |
| | | | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 50s | job-analyzer in the patch passed. |
| +1 :green_heart: | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 49m 35s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-357/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/357 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 44860714a0db 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / a1fcddb8b |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-357/2/testReport/ |
| Max. process+thread count | 892 (vs. ulimit of 5500) |
| modules | C: tez-tools/analyzers/job-analyzer U: tez-tools/analyzers/job-analyzer |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-357/2/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar May 17 '24 00:05 tez-yetus

@abstractdog, I left this as two commits so you can see that the first fails and the second succeeds. This will need a squash and merge.

jteagles avatar May 17 '24 00:05 jteagles

I'm actually fine with the patch; however, I'm struggling a bit to understand the cause. This unit test looks like it describes an exact scenario (see the comment above testInternalPreemption), but the underlying task attempt IDs look like a mess to me :) I mean, the current state without the patch is:

        createStep("v1 : 00000[01]_0", CriticalPathDependency.INIT_DEPENDENCY),
        createStep("v2 : 00000[01]_0", CriticalPathDependency.DATA_DEPENDENCY),
        createStep("v3 : 000000_0", CriticalPathDependency.DATA_DEPENDENCY, 
            TaskAttemptTerminationCause.INTERNAL_PREEMPTION, null),
        createStep("v2 : 00000[01]_1", CriticalPathDependency.OUTPUT_RECREATE_DEPENDENCY),
        createStep("v1 : 000000_1", CriticalPathDependency.OUTPUT_RECREATE_DEPENDENCY,
            null, Collections.singletonList("preemption of v3")),
        createStep("v2 : 00000[01]_1", CriticalPathDependency.DATA_DEPENDENCY),
        createStep("v3 : 000000_1", CriticalPathDependency.DATA_DEPENDENCY)

This PR changes the third occurrence of v2 to 00000[01]_[12]. So if it's only sometimes 000001_2 and 000001_1 the rest of the time, then the second occurrence of v2 is also weird: it is 00000[01]_1, so it would be the same task attempt ID (1)? Is that possible? Do all of the scenarios make sense and test what we expect?
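
To make the question concrete, the sketch below compares the patterns for the second and third v2 steps against a few candidate attempt names. It is only an illustration: it assumes the step strings are regular expressions matched against the full attempt name, and `V2StepPatterns`, `accepts`, and the constant names are hypothetical, not identifiers from TestAnalyzer.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; the pattern strings are the ones quoted in this thread.
class V2StepPatterns {
    static final String SECOND_V2 = "v2 : 00000[01]_1";             // OUTPUT_RECREATE_DEPENDENCY step
    static final String THIRD_V2_UNPATCHED = "v2 : 00000[01]_1";    // DATA_DEPENDENCY step before the PR
    static final String THIRD_V2_PATCHED = "v2 : 00000[01]_[12]";   // DATA_DEPENDENCY step with the PR

    // Assumption: a step pattern must match the whole attempt name.
    static boolean accepts(String regex, String attempt) {
        return Pattern.matches(regex, attempt);
    }

    public static void main(String[] args) {
        String[] candidates = {"v2 : 000000_1", "v2 : 000001_1", "v2 : 000001_2"};
        for (String attempt : candidates) {
            System.out.println(attempt
                + " second=" + accepts(SECOND_V2, attempt)
                + " thirdUnpatched=" + accepts(THIRD_V2_UNPATCHED, attempt)
                + " thirdPatched=" + accepts(THIRD_V2_PATCHED, attempt));
        }
    }
}
```

Under these assumptions, the same attempt name (for example `v2 : 000001_1`) satisfies both the second and the third v2 step, with or without the patch. The patterns alone therefore do not check that the two steps refer to different attempts; the patched pattern only adds attempt number 2 as an accepted value for the third step.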

abstractdog avatar May 17 '24 07:05 abstractdog

@jteagles: unfortunately, TestAnalyzer is still flaky. Is there a chance you can respond to my question above and say whether we can merge this fix? It would be awesome to clean up this test. Thanks in advance!

abstractdog avatar Dec 19 '24 06:12 abstractdog