TEZ-4569: SCATTER_GATHER + BROADCAST hangs on DAG Recovery

Open okumin opened this issue 1 year ago • 2 comments

Let an AM correctly restore its state and restart tasks. https://issues.apache.org/jira/browse/TEZ-4569

okumin avatar Jun 12 '24 08:06 okumin

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| +0 :ok: | reexec | 27m 8s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 :ok: | mvndep | 6m 32s | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 12m 39s | master passed |
| +1 :green_heart: | compile | 1m 17s | master passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 :green_heart: | compile | 1m 12s | master passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +1 :green_heart: | checkstyle | 1m 30s | master passed |
| +1 :green_heart: | javadoc | 1m 0s | master passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 :green_heart: | javadoc | 0m 45s | master passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +0 :ok: | spotbugs | 0m 50s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 2m 39s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 0m 49s | the patch passed |
| +1 :green_heart: | compile | 0m 53s | the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 :green_heart: | javac | 0m 53s | the patch passed |
| +1 :green_heart: | compile | 0m 45s | the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +1 :green_heart: | javac | 0m 45s | the patch passed |
| -0 :warning: | checkstyle | 0m 11s | tez-tests: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 20s | the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 :green_heart: | javadoc | 0m 19s | the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +1 :green_heart: | findbugs | 1m 56s | the patch passed |
| | | | _ Other Tests _ |
| +1 :green_heart: | unit | 5m 8s | tez-dag in the patch passed. |
| +1 :green_heart: | unit | 42m 27s | tez-tests in the patch passed. |
| -1 :x: | asflicense | 0m 30s | The patch generated 1 ASF License warnings. |
| | | 109m 57s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/361 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 4a42ce80b08c 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / e08d0279c |
| Default Java | Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/1/artifact/out/diff-checkstyle-tez-tests.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/1/testReport/ |
| asflicense | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/1/artifact/out/patch-asflicense-problems.txt |
| Max. process+thread count | 1233 (vs. ulimit of 5500) |
| modules | C: tez-dag tez-tests U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/1/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar Jun 12 '24 10:06 tez-yetus

:confetti_ball: +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| +0 :ok: | reexec | 0m 32s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 :ok: | mvndep | 7m 2s | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 16m 6s | master passed |
| +1 :green_heart: | compile | 1m 21s | master passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 :green_heart: | compile | 1m 12s | master passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +1 :green_heart: | checkstyle | 1m 33s | master passed |
| +1 :green_heart: | javadoc | 1m 4s | master passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 :green_heart: | javadoc | 0m 44s | master passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +0 :ok: | spotbugs | 0m 48s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 2m 49s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 0m 48s | the patch passed |
| +1 :green_heart: | compile | 0m 52s | the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 :green_heart: | javac | 0m 52s | the patch passed |
| +1 :green_heart: | compile | 0m 48s | the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +1 :green_heart: | javac | 0m 48s | the patch passed |
| -0 :warning: | checkstyle | 0m 12s | tez-tests: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 20s | the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 :green_heart: | javadoc | 0m 20s | the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +1 :green_heart: | findbugs | 1m 59s | the patch passed |
| | | | _ Other Tests _ |
| +1 :green_heart: | unit | 5m 11s | tez-dag in the patch passed. |
| +1 :green_heart: | unit | 41m 25s | tez-tests in the patch passed. |
| +1 :green_heart: | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 86m 47s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/361 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 1a7a0c602b4f 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / e08d0279c |
| Default Java | Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/2/artifact/out/diff-checkstyle-tez-tests.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/2/testReport/ |
| Max. process+thread count | 1162 (vs. ulimit of 5500) |
| modules | C: tez-dag tez-tests U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/2/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar Jun 12 '24 16:06 tez-yetus

I rebased this branch and also added two cosmetic changes.

okumin avatar Dec 23 '24 11:12 okumin

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| +0 :ok: | reexec | 28m 3s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 :ok: | mvndep | 2m 58s | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 10m 56s | master passed |
| +1 :green_heart: | compile | 1m 24s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 :green_heart: | compile | 1m 9s | master passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| +1 :green_heart: | checkstyle | 1m 16s | master passed |
| +1 :green_heart: | javadoc | 0m 51s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 :green_heart: | javadoc | 0m 41s | master passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| +0 :ok: | spotbugs | 1m 3s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 2m 57s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 0m 51s | the patch passed |
| +1 :green_heart: | compile | 0m 55s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 :green_heart: | javac | 0m 55s | the patch passed |
| +1 :green_heart: | compile | 0m 48s | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| +1 :green_heart: | javac | 0m 48s | the patch passed |
| -0 :warning: | checkstyle | 0m 11s | tez-tests: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 19s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 :green_heart: | javadoc | 0m 19s | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| +1 :green_heart: | findbugs | 2m 0s | the patch passed |
| | | | _ Other Tests _ |
| -1 :x: | unit | 5m 2s | tez-dag in the patch failed. |
| +1 :green_heart: | unit | 43m 19s | tez-tests in the patch passed. |
| +1 :green_heart: | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 106m 23s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | tez.dag.app.dag.impl.TestDAGRecovery |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/361 |
| JIRA Issue | TEZ-4569 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 1c2c562e42b5 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 9efa6f14d |
| Default Java | Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/3/artifact/out/diff-checkstyle-tez-tests.txt |
| unit | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/3/artifact/out/patch-unit-tez-dag.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/3/testReport/ |
| Max. process+thread count | 1134 (vs. ulimit of 5500) |
| modules | C: tez-dag tez-tests U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/3/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar Dec 23 '24 13:12 tez-yetus

I'm checking why TestDAGRecovery failed

okumin avatar Dec 23 '24 13:12 okumin

> I'm checking why TestDAGRecovery failed

Sometimes recovery tests are flaky, so I simply restarted the last precommit job.

abstractdog avatar Dec 23 '24 14:12 abstractdog

@abstractdog Sorry, my hand-made refactoring included a mistake. I copy-pasted the method names used in the original condition. I would appreciate it if you could double-check it. https://github.com/apache/tez/pull/361/commits/bdece70288d20463b131d852511128103f093cdf

okumin avatar Dec 23 '24 14:12 okumin

> @abstractdog Sorry, my hand-made refactoring included a mistake. I copy-pasted the method names used in the original condition. I appreciate it if you could double-check it. bdece70

I missed it, but I see it's fixed. Glad to see that the unit tests revealed the problem. My +1 still holds if the tests pass.

abstractdog avatar Dec 23 '24 14:12 abstractdog

:confetti_ball: +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|:----------|--------:|:--------|
| +0 :ok: | reexec | 0m 15s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +0 :ok: | mvndep | 2m 48s | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 13m 59s | master passed |
| +1 :green_heart: | compile | 1m 19s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 :green_heart: | compile | 1m 11s | master passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| +1 :green_heart: | checkstyle | 1m 31s | master passed |
| +1 :green_heart: | javadoc | 0m 57s | master passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 :green_heart: | javadoc | 0m 42s | master passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| +0 :ok: | spotbugs | 0m 50s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 2m 37s | master passed |
| | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 0m 48s | the patch passed |
| +1 :green_heart: | compile | 0m 51s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 :green_heart: | javac | 0m 51s | the patch passed |
| +1 :green_heart: | compile | 0m 44s | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| +1 :green_heart: | javac | 0m 44s | the patch passed |
| -0 :warning: | checkstyle | 0m 12s | tez-tests: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 19s | the patch passed with JDK Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 |
| +1 :green_heart: | javadoc | 0m 19s | the patch passed with JDK Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| +1 :green_heart: | findbugs | 2m 0s | the patch passed |
| | | | _ Other Tests _ |
| +1 :green_heart: | unit | 5m 15s | tez-dag in the patch passed. |
| +1 :green_heart: | unit | 46m 32s | tez-tests in the patch passed. |
| +1 :green_heart: | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 84m 40s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/5/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/361 |
| JIRA Issue | TEZ-4569 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 7e9468acddb5 5.15.0-124-generic #134-Ubuntu SMP Fri Sep 27 20:20:17 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 9efa6f14d |
| Default Java | Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.25+9-post-Ubuntu-1ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_432-8u432-ga~us1-0ubuntu2~22.04-ga |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/5/artifact/out/diff-checkstyle-tez-tests.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/5/testReport/ |
| Max. process+thread count | 1187 (vs. ulimit of 5500) |
| modules | C: tez-dag tez-tests U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-361/5/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar Dec 23 '24 15:12 tez-yetus

Thanks!

okumin avatar Dec 24 '24 01:12 okumin