tez
TEZ-4440. When tez app run in yarn fed cluster, may throw NPE
https://issues.apache.org/jira/browse/TEZ-4440
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 32m 42s | Docker mode activated. |
| _ Prechecks _ | |||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| _ master Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 15m 0s | master passed |
| +1 :green_heart: | compile | 0m 59s | master passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 0m 55s | master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 32s | master passed |
| +1 :green_heart: | javadoc | 1m 3s | master passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 0m 53s | master passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +0 :ok: | spotbugs | 1m 53s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 51s | master passed |
| _ Patch Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 0m 27s | the patch passed |
| +1 :green_heart: | compile | 0m 30s | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 0m 30s | the patch passed |
| +1 :green_heart: | compile | 0m 27s | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 0m 27s | the patch passed |
| +1 :green_heart: | checkstyle | 0m 25s | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 24s | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 0m 23s | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | findbugs | 1m 10s | the patch passed |
| _ Other Tests _ | |||
| +1 :green_heart: | unit | 5m 23s | tez-dag in the patch passed. |
| +1 :green_heart: | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 65m 20s | |

| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-235/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/235 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux b1c11f8cf5dd 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 621a83152 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-235/1/testReport/ |
| Max. process+thread count | 228 (vs. ulimit of 5500) |
| modules | C: tez-dag U: tez-dag |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-235/1/console |
| versions | git=2.25.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
Thanks for this patch, @zhengchenyu!
Can you add a unit test to TestTaskScheduler that confirms a TaskScheduler returns Resource(0,0) even if the RM client returned null?
I'm not familiar with YARN federation, but defaulting to Resource(0,0) makes sense in edge cases. Can you please clarify whether this is specific to YARN federation or can also happen without it? (It has never been reported so far.) Why does it return null? Does it reflect the state of a specific RM or of the whole cluster of RMs?
It happens only with YARN federation and will never happen without it. In fact, YARN-8933 has already fixed it: once YARN-8933 is applied, it no longer happens in YARN federation either. I'm not sure whether it is necessary to continue with this patch, because it is not a problem for the latest Hadoop version but is still a problem for some popular versions (for example, hadoop-3.2.1). If you think it is necessary, I will add a unit test. If you think it is not necessary, I will close it.
As for why null is returned in YARN federation:
That is a separate YARN issue. The YARN router uses asynchronous threads to connect to the RMs. When all downstream ResourceManagers time out, the router may return null. After YARN-8933, it returns Resource(0,0) instead.
Thanks @zhengchenyu, after reading YARN-8933 this definitely makes sense. I don't insist on adding a unit test, as we're "fixing" a YARN issue here that is no longer present after YARN-8933.
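
For readers following the thread, the null-to-zero fallback discussed above can be sketched roughly as follows. This is not the code from the patch: `SimpleResource`, `HeadroomSource`, and `NullSafeHeadroom` are hypothetical stand-ins invented for illustration so the example compiles without Hadoop on the classpath. It only shows the idea of treating a null answer from the RM client as `Resource(0,0)`.

```java
// Hypothetical stand-in for YARN's Resource record (memory + vcores),
// used here so the sketch has no Hadoop dependency.
final class SimpleResource {
    final long memoryMb;
    final int vcores;

    SimpleResource(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

// Hypothetical stand-in for the RM client; per the thread, the real client
// may hand back null when running behind a pre-YARN-8933 federation router.
interface HeadroomSource {
    SimpleResource getAvailableResources();
}

final class NullSafeHeadroom {
    // Shared "no headroom" value, playing the role of Resource(0,0).
    static final SimpleResource ZERO = new SimpleResource(0, 0);

    // Returns the client's answer, or ZERO when the client returned null.
    static SimpleResource availableOrZero(HeadroomSource client) {
        SimpleResource reported = client.getAvailableResources();
        return reported == null ? ZERO : reported;
    }

    public static void main(String[] args) {
        // Simulates the router returning null after all sub-cluster RMs timed out.
        SimpleResource fromNull = availableOrZero(() -> null);
        System.out.println(fromNull.memoryMb + " MB, " + fromNull.vcores + " vcores");

        // A normal answer passes through unchanged.
        SimpleResource fromReal = availableOrZero(() -> new SimpleResource(4096, 2));
        System.out.println(fromReal.memoryMb + " MB, " + fromReal.vcores + " vcores");
    }
}
```

A unit test of the kind requested earlier in the thread would make the same two checks: a client that returns null yields a zero resource, and a client that returns a real value is passed through untouched.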