tez
TEZ-4577: SortSpan could be created real small, resulting in eventual job failure
After TEZ-4542, an application may end up creating very small sort spans (one per record in this case), and the job eventually fails due to a timeout. This PR fixes the int overflow problem in a different way.
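For readers unfamiliar with this class of bug, the following is a minimal sketch of how an int product can wrap around and how widening to long before multiplying avoids it. The class and method names (`OverflowSketch`, `scaledInt`, `scaledLong`) and the numbers are hypothetical; they are not taken from PipelinedSorter and do not reproduce the exact expression changed by this patch.

```java
// Hypothetical illustration only; not the PipelinedSorter code.
class OverflowSketch {

    // All-int arithmetic: capacity * numerator is computed as an int,
    // so it can wrap around before the division happens.
    static int scaledInt(int capacity, int numerator, int denominator) {
        return capacity * numerator / denominator;
    }

    // Widening the first operand to long makes the whole expression
    // use 64-bit arithmetic; the result is narrowed only at the end.
    static int scaledLong(int capacity, int numerator, int denominator) {
        return (int) ((long) capacity * numerator / denominator);
    }

    public static void main(String[] args) {
        int capacity = 1 << 30; // 1 GiB, an assumed buffer size

        // 2^30 * 16 = 2^34, which wraps to 0 in 32-bit arithmetic.
        System.out.println(scaledInt(capacity, 16, 32));  // prints 0

        // With the long cast the intermediate product is exact: 2^34 / 32 = 2^29.
        System.out.println(scaledLong(capacity, 16, 32)); // prints 536870912
    }
}
```

A size that silently collapses to 0 (or to a tiny value) is the kind of result that could lead to very small spans, which is why the cast is applied before the multiplication rather than after it.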
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 27m 3s | Docker mode activated. |
| _ Prechecks _ | |||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| _ master Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 16m 8s | master passed |
| +1 :green_heart: | compile | 0m 32s | master passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 :green_heart: | compile | 0m 32s | master passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +1 :green_heart: | checkstyle | 1m 24s | master passed |
| +1 :green_heart: | javadoc | 0m 42s | master passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 :green_heart: | javadoc | 0m 26s | master passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +0 :ok: | spotbugs | 1m 40s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 38s | master passed |
| _ Patch Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 0m 24s | the patch passed |
| +1 :green_heart: | compile | 0m 25s | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 :green_heart: | javac | 0m 25s | the patch passed |
| +1 :green_heart: | compile | 0m 21s | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +1 :green_heart: | javac | 0m 21s | the patch passed |
| +1 :green_heart: | checkstyle | 0m 17s | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | javadoc | 0m 18s | the patch passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 |
| +1 :green_heart: | javadoc | 0m 17s | the patch passed with JDK Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| +1 :green_heart: | findbugs | 1m 2s | the patch passed |
| _ Other Tests _ | |||
| +1 :green_heart: | unit | 5m 50s | tez-runtime-library in the patch passed. |
| +1 :green_heart: | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 58m 33s | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-367/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/367 |
| JIRA Issue | TEZ-4577 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux a8af2c083205 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 174d4e3bb |
| Default Java | Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu322.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~22.04-b05 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-367/1/testReport/ |
| Max. process+thread count | 1100 (vs. ulimit of 5500) |
| modules | C: tez-runtime-library U: tez-runtime-library |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-367/1/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
@abstractdog @yigress @rbalamohan Could you please review this PR?
@abstractdog I see that TEZ-4542 has been merged into release-0.10.4-rc0. I think this PR should also be merged into release-0.10.4-rc0.
+1 LGTM
@abstractdog Hi, could you review this PR? Since TEZ-4542 may cause performance degradation in some scenarios, we should merge this fix.
Thanks a lot @zhengchenyu for taking care of this. I believe this is almost ready to go in; let me ask one more thing. I just confirmed that the unit test added with TEZ-4542 indeed reproduces the issue, and that the issue remains solved when TEZ-4542 is reverted and this (long) cast is applied. What is strange is that without the patch the full trace of the IllegalArgumentException is not visible; I can only see:
[ERROR] org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.testWithLargeRecordAndLowMemory Time elapsed: 1.584 s <<< ERROR!
java.lang.IllegalArgumentException
at org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.testWithLargeRecordAndLowMemory(TestPipelinedSorter.java:878)
[INFO]
[INFO] Results:
Can you please check whether this can be easily solved in the scope of this patch (or whether you see the same on your machine)? Thanks in advance!
@abstractdog There is no need to revert TEZ-4542. This PR can be understood as solving the problem described in TEZ-4542 in a different way; seen from another perspective, this PR has in effect already reverted TEZ-4542.
On my machine, without TEZ-4542 and TEZ-4577, testWithLargeRecordAndLowMemory fails with the following error log:
java.lang.IllegalArgumentException: newPosition > limit: (16777216 > 1048576)
at java.nio.Buffer.createPositionException(Buffer.java:269)
at java.nio.Buffer.position(Buffer.java:244)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter$SortSpan.<init>(PipelinedSorter.java:952)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.sort(PipelinedSorter.java:361)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:434)
at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:390)
at org.apache.tez.runtime.library.common.sort.impl.TestPipelinedSorter.testWithLargeRecordAndLowMemory(TestPipelinedSorter.java:878)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
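The top frames of the trace come from `java.nio.Buffer.position`, which rejects a new position greater than the buffer's limit. As a rough, standalone sketch of that behaviour (the class and method names `PositionLimitSketch` and `positionFails` are hypothetical, and the sizes are simply copied from the message above rather than derived from the sorter's logic):

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the PipelinedSorter code.
class PositionLimitSketch {

    // Returns true if setting the given position on a fresh buffer of the
    // given capacity throws IllegalArgumentException.
    static boolean positionFails(int capacity, int newPosition) {
        // A freshly allocated buffer has limit == capacity.
        ByteBuffer buffer = ByteBuffer.allocate(capacity);
        try {
            buffer.position(newPosition);
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown when newPosition is negative or larger than the limit.
            return true;
        }
    }

    public static void main(String[] args) {
        // Same numbers as in the error message: position 16 MiB, limit 1 MiB.
        System.out.println(positionFails(1048576, 16777216)); // prints true
        // A position within the limit is accepted.
        System.out.println(positionFails(1048576, 1024));     // prints false
    }
}
```

So any size computation that yields a position larger than the remaining buffer will surface as this exception in the SortSpan constructor.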
Okay, thanks for clarifying; in this case the problem is on my side :) I agree: whether or not we call this a revert, it fixes the overflow issue. Thanks for that!
+1