tez
tez copied to clipboard
TEZ-4285: Drop Commons Codec Dependency
:broken_heart: -1 overall
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
+0 :ok: | reexec | 16m 40s | Docker mode activated. |
_ Prechecks _ | |||
+1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
+1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
+1 :green_heart: | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
_ master Compile Tests _ | |||
+0 :ok: | mvndep | 4m 59s | Maven dependency ordering for branch |
+1 :green_heart: | mvninstall | 8m 35s | master passed |
+1 :green_heart: | compile | 4m 22s | master passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
+1 :green_heart: | compile | 3m 53s | master passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
+1 :green_heart: | checkstyle | 2m 40s | master passed |
+1 :green_heart: | javadoc | 3m 57s | master passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
+1 :green_heart: | javadoc | 3m 21s | master passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
+0 :ok: | spotbugs | 5m 55s | Used deprecated FindBugs config; considering switching to SpotBugs. |
+0 :ok: | findbugs | 5m 51s | root in master has 2 extant findbugs warnings. |
_ Patch Compile Tests _ | |||
+0 :ok: | mvndep | 0m 13s | Maven dependency ordering for patch |
+1 :green_heart: | mvninstall | 5m 22s | the patch passed |
+1 :green_heart: | compile | 4m 21s | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
+1 :green_heart: | javac | 4m 21s | the patch passed |
+1 :green_heart: | compile | 3m 53s | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
+1 :green_heart: | javac | 3m 53s | the patch passed |
+1 :green_heart: | checkstyle | 2m 16s | the patch passed |
+1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 :green_heart: | xml | 0m 4s | The patch has no ill-formed XML file. |
+1 :green_heart: | javadoc | 3m 51s | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
+1 :green_heart: | javadoc | 3m 22s | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
+1 :green_heart: | findbugs | 10m 23s | the patch passed |
_ Other Tests _ | |||
+1 :green_heart: | unit | 2m 2s | tez-api in the patch passed. |
+1 :green_heart: | unit | 5m 6s | tez-runtime-library in the patch passed. |
+1 :green_heart: | unit | 4m 15s | tez-dag in the patch passed. |
-1 :x: | unit | 57m 12s | tez-tests in the patch failed. |
-1 :x: | unit | 69m 25s | root in the patch failed. |
+1 :green_heart: | asflicense | 1m 41s | The patch does not generate ASF License warnings. |
234m 3s |
Subsystem | Report/Notes |
---|---|
Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-125/1/artifact/out/Dockerfile |
GITHUB PR | https://github.com/apache/tez/pull/125 |
JIRA Issue | TEZ-4285 |
Optional Tests | dupname asflicense javac javadoc unit xml compile spotbugs findbugs checkstyle |
uname | Linux 11f4c517bf83 4.15.0-128-generic #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | personality/tez.sh |
git revision | master / 01847814d |
Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
unit | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-125/1/artifact/out/patch-unit-tez-tests.txt |
unit | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-125/1/artifact/out/patch-unit-root.txt |
Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-125/1/testReport/ |
Max. process+thread count | 1210 (vs. ulimit of 5500) |
modules | C: tez-api tez-runtime-library tez-dag tez-tests . U: . |
Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-125/1/console |
versions | git=2.25.1 maven=3.6.3 findbugs=3.0.1 |
Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
@jteagles Just wanted to get your temperature on this change. Thanks!
@belugabehr, In general, it could be a good change. Couple of thing on my mind about this.
I have been thinking about replacing apache base64 due to performance after reading http://java-performance.info/base64-encoding-and-decoding-performance/
and seeing native java8 performance.
However, as Hadoop depends directly on commons-codec, thing won't truly rid tez of the transitive dependency, only the direct dependency.
It's really important to switch from apache to java.util.Base64 in the backwards compatible way The org.apache.commons.codec.binary.Base64 encoder says it returns an RFC 2045 compliant encoder. https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html
java.util.Base64.getEncoder() returns an RFC 4648 compliant encoder, which uses the same alphabet an RFC 2045, but without line lengths and line endings. https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html
I haven't checked this with code, but they will need to be exactly the same as, the shuffle handler could be running a different version of tez than the job. So they have to be backwards compatible.
I thought Base64.getMimeEncoder() will give a RFC 2045 compliant encoder in java.util.Base64 (76 line lengths with CRLF endings), but they do not match
@Test
public void testCodecEquivalence() throws Exception {
byte[] bytes = new byte[100];
String enc1 = Base64.getEncoder().encodeToString(bytes);
String enc2 = new String(org.apache.commons.codec.binary.Base64.encodeBase64(bytes));
String enc3 = Base64.getMimeEncoder().encodeToString(bytes);
System.out.println("enc1 = <" + enc1 + ">");
System.out.println("enc2 = <" + enc2 + ">");
System.out.println("enc3 = <" + enc3 + ">");
System.out.println(enc1.equals(enc2));
System.out.println(enc1.equals(enc3));
}
true
false
So perhaps line lengths are skipped with apache and makes it RFC 4648 compatible.
As for the sha384 bit, that's more complicated. I would rather not add more dependency on guava until we are using a shaded version as that library has not shown very good API stability and causes uses issues. That being said, if you look at how the sha384 is being used, it is very much overkill. In DAPAppMaster.isSameFile, two files are being read fully to compute sha384 and then SHAs are compared to see if they are equal. Same for DAG.verifyLocalResources. Perhaps better would be just to walk both file streams and stop when they differ. It will give an earlier exit for failed case. Sha384 seems just an added complication. What do you think?