TEZ-4635: Hadoop doesn't bring BouncyCastle to tez tar since 3.4

Open abstractdog opened this issue 5 months ago • 5 comments

abstractdog avatar Jun 17 '25 11:06 abstractdog

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 :ok: | reexec | 27m 5s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 11m 18s | | master passed |
| +1 :green_heart: | compile | 2m 24s | | master passed |
| +1 :green_heart: | javadoc | 1m 34s | | master passed |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 4m 50s | | the patch passed |
| +1 :green_heart: | codespell | 1m 2s | | No new issues. |
| +1 :green_heart: | compile | 2m 25s | | the patch passed |
| +1 :green_heart: | javac | 2m 25s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | javadoc | 1m 14s | | the patch passed |
| | | | | _ Other Tests _ |
| -1 :x: | unit | 70m 49s | /patch-unit-root.txt | root in the patch passed. |
| +1 :green_heart: | asflicense | 0m 37s | | The patch does not generate ASF License warnings. |
| | | 124m 53s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | tez.analyzer.TestAnalyzer |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.50 ServerAPI=1.50 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/419 |
| Optional Tests | dupname asflicense javac javadoc unit codespell detsecrets xmllint compile |
| uname | Linux 2022bf42158d 5.15.0-139-generic #149-Ubuntu SMP Fri Apr 11 22:06:13 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-agent/workspace/tez-multibranch_PR-419/src/.yetus/personality.sh |
| git revision | master / 43a4f68bb478090638c5d8ce5bada1e1b07601fa |
| Default Java | Ubuntu-17.0.15+6-Ubuntu-0ubuntu122.04 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/1/testReport/ |
| Max. process+thread count | 1391 (vs. ulimit of 5500) |
| modules | C: . U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/1/console |
| versions | git=2.34.1 maven=3.6.3 codespell=2.0.0 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar Jun 17 '25 13:06 tez-yetus

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 :ok: | reexec | 0m 35s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ master Compile Tests _ |
| +0 :ok: | mvndep | 2m 46s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 9m 58s | | master passed |
| +1 :green_heart: | compile | 3m 2s | | master passed |
| +1 :green_heart: | javadoc | 2m 1s | | master passed |
| | | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 13s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 5m 12s | | the patch passed |
| +1 :green_heart: | codespell | 1m 2s | | No new issues. |
| +1 :green_heart: | compile | 3m 2s | | the patch passed |
| +1 :green_heart: | javac | 3m 2s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | javadoc | 1m 54s | | the patch passed |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 27s | | tez-api in the patch passed. |
| -1 :x: | unit | 72m 27s | /patch-unit-root.txt | root in the patch passed. |
| +1 :green_heart: | asflicense | 1m 3s | | The patch does not generate ASF License warnings. |
| | | 107m 23s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | tez.analyzer.TestAnalyzer |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.50 ServerAPI=1.50 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/419 |
| Optional Tests | dupname asflicense javac javadoc unit codespell detsecrets xmllint compile |
| uname | Linux 85c61b687a83 5.15.0-139-generic #149-Ubuntu SMP Fri Apr 11 22:06:13 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-agent/workspace/tez-multibranch_PR-419/src/.yetus/personality.sh |
| git revision | master / 82e8364e93a6ce900b9b2eaca98f9c0708b62259 |
| Default Java | Ubuntu-17.0.15+6-Ubuntu-0ubuntu122.04 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/2/testReport/ |
| Max. process+thread count | 2111 (vs. ulimit of 5500) |
| modules | C: tez-api . U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/2/console |
| versions | git=2.34.1 maven=3.6.3 codespell=2.0.0 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar Jun 19 '25 10:06 tez-yetus

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 :ok: | reexec | 28m 5s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ master Compile Tests _ |
| +0 :ok: | mvndep | 2m 30s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 9m 28s | | master passed |
| +1 :green_heart: | compile | 2m 56s | | master passed |
| +1 :green_heart: | javadoc | 1m 56s | | master passed |
| | | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 13s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 5m 7s | | the patch passed |
| +1 :green_heart: | codespell | 1m 2s | | No new issues. |
| +1 :green_heart: | compile | 2m 58s | | the patch passed |
| +1 :green_heart: | javac | 2m 58s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | javadoc | 1m 48s | | the patch passed |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 25s | | tez-api in the patch passed. |
| -1 :x: | unit | 69m 41s | /patch-unit-root.txt | root in the patch passed. |
| +1 :green_heart: | asflicense | 1m 2s | | The patch does not generate ASF License warnings. |
| | | 130m 51s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | tez.analyzer.TestAnalyzer |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.50 ServerAPI=1.50 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/419 |
| Optional Tests | dupname asflicense javac javadoc unit codespell detsecrets xmllint compile |
| uname | Linux e19398ed7755 5.15.0-139-generic #149-Ubuntu SMP Fri Apr 11 22:06:13 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-agent/workspace/tez-multibranch_PR-419/src/.yetus/personality.sh |
| git revision | master / 906d07cc656ed2ac4762f1d06382842b9b6a9713 |
| Default Java | Ubuntu-21.0.7+6-Ubuntu-0ubuntu124.04 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/3/testReport/ |
| Max. process+thread count | 1333 (vs. ulimit of 5500) |
| modules | C: tez-api . U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/3/console |
| versions | git=2.43.0 maven=3.8.7 codespell=2.0.0 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar Jun 20 '25 10:06 tez-yetus

@abstractdog do you have any pointers to which ticket removed it in Hadoop? I checked the history and found only tickets about the upgrade.

ayushtkn avatar Jun 20 '25 14:06 ayushtkn

good point, I need to check this

abstractdog avatar Jun 20 '25 14:06 abstractdog

weird things: here is what I got from the verbose dependency trees (full output is attached to the jira; I extracted only the bcprov parts for clarity)

hadoop 3.4.0 (hadoop-common)

[INFO] +- org.bouncycastle:bcprov-jdk15on:jar:1.70:compile

tez on hadoop 3.4.0 (tez-api)

[INFO] +- org.apache.hadoop:hadoop-common:jar:3.4.0:compile
[INFO] |  +- org.bouncycastle:bcprov-jdk15on:jar:1.70:compile

[INFO] +- org.apache.hadoop:hadoop-common:test-jar:tests:3.4.0:test
[INFO] |  +- (org.bouncycastle:bcprov-jdk15on:jar:1.70:test - omitted for duplicate)

[INFO] +- org.bouncycastle:bcprov-jdk18on:jar:1.78:test

hadoop 3.4.1 (hadoop-common)

[INFO] +- org.bouncycastle:bcprov-jdk18on:jar:1.78.1:compile

tez on hadoop 3.4.1 (tez-api)

[INFO] +- org.apache.hadoop:hadoop-common:jar:3.4.1:compile
[INFO] |  +- (org.bouncycastle:bcprov-jdk18on:jar:1.78:test - version managed from 1.78.1; scope managed from compile; omitted for duplicate)


[INFO] +- org.apache.hadoop:hadoop-common:test-jar:tests:3.4.1:test
[INFO] |  +- (org.bouncycastle:bcprov-jdk18on:jar:1.78:test - version managed from 1.78.1; scope managed from compile; omitted for duplicate)

[INFO] +- org.bouncycastle:bcprov-jdk18on:jar:1.78:test

so apparently, when depending on hadoop 3.4.1 (where the bcprov compile-scope dependency looks good in the hadoop project itself: org.bouncycastle:bcprov-jdk18on:jar:1.78.1:compile), the dependency, instead of being brought into tez as a compile-time dependency, gets omitted with a totally confusing message:

[INFO] |  +- (org.bouncycastle:bcprov-jdk18on:jar:1.78:test - version managed from 1.78.1; scope managed from compile; omitted for duplicate)

what does this mean? bcprov 1.78:test is omitted because the version is managed from 1.78.1 (the version defined in hadoop), but at the same time I cannot see a proper 1.78.1 compile-time dependency anywhere; this just doesn't make sense to me

I’ve put in 1–2 hours of investigation so far, and I'm not sure how much more time it's worth, so here’s what we can do:

  1. merge this change and accept tez bringing this dependency with its own ${bouncycastle.version} compile time
  2. understand what happened here and still let hadoop bring its ${bouncycastle.version}

maybe do 1) now, and follow-up 2) later
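
For illustration, option 1 could look roughly like the snippet below in the tez-api pom.xml. This is only a sketch, not the actual patch: it assumes the root pom still manages bcprov-jdk18on with test scope, so the module has to state the compile scope explicitly, because a scope declared directly on a dependency takes precedence over the managed one. The placement inside tez-api is an assumption.

```xml
<!-- tez-api/pom.xml (sketch; placement is an assumption, not the actual patch) -->
<dependencies>
  <dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcprov-jdk18on</artifactId>
    <!-- version (${bouncycastle.version}) is inherited from dependencyManagement in the root pom -->
    <!-- explicit scope overrides the managed test scope -->
    <scope>compile</scope>
  </dependency>
</dependencies>
```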

abstractdog avatar Jun 30 '25 17:06 abstractdog

Thanks @abstractdog for the details. I think the problem may be that we are forcing the scope to test in the dependencyManagement of the parent pom. I believe that if you just remove the scope from the parent pom, that should do it. The other modules can have it in test scope.

Moreover, the version of bouncycastle from hadoop seems to be 1.78.1, whereas in Tez it is 1.78. Maybe we should keep them in sync.

I believe something like this might just do it

diff --git a/pom.xml b/pom.xml
index 8dfdec9ec..d1031c6c4 100644
--- a/pom.xml
+++ b/pom.xml
@@ -58,7 +58,7 @@
 
     <!--dependency versions in alphabetical order-->
     <asynchttpclient.version>2.12.4</asynchttpclient.version>
-    <bouncycastle.version>1.78</bouncycastle.version>
+    <bouncycastle.version>1.78.1</bouncycastle.version>
     <build-helper-maven-plugin.version>1.8</build-helper-maven-plugin.version>
     <buildnumber-maven-plugin.version>1.1</buildnumber-maven-plugin.version>
     <checkstyle.version>8.35</checkstyle.version>
@@ -791,13 +791,11 @@
         <groupId>org.bouncycastle</groupId>
         <artifactId>bcprov-jdk18on</artifactId>
         <version>${bouncycastle.version}</version>
-        <scope>test</scope>
       </dependency>
       <dependency>
         <groupId>org.bouncycastle</groupId>
         <artifactId>bcpkix-jdk18on</artifactId>
         <version>${bouncycastle.version}</version>
-        <scope>test</scope>
       </dependency>
       <dependency>
         <groupId>org.fusesource.leveldbjni</groupId>

What changed, and why was it working earlier?

I have a theory related to HADOOP-19024, which was introduced in Hadoop 3.4.1. Previously, Hadoop depended on bcprov-jdk15on, whereas Tez declared bcprov-jdk18on in its dependencyManagement. Since Tez did not explicitly declare bcprov-jdk15on, it was being pulled transitively from Hadoop with its default (compile) scope, so it ended up being packaged correctly.

However, after HADOOP-19024, Hadoop itself now declares bcprov-jdk18on, which conflicts with Tez’s declaration that forces its scope to test. As a result, the dependency is omitted from the final package.
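
As a sketch of the "other modules can have it in test scope" part: once the scope is dropped from dependencyManagement, a module that needs BouncyCastle only for its tests would declare the scope itself. The module shown here is hypothetical; which modules actually need this is not established in this thread.

```xml
<!-- pom.xml of a module that uses BouncyCastle only in tests (hypothetical module) -->
<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk18on</artifactId>
  <!-- version is still managed by the root pom; the scope is now set per module -->
  <scope>test</scope>
</dependency>
```

With no scope in dependencyManagement, the transitive bcprov-jdk18on coming from hadoop-common should keep its compile scope in modules that do not declare it themselves.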

ayushtkn avatar Jul 02 '25 04:07 ayushtkn

thanks @ayushtkn , absolutely makes sense

what's weird is that in upstream tez the fix also works without the version harmonization, i.e. by changing only the root pom.xml; however, I need to consider 2 things:

  1. version harmonization makes sense, I'll most probably add it
  2. this fix doesn't work downstream unless I also change the tez-api pom.xml... I need to understand the difference before proceeding here

I'll keep you posted

abstractdog avatar Jul 02 '25 08:07 abstractdog

okay, after 4-5 hours I figured out that the downstream version was missing TEZ-4266, leading to very different plugin versions... TL;DR: I did everything to synchronize the pom structure, but the bcprov jar still didn't appear in the downstream dist package; after applying TEZ-4266 it magically started to work. I'm 99% sure the old maven assembly plugin was to blame

so I believe this PR could stay with a simple scope change in the root pom.xml + version bump

abstractdog avatar Jul 02 '25 14:07 abstractdog

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 :ok: | reexec | 0m 0s | | Docker mode activated. |
| -1 :x: | docker | 5m 51s | | Docker failed to build run-specific yetus/tez:tp-15228}. |

| Subsystem | Report/Notes |
|----------:|:-------------|
| GITHUB PR | https://github.com/apache/tez/pull/419 |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/5/console |
| versions | git=2.34.1 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar Jul 02 '25 14:07 tez-yetus

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 :ok: | reexec | 0m 0s | | Docker mode activated. |
| -1 :x: | docker | 20m 10s | | Docker failed to build run-specific yetus/tez:tp-32760}. |

| Subsystem | Report/Notes |
|----------:|:-------------|
| GITHUB PR | https://github.com/apache/tez/pull/419 |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/4/console |
| versions | git=2.34.1 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar Jul 02 '25 14:07 tez-yetus

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 :ok: | reexec | 28m 0s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ master Compile Tests _ |
| +1 :green_heart: | mvninstall | 11m 11s | | master passed |
| +1 :green_heart: | compile | 2m 21s | | master passed |
| +1 :green_heart: | javadoc | 1m 27s | | master passed |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 4m 41s | | the patch passed |
| +1 :green_heart: | codespell | 1m 0s | | No new issues. |
| +1 :green_heart: | compile | 2m 22s | | the patch passed |
| +1 :green_heart: | javac | 2m 22s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | javadoc | 1m 11s | | the patch passed |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 68m 52s | | root in the patch passed. |
| +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. |
| | | 123m 16s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/6/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/tez/pull/419 |
| Optional Tests | dupname asflicense javac javadoc unit codespell detsecrets xmllint compile |
| uname | Linux 5749504c4ae8 5.15.0-139-generic #149-Ubuntu SMP Fri Apr 11 22:06:13 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-agent/workspace/tez-multibranch_PR-419/src/.yetus/personality.sh |
| git revision | master / d1624950b04594e728ac10afdb28fc75fabe3de5 |
| Default Java | Ubuntu-21.0.7+6-Ubuntu-0ubuntu124.04 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/6/testReport/ |
| Max. process+thread count | 1273 (vs. ulimit of 5500) |
| modules | C: . U: . |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-419/6/console |
| versions | git=2.43.0 maven=3.8.7 codespell=2.0.0 |
| Powered by | Apache Yetus 0.15.1 https://yetus.apache.org |

This message was automatically generated.

tez-yetus avatar Jul 02 '25 18:07 tez-yetus