parquet-java

PARQUET-2173. Fix parquet build against hadoop 3.3.3+

Opened by steveloughran

Hadoop 3.3.3 moved to reload4j for logging to stop shipping a version of log4j with known (albeit unused) CVEs.

This bypasses the existing exclusions that keep Hadoop's SLF4J dependency off Parquet's classpaths and, because it adds a new JAR, breaks the parquet-cli build.

Make sure you have checked all steps below.

Jira

  • [X] My PR addresses the following Parquet Jira issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
    • https://issues.apache.org/jira/browse/PARQUET-XXX
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Tests

  • [X] My PR adds the following unit tests OR does not need testing for this extremely good reason:

Testing is regression testing: "does the build work?" and "does a test run complete without SLF4J duplicate-binding warnings?". This was done manually with -Dhadoop.version=3.3.4.
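As a complement to the manual duplicate-binding check, here is a minimal Java sketch (not part of this PR) that counts the SLF4J bindings visible on a classpath. It assumes SLF4J 1.7.x, where each binding JAR provides an `org/slf4j/impl/StaticLoggerBinder.class` resource and SLF4J prints a "Class path contains multiple SLF4J bindings" warning when it finds more than one. SLF4J 2.x discovers providers through `ServiceLoader` instead, so the check would need adapting there. `Slf4jBindingCheck` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists SLF4J 1.7.x bindings visible to a class loader. */
final class Slf4jBindingCheck {
    // Assumption: SLF4J 1.7.x locates its binding through this resource path.
    static final String BINDING_RESOURCE = "org/slf4j/impl/StaticLoggerBinder.class";

    /** Returns the URL of every copy of the binding class the loader can see. */
    static List<URL> findBindings(ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(BINDING_RESOURCE));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True when more than one binding is present, i.e. SLF4J 1.7.x would warn. */
    static boolean hasDuplicateBindings(ClassLoader loader) {
        return findBindings(loader).size() > 1;
    }

    public static void main(String[] args) {
        List<URL> bindings = findBindings(Slf4jBindingCheck.class.getClassLoader());
        // Each URL identifies the JAR that contains one copy of the binding class.
        bindings.forEach(url -> System.out.println("binding: " + url));
        if (bindings.size() > 1) {
            System.err.println("Duplicate SLF4J bindings found; check the dependency exclusions.");
            System.exit(1);
        }
    }
}
```

Invoked from a unit test, `hasDuplicateBindings` would turn the warning into a hard failure instead of relying on someone reading the log output.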

Commits

  • [X] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • [ ] In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

steveloughran, Aug 16 '22

I've also built against the next Hadoop release and against 3.4.0-SNAPSHOT.

The Parquet build fails there because Jackson 1 has been purged from the Hadoop classpath, which breaks the japicmp plugin:

Execution default of goal com.github.siom79.japicmp:japicmp-maven-plugin:0.14.2:cmp failed: Could not load 'org.codehaus.jackson.type.TypeReference
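The failure above means `org.codehaus.jackson.type.TypeReference` can no longer be resolved once Jackson 1 leaves the Hadoop classpath. One quick way to confirm that for a given classpath is to probe for the class directly. The following is a minimal Java sketch using only the JDK; `ClassProbe` is a hypothetical name, and it does not reproduce how japicmp itself loads classes.

```java
/** Hypothetical probe: reports whether a class can be resolved by a class loader. */
final class ClassProbe {
    // The Jackson 1 class named in the japicmp failure.
    static final String JACKSON1_TYPE_REFERENCE = "org.codehaus.jackson.type.TypeReference";

    /** Returns true if the class is loadable, without running its static initializers. */
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or present but one of its dependencies cannot be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassProbe.class.getClassLoader();
        boolean present = isPresent(JACKSON1_TYPE_REFERENCE, loader);
        System.out.println(JACKSON1_TYPE_REFERENCE
                + (present ? " is on the classpath" : " is NOT on the classpath"));
    }
}
```

Passing `false` for `initialize` keeps the probe side-effect free: it only checks resolvability and never runs the probed class's static initializers.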

steveloughran, Aug 16 '22

cc @ggershinsky @wgtmac let me know if you have concern to merge.

shangxinli, Dec 03 '22

cc @ggershinsky @wgtmac let me know if you have concern to merge.

Thanks for pinging me! I don't have any concern for now.

wgtmac, Dec 06 '22

It would be good to get this in. FWIW, I've been trying to build lots of projects against the current smoke build of a Hadoop 3.3.5 RC. Some aspects of Maven are playing up, and I can't get Parquet to collect the JARs from the ASF staging repository, even with a profile for it in ~/.m2/settings.xml. I will probably have to add an explicit profile for that to the Parquet build.

steveloughran, Dec 06 '22

cc @ggershinsky @wgtmac let me know if you have concern to merge.

Thanks for pinging me! I don't have any concern for now.

Same here

ggershinsky, Dec 06 '22

Any plans to merge this now?

steveloughran, Jan 31 '23

It looks good to me but I don't have the privilege to merge.

May I request your help? @ggershinsky @shangxinli @gszadovszky

wgtmac, Feb 01 '23

Thanks; I've closed the JIRA.

steveloughran, Feb 02 '23