zeppelin
DO-NOT-MERGE Move to Hadoop3
What is this PR for?
This PR is a PoC of moving all Zeppelin modules to Hadoop3.
It is based on https://github.com/apache/zeppelin/pull/4674, and fixes Flink's testing.
What type of PR is it?
Improvement
Todos
- [x] - Fix Flink Hadoop3 tests
- [ ] - Split it into several small PRs
What is the Jira issue?
- Open an issue on Jira https://issues.apache.org/jira/browse/ZEPPELIN/
- Put link here, and add [ZEPPELIN-Jira number] in PR title, eg. [ZEPPELIN-533]
How should this be tested?
- Strongly recommended: add automated unit tests for any new or changed behavior
- Outline any manual steps to test the PR here.
Screenshots (if appropriate)
Questions:
- Do the license files need to be updated? No
- Are there breaking changes for older versions? Yes
- Does this need documentation? Yes
I have fixed essentially all compile and test issues. The next step is to split this into several small PRs to speed up the review process.
I think we should start with the interpreter modules one by one, then move on to zengine, server and the other modules, and finally drop the hadoop2 profile and update the docs.
@Reamer could you give some advice?
I would prefer a larger PR in which the individual tasks are contained in separate commits. It was clear that dropping Hadoop2 would be a very large change. Thank you for your work so far.
I think it's great that you have deleted all the excludes in the parent pom.xml, that makes the file much more readable.
Btw. I do not insist on co-authorship.
Unfortunately, I found that the integration tests (IT) do not run properly at the moment, see https://github.com/apache/zeppelin/pull/4699. We may need to postpone this PR until the IT is working again.
@Reamer it's ready for review, please take a look when you have time
The Python 3.8 test failure should be addressed in #4748
@Reamer all failed tests are known flaky tests, this patch should be good to go :)
I will merge the pull request on Wednesday as long as no further comments are received.
Could this change have broken the build? I tried to build and got an error:
WARN [2024-05-23 17:04:06,762] ({main} WebAppContext.java[doStart]:533) - Failed startup of context o.e.j.w.WebAppContext@4816278d{/,jar:file:///opt/zeppelin/zeppelin-web-0.12.0-SNAPSHOT.war!/,STOPPED}{/opt/zeppelin/zeppelin-web-0.12.0-SNAPSHOT.war} java.io.FileNotFoundException: JAR entry WEB-INF/lib/hadoop-client-api-3.3.6.jar!/ not found in /opt/zeppelin/zeppelin-web-0.12.0-SNAPSHOT.war
@Armadik would you mind providing steps to reproduce? E.g. build command, start command, OS platform, JDK version, etc.
I see the error when running the zeppelin.sh script.
Environment: Ubuntu 22.04.4 LTS. Build steps:
`apt update
apt install -y curl git maven openjdk-11-jdk npm libfontconfig r-base-dev r-cran-evaluate
wget https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.6.3/apache-maven-3.6.3-bin.tar.gz
sudo tar -zxf apache-maven-3.6.3-bin.tar.gz -C /usr/local/
sudo ln -s /usr/local/apache-maven-3.6.3/bin/mvn /usr/local/bin/mvn
cd Documents/
git clone https://github.com/apache/zeppelin.git
cd zeppelin/
export MAVEN_OPTS="-Xms1024M -Xmx4096M -XX:MaxMetaspaceSize=1024m -XX:-UseGCOverheadLimit -Dorg.slf4j.simpleLogger.log.org.apache.maven.cli.transfer.Slf4jMavenTransferListener=warn"
./mvnw -B package -DskipTests -Pbuild-distr -Pspark-3.3 -Pinclude-hadoop -Phadoop3 -Pspark-scala-2.12 -Pweb-angular -Pweb-dist -pl '!groovy,!submarine,!flink,!cassandra,!jdbc,!bigquery,!alluxio,!mongodb,!neo4j' -am --no-transfer-progress `
[INFO] Copying webapp resources [/home/micha/Documents/zeppelin/zeppelin-web/dist]
[INFO] deleting outdated resource WEB-INF/lib/hadoop-client-api-3.3.6.jar
[INFO] deleting outdated resource WEB-INF/lib/hadoop-client-runtime-3.3.6.jar
[INFO] Building war: /zeppelin/zeppelin-web/target/zeppelin-web-0.12.0-SNAPSHOT.war
This one doesn't seem to work
This command helped me:
zip -d /opt/zeppelin/zeppelin-web-0.12.0-SNAPSHOT.war 'WEB-INF/lib/*'
@Armadik sorry, I cannot reproduce this; both the classic and the new UI work on my side.
I tried a clean build. It seems the problem was in my environment :(