zeppelin
[ZEPPELIN-5006] Spark-Yarn docker (bugfix): downgrade CentOS, upgrade Java
What is this PR for?
The container has been updated, but its dependencies are now broken:
- We can't upgrade to CentOS 7, because scripts are incompatible with systemd
- New Spark requires JDK 8
- Java is located on a different fs path
This pull request fixes all of these issues.
What type of PR is it?
Bug Fix
Todos
Self-contained
What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-5006
How should this be tested?
Try to manually build and run the spark-yarn dockerfile.
The current version of the dockerfile builds OK, but the container fails immediately after starting. You can check this by inspecting Spark and sshd: they are not running.
After this pull request everything is OK.
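The manual check described above could be scripted roughly as follows. This is only a sketch under assumptions: the container name `zeppelin-spark-yarn` and the markers `sshd` and `spark` are hypothetical placeholders, and it assumes `ps` is available inside the image.

```python
import subprocess


def missing_services(ps_output, expected):
    """Return the expected markers that match no line of a process listing."""
    lines = [line.lower() for line in ps_output.splitlines()]
    return [name for name in expected
            if not any(name.lower() in line for line in lines)]


def container_processes(container):
    """Fetch process command lines from a running container via `docker exec`."""
    result = subprocess.run(
        ["docker", "exec", container, "ps", "-eo", "args"],
        capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical usage (container name and markers are assumptions):
#   listing = container_processes("zeppelin-spark-yarn")
#   print(missing_services(listing, ["sshd", "spark"]))
```

An empty list would suggest that both services are up; any returned name points at a service that did not start.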
Screenshots (if appropriate)
Questions:
- Do the license files need an update? - NO
- Are there breaking changes for older versions? - NO
- Does this need documentation? - NO
I think downgrading to CentOS 6 is not an option. Check out the End of Lifetime page. We should rather upgrade to CentOS 8.
I think the ssh part is causing problems, right?
@Reamer yes, the ssh part is causing problems.
Do we even need it there? I think it's considered bad practice to have an SSH server inside an end-user Docker image. Or maybe it's worth running the SSH daemon manually, without any systemd or initscripts.
I'll test CentOS 8 and come back a bit later.
Also:
- I have a bunch of refactorings for Docker (architecture, performance), but it's a separate thing for a separate PR
- I'll try to change the base branch to 0.9 instead of master to simplify migrations later. I've never changed a base branch in the GitHub web UI... it's a surprising experience :)
You are right, running an SSH daemon in a Docker container is very bad practice.
I'm not sure it's necessary. After reading the spark-on-yarn-mode documentation, it seems that only one container is started.
Maybe you could just try it :-D
I have a bunch of refactorings for Docker (architecture, performance), but it's a separate thing for a separate PR
I can't wait to see it.