
[ZEPPELIN-5006] Spark-Yarn docker (bugfix): downgrade CentOS, upgrade Java

Open olegchir opened this issue 5 years ago • 3 comments

What is this PR for?

We now have an updated container. Unfortunately, its dependencies are broken:

  • We can't upgrade to CentOS 7, because the existing scripts are incompatible with systemd
  • New Spark requires JDK 8
  • Java is installed at a different filesystem path

This pull request fixes all of these issues.
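A minimal sketch of the kind of Dockerfile changes involved (the package name and the `JAVA_HOME` path below are assumptions for illustration, not the actual diff):

```dockerfile
# Hypothetical fragment: stay on CentOS 6 (no systemd) and install JDK 8 for Spark
FROM centos:6

# OpenJDK 8 from the base repos; the exact package name is an assumption
RUN yum install -y java-1.8.0-openjdk-devel openssh-server && yum clean all

# Java now lives under a different path than the old JDK install;
# this location is illustrative — verify with `readlink -f $(which java)`
ENV JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
ENV PATH=$JAVA_HOME/bin:$PATH
```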

What type of PR is it?

Bug Fix

Todos

Self-contained

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-5006

How should this be tested?

Try to manually build and run the spark-yarn dockerfile.

In the current version, the Dockerfile builds OK, but the container fails immediately after starting. You can verify this by inspecting Spark and sshd: they are not running.

After this pull request everything is OK.
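The manual test described above might look like the following; the image tag and the Dockerfile location are assumptions, so substitute the actual path from the repository:

```shell
# Hypothetical smoke test: build and run the spark-yarn image
docker build -t zeppelin-spark-yarn path/to/spark-yarn-dockerfile-dir
docker run -d --name spark-yarn-test zeppelin-spark-yarn

# The container should stay up, and both daemons should be found inside it;
# before this fix, both pgrep calls come back empty
docker exec spark-yarn-test pgrep -f sshd
docker exec spark-yarn-test pgrep -f spark
```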

Screenshots (if appropriate)

Questions:

  • Do the license files need updating? - NO
  • Are there breaking changes for older versions? - NO
  • Does this need documentation? - NO

olegchir avatar Aug 19 '20 18:08 olegchir

I think downgrading to CentOS 6 is not an option. Check out the CentOS End of Life page. We should instead prefer an upgrade to CentOS 8.

I think the ssh part is causing problems, right?

Reamer avatar Aug 20 '20 13:08 Reamer

@Reamer yes, the ssh part is causing problems.

Do we even need it there? It's considered bad practice to run an SSH server inside an end-user Docker image. Or maybe it's worth running the SSH daemon manually, without systemd or init scripts.
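If the SSH daemon really is needed, one way to avoid systemd and init scripts entirely is to start it in the foreground from the container's entrypoint. A minimal sketch (the surrounding service startup is assumed, not part of this PR):

```shell
#!/bin/bash
# entrypoint.sh: start sshd without systemd or initscripts.
# Generate any missing host keys (normally done by the init script):
ssh-keygen -A
# Start other services (e.g. Spark daemons) in the background here, then
# run sshd in the foreground so it becomes the container's main process:
exec /usr/sbin/sshd -D
```

Running the daemon with `-D` (no detach) keeps the container alive for as long as sshd is up, which is the usual pattern when a daemon must be PID 1.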

I'll test CentOS 8 and come back a bit later.

Also:

  • I have a bunch of Docker refactorings (architecture, performance), but that's a separate thing for a separate PR
  • I'll try to change the base branch to 0.9 instead of master to simplify migrations later. I've never changed a base branch in the GitHub web UI before... it's a surprising experience :)

olegchir avatar Aug 21 '20 06:08 olegchir

You are right, running an SSH daemon in a Docker container is a very bad practice.

I'm not sure that's necessary. After reading the spark-on-yarn-mode documentation, it seems that only one container is started.

Maybe you just try it :-D

I have a bunch of refactorings for Docker (architecture, performance), but it's a separate thing for a separate PR

I can't wait to see it.

Reamer avatar Aug 21 '20 07:08 Reamer