[ZEPPELIN-3949] Add HADOOP_CLASSPATH in ZEPPELIN_CLASSPATH when starting Zeppelin

Open mcapuccini opened this issue 5 years ago • 6 comments

What is this PR for?

Add HADOOP_CLASSPATH to ZEPPELIN_CLASSPATH when starting Zeppelin. This adds some important dependencies that Zeppelin may need. For instance, when storing the notebooks in Swift (a Hadoop-compatible storage system), Zeppelin needs to use a driver that is generally included in HADOOP_CLASSPATH, but not in HADOOP_CONF_DIR.
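The change described above could look roughly like the following. This is only a minimal sketch, not the actual patch: the helper name `append_classpath` is hypothetical, and where exactly such logic would live in the startup scripts is an assumption.

```shell
# Hypothetical helper (not part of Zeppelin): join two classpath strings
# with ':' and avoid a stray separator when either side is empty.
append_classpath() {
  if [ -z "$2" ]; then
    printf '%s' "$1"
  elif [ -z "$1" ]; then
    printf '%s' "$2"
  else
    printf '%s:%s' "$1" "$2"
  fi
}

# Assumption: this runs before the server JVM is launched.
# If HADOOP_CLASSPATH is unset, ZEPPELIN_CLASSPATH is left unchanged.
ZEPPELIN_CLASSPATH="$(append_classpath "${ZEPPELIN_CLASSPATH:-}" "${HADOOP_CLASSPATH:-}")"
export ZEPPELIN_CLASSPATH
```

Guarding against empty values matters because an empty entry in a Java classpath (a leading, trailing, or doubled `:`) is interpreted as the current directory.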

What type of PR is it?

Improvement

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-3949

How should this be tested?

The Travis build should run just fine.

Questions:

  • Do the license files need an update? Nope.

  • Are there breaking changes for older versions? Nope.

  • Does this need documentation? Nope. Adding the HADOOP_CLASSPATH would be the expected behaviour.

mcapuccini avatar Jan 15 '19 13:01 mcapuccini

Hi @felixcheung. I tried setting CLASSPATH=$CLASSPATH:$HADOOP_CLASSPATH before running bin/zeppelin.sh, but that caused Zeppelin to crash (I was getting errors like "class not found"). Setting ZEPPELIN_CLASSPATH worked just fine for me.

mcapuccini avatar Jan 16 '19 08:01 mcapuccini

Just copying my comment from Jira here.

It is risky to put Hadoop libraries on the CLASSPATH of the Zeppelin server, because it may cause jar conflicts. For notebook storage, Zeppelin 0.9 uses a plugin for NotebookRepo, so you can put extra dependencies in the plugin folder. The Hadoop NotebookRepo is located in $ZEPPELIN_HOME/plugins/NotebookRepo/FileSystemNotebookRepo
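Putting an extra driver into the plugin folder could be sketched as below. The helper name `install_plugin_jars` and the jar file names are hypothetical; only the plugin path comes from the comment above.

```shell
# Hypothetical helper (not part of Zeppelin): copy extra driver jars into
# the FileSystemNotebookRepo plugin folder under the given ZEPPELIN_HOME.
install_plugin_jars() {
  zeppelin_home="$1"
  shift
  plugin_dir="$zeppelin_home/plugins/NotebookRepo/FileSystemNotebookRepo"
  # In a real install the folder should already exist; mkdir -p is a no-op then.
  mkdir -p "$plugin_dir"
  for jar in "$@"; do
    cp "$jar" "$plugin_dir/"
  done
  # Print the target folder so the caller can verify where the jars went.
  printf '%s\n' "$plugin_dir"
}
```

Usage would be something like `install_plugin_jars "$ZEPPELIN_HOME" /path/to/driver.jar`, where the jar path is a placeholder for whichever storage driver is needed.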

zjffdu avatar Jan 16 '19 08:01 zjffdu

@zjffdu Indeed, adding Hadoop JARs to CLASSPATH caused conflicts in my environment, but adding them to ZEPPELIN_CLASSPATH seems to work well (https://github.com/mcapuccini/spark-tensorflow/blob/master/Dockerfile). Isn't $ZEPPELIN_HOME/plugins/NotebookRepo/FileSystemNotebookRepo ultimately added to the ZEPPELIN_CLASSPATH?

mcapuccini avatar Jan 16 '19 08:01 mcapuccini

@mcapuccini Plugins use a separate ClassLoader, which is different from that of ZeppelinServer.

zjffdu avatar Jan 16 '19 09:01 zjffdu

I think you should try to use ZEPPELIN_CLASSPATH_OVERRIDES instead of ZEPPELIN_CLASSPATH (at your own risk). As @zjffdu said, additional jars in the base classpath may cause jar conflicts and trouble at runtime.

D01B avatar Jan 17 '19 06:01 D01B

Ok! I'll try it and let you know. It would be nice, though, to have it set automatically by the startup script. Zeppelin is an Apache project, and I would expect HADOOP_CLASSPATH to be added automatically, as happens for Spark.

mcapuccini avatar Jan 17 '19 08:01 mcapuccini