BigInsights-on-Apache-Hadoop

Speed up Oozie Spark example

Open pregazzoni opened this issue 9 years ago • 3 comments

For an Oozie Spark job to run on YARN, spark-assembly.jar needs to be on the job path. Right now we get the jar from the cluster (via WebHDFS) and then put it (again via WebHDFS) into the $jobDir/lib directory. This takes a few minutes.

Another way would be to have the jar in the Oozie shared lib directory by default.

As the oozie user, you can do:

# Copy spark-assembly jar to Oozie shared lib directory
hdfs dfs -put /usr/iop/current/spark-client/lib/spark-assembly.jar /user/oozie/share/lib/lib_20160805191701/spark/.

# Set oozie environment
source /usr/iop/current/oozie-client/bin/oozie-env.sh
export OOZIE_URL=http://<replace with oozie node>:11000/oozie

# Update shared lib
oozie admin -sharelibupdate

Once this is done, there is no need to put the jar under $jobDir/lib, as it will be picked up automatically from the Oozie shared lib.
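The timestamped directory in the put command above (lib_20160805191701) will most likely differ from cluster to cluster, so it may be easier to resolve it than to hard-code it. Below is a minimal sketch of that idea, not something tested on a real cluster: `latest_sharelib_dir` is a hypothetical helper name, and it assumes the directory names follow the lib_<timestamp> pattern with fixed-width timestamps, so that a plain lexical sort puts the newest one last.

```shell
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print the
# newest lib_<timestamp> directory path.
# Assumption: timestamps are fixed-width digits, so lexical sort == chronological.
latest_sharelib_dir() {
  # The path is the last field of each listing line; header lines such as
  # "Found 2 items" are dropped by the grep filter.
  awk '{print $NF}' | grep -E '/lib_[0-9]+$' | sort | tail -n 1
}

# Assumed usage on a cluster node, as the oozie user (not run here):
#   sharelib=$(hdfs dfs -ls /user/oozie/share/lib | latest_sharelib_dir)
#   hdfs dfs -put /usr/iop/current/spark-client/lib/spark-assembly.jar "$sharelib/spark/"
#   oozie admin -sharelibupdate
#   oozie admin -shareliblist spark   # check that the jar is now listed
```

If the helper prints nothing, no lib_<timestamp> directory was found and the put should be skipped.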

pregazzoni avatar Aug 12 '16 16:08 pregazzoni

This looks good, Pierre. Would these steps go into a new task, called something like Setup, that the user would run just once with Gradle?

Will it also work on basic clusters?

snowch avatar Aug 17 '16 01:08 snowch

@snowch I need to look into this more closely, as I believe you would need to become the oozie user to do this (so you need root). The same is true for basic clusters.

I am also inquiring whether this could become the default, so that the jar is already in the shared lib to start with.

pregazzoni avatar Aug 18 '16 00:08 pregazzoni

Ah, cool. Thanks @pregazzoni

snowch avatar Aug 18 '16 20:08 snowch