BigInsights-on-Apache-Hadoop
Speed up Oozie Spark example
For the Oozie Spark job to run on YARN, spark-assembly.jar needs to be on the job path. Right now we get the jar from the cluster (via WebHDFS) and then put it (via WebHDFS) into the $jobDir/lib directory. This takes more than a few minutes.
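For context, the current approach can be sketched with the WebHDFS REST API and curl. This is a minimal sketch, not the example's actual build code: the `WEBHDFS_BASE` URL (for example a Knox gateway URL of the form `https://<host>:8443/gateway/default/webhdfs/v1`), the `HDFS_USER`/`HDFS_PASS` variables and the function names are all hypothetical. It shows why the step is slow: the whole jar travels from the cluster to the client and then back to the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the current approach: download spark-assembly.jar
# over WebHDFS, then upload it again into $jobDir/lib.
# WEBHDFS_BASE, HDFS_USER and HDFS_PASS are assumptions, not part of the example.

# Build a WebHDFS REST URL: <base><absolute HDFS path>?op=<OP>[&<extra>]
webhdfs_url() {
  local base=$1 path=$2 op=$3 extra=${4:-}
  printf '%s%s?op=%s%s\n' "$base" "$path" "$op" "${extra:+&$extra}"
}

# Round-trip copy: every byte of the jar crosses the network twice.
copy_via_client() {
  local src=$1 dest=$2 tmp rc
  tmp=$(mktemp) || return 1
  # OPEN may answer with a redirect to a DataNode; -L follows it.
  if ! curl -sSf -L -u "$HDFS_USER:$HDFS_PASS" -o "$tmp" \
      "$(webhdfs_url "$WEBHDFS_BASE" "$src" OPEN)"; then
    rm -f "$tmp"
    return 1
  fi
  # CREATE also redirects; -T uploads the local copy with PUT.
  curl -sSf -L -u "$HDFS_USER:$HDFS_PASS" -X PUT -T "$tmp" \
    "$(webhdfs_url "$WEBHDFS_BASE" "$dest" CREATE overwrite=true)"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For a jar of this size the two transfers dominate the job submission time, which is what keeping the jar on the cluster avoids.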
Another way would be to have the jar in the Oozie shared lib directory by default.
As the oozie user, you can run:
# Copy spark-assembly.jar into the Spark directory of the Oozie shared lib (the lib_<timestamp> name differs per cluster)
hdfs dfs -put /usr/iop/current/spark-client/lib/spark-assembly.jar /user/oozie/share/lib/lib_20160805191701/spark/.
# Set oozie environment
source /usr/iop/current/oozie-client/bin/oozie-env.sh
export OOZIE_URL=http://<replace with oozie node>:11000/oozie
# Ask the Oozie server to reload the shared lib
oozie admin -sharelibupdate
Once this is done, there is no need to put the jar under $jobDir/lib, as it will be picked up automatically from the Oozie shared lib.
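The manual steps above could be wrapped into a one-time setup script. This is a minimal sketch under stated assumptions: the function names and the `DRY_RUN` switch are hypothetical, and instead of hard-coding `lib_20160805191701` it picks the newest `lib_<timestamp>` directory from `hdfs dfs -ls` output, since that name differs between clusters. The IOP paths are the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical one-time setup: copy spark-assembly.jar into the newest
# Oozie sharelib and ask the Oozie server to reload it.

SPARK_JAR=/usr/iop/current/spark-client/lib/spark-assembly.jar
SHARELIB_ROOT=/user/oozie/share/lib

# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Read `hdfs dfs -ls` output on stdin and print the newest lib_<timestamp>
# directory. The timestamps are fixed width, so a lexical sort is chronological.
latest_sharelib_dir() {
  awk '{print $NF}' | grep -E '/lib_[0-9]{14}$' | sort | tail -n 1
}

# $1 = Oozie URL, $2 = sharelib directory (e.g. .../lib_20160805191701).
install_spark_jar() {
  local oozie_url=$1 sharelib=$2
  if [ -z "$sharelib" ]; then
    echo "no lib_<timestamp> directory found under $SHARELIB_ROOT" >&2
    return 1
  fi
  run hdfs dfs -put -f "$SPARK_JAR" "$sharelib/spark/"
  run oozie admin -oozie "$oozie_url" -sharelibupdate
}

main() {
  local oozie_env=/usr/iop/current/oozie-client/bin/oozie-env.sh
  # Same environment setup as the manual steps, if the file exists.
  if [ -f "$oozie_env" ]; then
    . "$oozie_env"
  fi
  install_spark_jar "$1" "$(hdfs dfs -ls "$SHARELIB_ROOT" | latest_sharelib_dir)"
}

# Only touch the cluster when executed directly with an Oozie URL argument.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -gt 0 ]; then
  main "$@"
fi
```

As the oozie user, this could be run as `bash setup-spark-sharelib.sh http://<oozie node>:11000/oozie` (the file name is hypothetical); with `DRY_RUN=1` it only lists the sharelib and prints the copy and update commands. Note that an Oozie workflow only uses the shared lib when its job properties set `oozie.use.system.libpath=true`.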
This looks good, Pierre. Would these steps go into a new task, called something like Setup, that the user would run just once with Gradle?
Will it also work on basic clusters?
@snowch I need to look into this more closely, as I believe you would need to become the oozie user to do this (so you need root). The same is true for basic clusters.
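Because the commands have to run as the oozie user, a setup step would need to switch to that account. A minimal sketch of such a wrapper follows; the function name is hypothetical, and it assumes sudo is configured so that the caller (typically root) may run commands as oozie.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command as the given user, using sudo only
# when the current user is someone else (which requires sudo/root rights).
as_user() {
  local user=$1
  shift
  if [ "$(id -un)" = "$user" ]; then
    "$@"
  else
    sudo -u "$user" -- "$@"
  fi
}
```

For example, `as_user oozie hdfs dfs -ls /user/oozie/share/lib` lists the sharelib as oozie; on a cluster where the user has no sudo rights, this step fails, which matches the concern about basic clusters.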
I am also asking whether this could become the default, so that the jar is already in the shared lib to start with.
Ah, cool. Thanks, @pregazzoni.