brushfire
Quick Start example in README.
This PR adds some love to the Quick Start example in README. :)
- Fixed an issue where the SNAPSHOT version in example/iris didn't match what was in version.sbt.
- Fixed an issue where org.apache.hadoop was excluded from runtime by sbt-assembly, causing a ClassNotFoundException to be thrown by example/iris. Basically the hadoopClient in Deps.scala was "provided" so it wasn't getting included in target/scala-2.11/brushfire-scalding-0.7.5-SNAPSHOT-jar-with-dependencies.jar during builds.
This is great, thanks! I think the reason that hadoopClient was listed as provided is that when you are submitting the assembly jar to a hadoop cluster, the hadoop jars are indeed provided in that execution environment, and it can be problematic to duplicate them. But obviously that's not the case when running locally. I'm not sure what the best way to resolve this is.
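For anyone following along, here is a rough sketch of what a "provided" dependency looks like in an sbt build. The artifact coordinates are the standard hadoop-client ones, but the `hadoopVersion` value is a placeholder assumption, not the value actually used in Deps.scala.

```scala
// build.sbt fragment (illustrative sketch only).
// The version below is a hypothetical placeholder, not brushfire's real setting.
val hadoopVersion = "2.7.3"

// "provided" keeps hadoop-client on the compile classpath but leaves it out of
// the runtime classpath, so it is absent both from the assembly jar and from a
// plain `sbt run`. On a cluster the execution environment supplies these jars;
// on a local run nothing does, hence the ClassNotFoundException.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided"
```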
Good catch!
The sbt-assembly docs offer a clue on how to resolve this.
If we add this to brushfire-scalding/build.sbt:
run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))
runMain in Compile <<= Defaults.runMainTask(fullClasspath in Compile, runner in (Compile, run))
Then we can run it locally (with all the dependencies) like this:
$ sbt "brushfireScalding/runMain com.twitter.scalding.Tool com.stripe.brushfire.scalding.IrisJob --local --input example/iris.data --output example/iris.output"
Boom!
I wrapped the above command in a new script called quick-start, moved hadoopClient back to provided, and updated the README with examples of running locally and on the cluster.
There might be an even BETTER way to resolve this. Happy to pivot. Tell me what you think. :)
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
eightysteele seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.