scalding
Add scalding-parquet to the repl classpath
Making the repl a little more useful out of the box.
The build failed here. In general this approach seems flawed. Avro shouldn't be in the defaults either. Can we make these command-line args to the repl?
Odd. I don't see how my change resulted in those errors: one is Maven failing to download an HBase jar, and the other is a scalac error. All I'm doing is passing scald.rb a flag that it already accepts, nothing more.
In general I'm torn on which approach is better.
On one hand it would be nice if the repl allowed for as much as possible without having to struggle with imports and adjusting class paths manually.
On the other hand, if the repl could be started with command line flags which could pass through to scald.rb, perhaps that would be enough.
I re-ran my branch with "./scripts/test_tutorials.sh" and it ran successfully.
Can we have the repl take the command-line flags and just pass them through, so users won't have bigger classpaths/builds than they need? (It seems like we would be undoing the modular nature of scald.rb by just adding them all to the repl.)
We may be able to add a --tool option to scald.rb, and just call it from the scald-repl.sh as:
./scripts/scald.rb --tool com.twitter.scalding.ScaldingShell "$@" -Yrepl-sync
And scald.rb can figure out the classpath based on the user parameters.
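To make the proposal concrete, here is a rough Ruby sketch of how a `--tool` option might be handled. It is not the actual scald.rb code: the `--avro`/`--parquet` flag names, the jar file names, the default main class, and the `build_command` helper are all assumptions made for illustration.

```ruby
# Hypothetical sketch only; the real scald.rb resolves jars differently.

# Assumed mapping from optional-module flags to the jar each one adds.
MODULE_JARS = {
  '--avro'    => 'scalding-avro.jar',
  '--parquet' => 'scalding-parquet.jar'
}.freeze

# Assumed default main class when --tool is not given.
DEFAULT_TOOL = 'com.twitter.scalding.Tool'

# Returns the java command as an array of arguments.
def build_command(argv, core_jar: 'scalding-core.jar')
  args = argv.dup
  tool = DEFAULT_TOOL
  jars = [core_jar]
  passthrough = []

  until args.empty?
    arg = args.shift
    case arg
    when '--tool'
      # The next argument names the main class to launch.
      tool = args.shift
      raise ArgumentError, '--tool requires a class name' if tool.nil?
    when *MODULE_JARS.keys
      # Only modules the user asked for end up on the classpath.
      jars << MODULE_JARS.fetch(arg)
    else
      # Anything else (e.g. -Yrepl-sync) is handed to the tool untouched.
      passthrough << arg
    end
  end

  ['java', '-cp', jars.join(File::PATH_SEPARATOR), tool, *passthrough]
end
```

With something like this, the repl script only has to forward its arguments, and the classpath grows only when the user asks for a module.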
I like that approach, as it allows for maximum flexibility and doesn't bake in any assumptions. Then perhaps support for something like a .scald-repl.rc file could be set up so that users can build up a common environment. I'm envisioning a use case where an analyst wishes to work directly in the scalding repl most of the time (and when Scalding jobs can be run on a cluster right from the repl this will be even more likely).
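As a rough illustration of the rc-file idea, the Ruby sketch below reads default flags from a file and puts them ahead of the command-line arguments. The file format (one or more flags per line, `#` for comments) and the helper names `parse_rc` and `effective_args` are assumptions, not an existing scald.rb feature.

```ruby
require 'shellwords'

# Hypothetical rc format: flags separated by whitespace, one or more per
# line; blank lines and lines starting with '#' are ignored.
def parse_rc(text)
  text.each_line
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?('#') }
      .flat_map { |line| Shellwords.split(line) }
end

# Flags from the rc file come first, followed by the command-line
# arguments. A missing rc file simply contributes nothing.
def effective_args(rc_path, argv)
  rc_flags = File.exist?(rc_path) ? parse_rc(File.read(rc_path)) : []
  rc_flags + argv
end
```

Whether later flags override earlier ones would depend on how scald.rb itself parses its arguments, so this sketch only fixes the ordering.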
@hellertime you can run jobs on a cluster now via the REPL. Just use the --hdfs option. It runs the hadoop command - so the job will run on whatever cluster the hadoop CLI can talk to.
Do you want to add the --tool option, or do you want me to? Should be relatively straightforward. That would also make the scald-repl.sh significantly less hacky.
As long as no one has started on adding the --tool option, I'm willing to take a crack at it.
I'm fine with that. I am preoccupied with a product release at my day job, so I cannot do the task justice at the moment.
On Thu, Jun 12, 2014 at 2:24 PM, Brandon Holt wrote:
As long as no one has started on adding the --tool option, I'm willing to take a crack at it.
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.