Add scalding-parquet to the repl classpath

Open hellertime opened this issue 10 years ago • 10 comments

Making the repl a little more useful out of the box.

hellertime avatar Jun 04 '14 19:06 hellertime

The build failed here. In general this approach seems flawed. Avro shouldn't be in the defaults either. Can we make these command-line args to the repl instead?

ianoc avatar Jun 04 '14 20:06 ianoc

Odd. I don't see how my change resulted in those errors: one is Maven not being able to download an HBase jar, and the other is a scalac error. All I'm doing is passing a flag to scald.rb, one which it already accepts, nothing more.

In general I'm torn on what is the better approach.

On one hand it would be nice if the repl allowed for as much as possible without having to struggle with imports and adjusting class paths manually.

On the other hand, if the repl could be started with command line flags which could pass through to scald.rb, perhaps that would be enough.

hellertime avatar Jun 04 '14 20:06 hellertime

I re-ran my branch with "./scripts/test_tutorials.sh" and it ran successfully.

hellertime avatar Jun 05 '14 01:06 hellertime

Can we have the repl take the command-line flags and just pass them through, so users won't have bigger class paths/builds than they need? (It seems like we would be undoing the modular nature of scald.rb if we just added them all into the repl.)

ianoc avatar Jun 05 '14 03:06 ianoc

We may be able to add a --tool option to scald.rb, and just call it from the scald-repl.sh as:

./scripts/scald.rb --tool com.twitter.scalding.ScaldingShell "$@" -Yrepl-sync

And scald.rb can figure out the classpath based on the user parameters.
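A minimal Ruby sketch of how the proposed flag might be handled, for illustration only. The `--tool` flag does not exist yet at this point in the thread, and the helper `extract_tool` and the constant `DEFAULT_TOOL` are hypothetical names, not code from scald.rb. The default main class shown is an assumption.

```ruby
# Hypothetical sketch only: this is not the actual scald.rb code.
# It separates the value of a proposed --tool flag from the remaining
# arguments, which are passed through untouched.

# Assumed default main class when --tool is not given.
DEFAULT_TOOL = "com.twitter.scalding.Tool"

def extract_tool(argv)
  args = argv.dup                      # do not mutate the caller's array
  idx = args.index("--tool")
  return [DEFAULT_TOOL, args] if idx.nil?
  raise ArgumentError, "--tool requires a class name" if idx + 1 >= args.length

  tool = args[idx + 1]
  args.slice!(idx, 2)                  # drop the flag and its value
  [tool, args]
end

tool, rest = extract_tool(
  ["--tool", "com.twitter.scalding.ScaldingShell", "--hdfs", "-Yrepl-sync"]
)
puts "main class: #{tool}"
puts "passed through: #{rest.join(' ')}"
```

With something like this, the repl script would not need to know about individual modules: whatever flags the user passes would reach scald.rb unchanged, and only the main class would differ.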

sriramkrishnan avatar Jun 05 '14 05:06 sriramkrishnan

I like that approach, as it allows for maximum flexibility and doesn't bake in any assumptions. Then perhaps support for something like a .scald-repl.rc file could be set up so that users can build up a common environment. I'm envisioning a use case where an analyst wishes to work directly in the scalding repl most of the time (and when Scalding jobs can be run on a cluster right from the repl this will be even more likely).
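One way such an rc file could work, sketched in Ruby under stated assumptions: the file holds one or more flags per line, `#` starts a comment line, and its contents are prepended to the command-line arguments. The helper `load_rc_args` and the file format are hypothetical; nothing like this exists in Scalding as far as this thread shows.

```ruby
require "shellwords"

# Hypothetical helper: reads default repl flags from an rc file.
# Blank lines and lines starting with '#' are ignored; every other line
# is split into words using shell quoting rules.
def load_rc_args(path)
  return [] unless File.file?(path)

  File.readlines(path)
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
      .flat_map { |line| Shellwords.split(line) }
end

# Assumed location: the user's home directory.
rc_path = File.join(ENV.fetch("HOME", Dir.pwd), ".scald-repl.rc")

# Flags from the rc file come first, followed by the explicit arguments.
args = load_rc_args(rc_path) + ARGV
puts args.inspect
```

Putting the rc flags first is a design choice in this sketch: for option parsers where the last occurrence wins, it lets an explicit command-line flag take precedence over the stored default.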

hellertime avatar Jun 05 '14 13:06 hellertime

@hellertime you can run jobs on a cluster now via the REPL. Just use the --hdfs option. It runs the hadoop command - so the job will run on whatever cluster the hadoop CLI can talk to.

Do you want to add the --tool option, or do you want me to? Should be relatively straightforward. That would also make the scald-repl.sh significantly less hacky.

sriramkrishnan avatar Jun 06 '14 22:06 sriramkrishnan

As long as no one has started on adding the --tool option, I'm willing to take a crack at it.

bholt avatar Jun 12 '14 18:06 bholt

I'm fine with that. I am preoccupied with a product release at the day job, so I cannot do the task justice at the moment.

hellertime avatar Jun 12 '14 18:06 hellertime
