ruby-spark

Maybe a better implementation of Ruby bindings for Apache Spark

Open chyh1990 opened this issue 9 years ago • 2 comments

Hi,

I have written a new prototype of a Ruby binding for Spark:

https://github.com/chyh1990/jruby-spark

Although this implementation only works on JRuby, I think this approach is more promising:

  • REAL closure/lambda serialization, with elegant syntax

https://github.com/chyh1990/jruby-spark/blob/master/examples/pagerank.rb

  • Uses JVM infrastructure and runs on YARN with the standard job submission workflow
  • Reuses the Java/Scala API, so we get Streaming/SQL/GraphX support nearly for free

https://github.com/chyh1990/jruby-spark/blob/master/examples/sqltest.rb

  • Easier to maintain even without merging into mainline spark
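
For context on the first point, here is a minimal sketch in plain Ruby of why closure serialization needs special handling. It uses only the standard Marshal module and is not the jruby-spark API; the helper name marshalable? is made up for illustration.

```ruby
# Plain Ruby, standard library only (not the jruby-spark API).
# Marshal can serialize ordinary data but refuses to serialize a Proc,
# so a Spark binding has to ship closures to workers some other way.

# Hypothetical helper: true if Marshal.dump accepts the object.
def marshalable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

add_one = ->(x) { x + 1 }

puts marshalable?([1, 2, 3])  # plain data
puts marshalable?(add_one)    # a lambda
```

The array is accepted while the lambda is rejected, which is the gap a binding with real closure serialization has to close.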

The prototype is preliminary, but it proves the concept. I think Ruby would be a more elegant binding language for Spark than Python. I'm looking forward to more participants!

chyh1990, Apr 12 '16 06:04

Do you have an install guide?

I tried rake package but got:

jruby-spark/src/main/scala/org/apache/spark/jruby/JRubyIteratableAdaptor.scala:6: error: object jruby is not a member of package org
...

ondra-m, Apr 15 '16 13:04

It's unfortunate that @chyh1990 is not very responsive right now, but I was able to get a Spark session going after doing a few extra things: https://github.com/chyh1990/jruby-spark/issues/1

gnilrets, Apr 20 '16 02:04