ruby-spark
Maybe a better implementation of the Ruby binding for Apache Spark
Hi,
I have written a new prototype of a Ruby binding for Spark:
https://github.com/chyh1990/jruby-spark
Although this implementation only works on JRuby, I think this approach is more promising:
- REAL closure/lambda serialization, with elegant syntax
https://github.com/chyh1990/jruby-spark/blob/master/examples/pagerank.rb
- uses the JVM infrastructure, so jobs run on YARN with the standard job submission workflow
- reuses the Java/Scala API, so Streaming/SQL/GraphX support comes nearly for free
https://github.com/chyh1990/jruby-spark/blob/master/examples/sqltest.rb
- easier to maintain, even without being merged into mainline Spark
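For context on the closure point above: in standard Ruby, the built-in Marshal module serializes plain data but raises a TypeError for a Proc or lambda, so shipping closures to Spark workers needs some extra mechanism. A minimal sketch, assuming plain MRI Ruby; the helper marshal_ok? is hypothetical and only illustrates the limitation, it does not show how jruby-spark itself serializes closures:

```ruby
# Hypothetical helper: returns true if Marshal can serialize obj,
# false if Marshal raises TypeError (as it does for Proc/lambda objects).
def marshal_ok?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

data    = [1, 2, 3]       # plain data: serializable
closure = ->(x) { x * 2 } # a lambda: Marshal refuses it

puts marshal_ok?(data)    # prints "true"
puts marshal_ok?(closure) # prints "false"
```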
The prototype is preliminary, but the concept is proven. I think Ruby would be a more elegant binding language for Spark than Python. I'm looking forward to more participants!
Do you have an installation guide?
I tried rake package but got:
jruby-spark/src/main/scala/org/apache/spark/jruby/JRubyIteratableAdaptor.scala:6: error: object jruby is not a member of package org
...
It's unfortunate that @chyh1990 is not very responsive right now, but I was able to get a Spark session going after doing a few extra things: https://github.com/chyh1990/jruby-spark/issues/1