ruby-spark

Support for dataframes

Open · gnilrets opened this issue 9 years ago • 5 comments

I'm really interested in using Spark and would love to be able to interact with it from Ruby. This gem looks like a great option. It doesn't look like it natively supports Spark DataFrames, right? Is there any way to work with DataFrames using this gem? If not, how much effort would you expect it to take to build that in?

gnilrets, May 22 '15 03:05

The project is still in its alpha stage; basically it is only a proof of concept. We've tried using the Spark API from Ruby and it works! :) Currently we support only a subset of the Spark API. We'd like to attract more developers and extend the supported functions. Right now ruby-spark runs better on MRI than on JRuby (this might change with the JRuby 9.0.0.0 release).

ruby-spark interacts with the JVM (Scala backend), so almost anything that is possible in Python should also be possible in Ruby. Have a look at some benchmarks here: http://ondra-m.github.io/ruby-spark/

deric, May 22 '15 08:05

DataFrame is part of Spark SQL, which is on our TODO list.

ondra-m, May 22 '15 10:05

This is the kind of project I could be interested in contributing to. If you had to put a rough estimate on the number of developer hours you think it would take to build in Spark SQL support, what would it be?

gnilrets, May 22 '15 15:05

Implementing SQL support will take a long time. Currently there are more important things to do (some RDD and MLlib methods are missing, Proc serialization needs to be better, ...).

ondra-m, May 22 '15 15:05

This is a really great gem, and DataFrames will be the foundation for attracting great projects to it. Could I help with development by working on the documentation?

xjlin0, Jun 19 '15 04:06