Andrei Zhabinski comments

Results: 180 comments by Andrei Zhabinski

Use Apache Arrow for interprocess communication

Although integrating Arrow into the existing API may be easy, I believe we need to drop the [RDD API](https://github.com/dfdx/Spark.jl/issues/86) and fully migrate to the Dataset API first - otherwise we will need to...

Use Apache Arrow for interprocess communication

Currently, Julia and the JVM communicate in two ways:

* Julia starts a JVM and calls Java functions via JavaCall. In particular, the Julia driver creates a Spark application and delegates computations to...

Use Apache Arrow for interprocess communication

More broadly, there are several ways to efficiently bring custom Julia functions to Spark clusters, including compiling Julia to Java and creating a new distributed computation framework. But...

load_spark_defaults(d::Dict{Any, Any}) causes error

Please see my [comment](https://github.com/dfdx/Spark.jl/issues/106#issuecomment-1193183401) in the other issue if you haven't done so yet. A few more suggestions for further reports: 1. Use triple backticks (```) to format code. See...

Can we abandon RDD API?

A quick update on UDFs. Although a UDF in Spark can be implemented as a single class which evaluates its inputs, UDFs that call external processes need more sophisticated handling...

Can we abandon RDD API?

@Drvi Note that the current version of Spark.jl already supports the most popular DataFrame functions, including `select`, `group_by`, `join`, etc. Other functions are usually easy to add by request. The...

SparkContext giving StackOverflowError

The most likely cause is Spark 3, which we haven't tested against yet. Can you try it with Spark 2.4?

SparkContext giving StackOverflowError

I'm using Julia 1.5 and all the default settings, which result in Spark 2.4.7. How do you set the Spark version? Do you use the SPARK_CONF or SPARK_HOME environment variables for...

New version does not work as stated in docs

Can you post the output of this (in the Julia console)?

```
] st Spark
```

New version does not work as stated in docs

Let's go through your code piece by piece:

```julia
using Pkg; Pkg.add("Spark")
Pkg.add("CSV"); using CSV
```

I'm not sure why you refer to CSV here - it is a completely...