Andrei Zhabinski

180 comments by Andrei Zhabinski

Although integrating Arrow into the existing API may be easy, I believe we need to drop the [RDD API](https://github.com/dfdx/Spark.jl/issues/86) and fully migrate to the Dataset API first; otherwise, we will need to...

Currently, Julia and the JVM communicate in two ways:

* Julia starts a JVM and calls Java functions via JavaCall. In particular, the Julia driver creates a Spark application and delegates computations to...
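For readers unfamiliar with JavaCall, here is a minimal sketch of the first mechanism: Julia starting an in-process JVM and calling a Java method. It uses JavaCall's `JavaCall.init`, `@jimport` and `jcall` on the standard `java.lang.Math` class, not anything Spark-specific. It assumes JavaCall is installed and a JDK is available (for example via `JAVA_HOME`); Spark.jl's actual driver code is more involved and is not reproduced here.

```julia
using JavaCall

# Start an embedded JVM in the current Julia process; the vector holds JVM options.
JavaCall.init(["-Xmx128M"])

# Import a Java class and call its static method `sin(double)`:
# jcall(receiver, method_name, return_type, (argument_types...,), arguments...)
jlm = @jimport java.lang.Math
result = jcall(jlm, "sin", jdouble, (jdouble,), pi / 2)
println(result)
```

The same `jcall` pattern works for instance methods by passing a Java object instead of a class as the receiver.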

More broadly, there are several ways to efficiently bring custom Julia functions to Spark clusters, such as compiling Julia to Java and creating a new distributed computation framework. But...

Please see my [comment](https://github.com/dfdx/Spark.jl/issues/106#issuecomment-1193183401) in the other issue if you haven't already. A few more suggestions for future reports:

1. Use triple backticks (```) to format code. See...

A quick update on UDFs. Although a UDF in Spark can be implemented as a single class that evaluates its inputs, for UDFs that call external processes, more sophisticated handling...

@Drvi Note that the current version of Spark.jl already supports the most popular DataFrame functions, including `select`, `group_by`, `join`, etc. Other functions are usually easy to add on request. The...

Most likely you are using Spark 3, which we haven't tested against yet. Can you try it with Spark 2.4?

I'm using Julia 1.5 with all the default settings, which result in Spark 2.4.7. How do you set the Spark version? Do you use the `SPARK_CONF` or `SPARK_HOME` environment variables for...
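When reporting a version problem like this, it helps to include the Julia version and any Spark-related environment variables. The sketch below collects them; the function name `spark_env_report` is hypothetical, and it only checks `SPARK_HOME` and `SPARK_CONF`, the two variables mentioned above.

```julia
# Hypothetical helper: gather the settings that influence which Spark is used.
function spark_env_report(vars = ("SPARK_HOME", "SPARK_CONF"))
    report = Dict{String,String}("JULIA_VERSION" => string(VERSION))
    for var in vars
        # Unset variables are reported explicitly instead of being skipped.
        report[var] = get(ENV, var, "<not set>")
    end
    return report
end

# Print the report in a stable (sorted) order so it can be pasted into an issue.
for (name, value) in sort(collect(spark_env_report()))
    println(name, " = ", value)
end
```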

Can you post the output of this (in the Julia console)?

```
] st Spark
```

Let's go through your code piece by piece:

```julia
using Pkg; Pkg.add("Spark")
Pkg.add("CSV"); using CSV
```

I'm not sure why you refer to CSV here; it is a completely...