YARN Backend

Open utdemir opened this issue 6 years ago • 1 comments

YARN is the most common way to schedule Spark & Hadoop on a cluster.

Supporting it as an executor will enable us to run side-by-side with existing data processing pipelines.

utdemir avatar Jun 25 '19 09:06 utdemir

I spent a bit of time experimenting today. My first idea was to use inline-java to interface directly with the JVM. However, it turns out that it adds considerable complexity to the build process.

I've decided on a simpler approach: a wrapper Java application responsible for interfacing with YARN and communicating with the Haskell executable. Since we only need one type of message (spawn an executor and return the result), I believe the interface between Java and Haskell will be quite small. Initially, I will probably create a simple protocol using UNIX pipes.
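
To make the pipe idea concrete, here is a rough sketch of what the Java side of such a protocol might look like. Nothing here is decided: the `PipeFraming` class, its method names, the size limit, and the choice of length-prefixed framing (a 4-byte big-endian length followed by the payload) are all assumptions for illustration. The payload is treated as opaque bytes, so the serialized "spawn an executor" request and its result could be carried as-is.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of distributed-dataset: length-prefixed
// framing for messages exchanged over a pair of pipes.
final class PipeFraming {
    // Assumed upper bound, used to reject corrupt length headers early.
    static final int MAX_FRAME_BYTES = 64 * 1024 * 1024;

    // Writes one frame: a 4-byte big-endian length, then the payload bytes.
    static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush(); // make sure the peer sees the whole frame
    }

    // Reads one frame; throws EOFException if the pipe closes mid-frame.
    static byte[] readFrame(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        if (length < 0 || length > MAX_FRAME_BYTES) {
            throw new IOException("invalid frame length: " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload); // blocks until exactly `length` bytes arrive
        return payload;
    }
}
```

The wrapper could call `readFrame` on the pipe coming from the Haskell process to receive a request, and `writeFrame` on the pipe going back to return the result. The Haskell side would need to mirror the same framing. An explicit length prefix keeps binary payloads safe, since no delimiter byte has to be escaped.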

utdemir avatar Jun 30 '19 00:06 utdemir