distributed-dataset
YARN Backend
YARN is the most common way to schedule Spark & Hadoop on a cluster.
Supporting it as an executor will enable us to run side-by-side with existing data processing pipelines.
I spent a bit of time experimenting today. My first idea was to use inline-java to interface directly with the JVM. However, it turns out that this adds considerable complexity to the build process.
I've decided on a simpler approach: a wrapper Java application that is responsible for interfacing with YARN and communicating with the Haskell executable. Since we only need one type of message (spawn an executor and return the result), I believe the interface between Java and Haskell will be quite small. Initially, I will probably create a simple protocol using UNIX pipes.
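To make the pipe idea a bit more concrete, here is a minimal sketch of what the Java side of such a protocol might look like. Nothing here is decided: the class name `PipeFraming`, the method names, and the framing itself (a 4-byte big-endian length followed by an opaque payload) are all assumptions for illustration. The payload would be whatever serialized request or result the Haskell executable produces.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * Hypothetical length-prefixed framing for the Java <-> Haskell pipe.
 * The frame layout is an assumption, not a settled design:
 * a 4-byte big-endian length, then exactly that many payload bytes.
 */
final class PipeFraming {
    private PipeFraming() {}

    /** Writes one frame (length prefix + payload) and flushes the stream. */
    static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length); // DataOutputStream writes ints big-endian
        data.write(payload);
        data.flush();
    }

    /**
     * Reads one frame produced by writeFrame.
     * Throws EOFException if the pipe closes before a full frame arrives.
     */
    static byte[] readFrame(InputStream in) throws IOException {
        // DataInputStream does not buffer, so wrapping per call loses no bytes.
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        if (length < 0) {
            throw new IOException("negative frame length: " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload);
        return payload;
    }
}
```

With something like this, the wrapper could start the Haskell executable with `ProcessBuilder`, call `writeFrame` on `process.getOutputStream()` to send the serialized executor request, and call `readFrame` on `process.getInputStream()` to collect the result. The Haskell side would only need to read and write the same 4-byte big-endian prefix on its stdin and stdout.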