Emma
A quotation-based Scala DSL for scalable data analysis.
Goals
Our goal is to improve developer productivity by hiding the details of parallel execution behind a high-level, declarative API that maximises the reuse of native Scala syntax and constructs.
Emma supports state-of-the-art dataflow engines such as Apache Flink and Apache Spark as backend co-processors.
Features
Other DSLs for scalable data analysis are typically embedded through types. In contrast, Emma is based on quotations (similar to Quill). This approach has two benefits.
First, it allows the reuse of Scala-native, declarative constructs in the DSL.
Quoted Scala syntax such as for-comprehensions, case classes, and pattern matching is thereby lifted to an intermediate representation called Emma Core.
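The constructs above are plain Scala: the compiler desugars a for-comprehension into calls to map, flatMap and withFilter, so any type that exposes these methods can be used with for-comprehensions, case classes and pattern matching. The following self-contained sketch shows the kind of declarative code meant here. It uses a hypothetical in-memory LocalBag class (not Emma's DataBag API) and made-up Customer and Order case classes; it only illustrates the Scala syntax, not how Emma quotes or lifts it.

```scala
// Hypothetical in-memory stand-in for a bag type; NOT Emma's DataBag API.
final case class LocalBag[A](elems: Seq[A]) {
  def map[B](f: A => B): LocalBag[B] = LocalBag(elems.map(f))
  def flatMap[B](f: A => LocalBag[B]): LocalBag[B] = LocalBag(elems.flatMap(a => f(a).elems))
  def withFilter(p: A => Boolean): LocalBag[A] = LocalBag(elems.filter(p))
}

// Hypothetical input records, modelled as case classes.
final case class Customer(id: Int, name: String)
final case class Order(customerId: Int, amount: Double)

object ForComprehensionSketch {
  // An equi-join with a selection, written as a for-comprehension.
  // The compiler desugars it into flatMap/withFilter/map calls on LocalBag.
  def bigSpenders(customers: LocalBag[Customer], orders: LocalBag[Order]): LocalBag[(String, Double)] =
    for {
      Customer(id, name) <- customers // pattern matching on a case class
      Order(cid, amount) <- orders
      if id == cid && amount > 100.0
    } yield (name, amount)

  def main(args: Array[String]): Unit = {
    val customers = LocalBag(Seq(Customer(1, "Ada"), Customer(2, "Bob")))
    val orders = LocalBag(Seq(Order(1, 250.0), Order(2, 50.0), Order(1, 20.0)))
    println(bigSpenders(customers, orders).elems) // List((Ada,250.0))
  }
}
```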
Second, it allows Emma Core terms to be analyzed and optimized holistically.
Subterms of type DataBag[A] are thereby transformed and offloaded to a parallel dataflow engine such as Apache Flink or Apache Spark.
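To illustrate what analysing a term as a whole makes possible, here is a toy sketch built on a hypothetical, monomorphic expression type (Source, MapOp, FilterOp). It is far simpler than Emma Core and is not its actual representation. A single rewrite rule, map fusion, merges consecutive map operations before the term is evaluated locally; in Emma, transformed DataBag subterms would instead be executed on Flink or Spark.

```scala
// Toy IR for bag expressions over Int; hypothetical, not Emma Core.
sealed trait Expr
final case class Source(data: Seq[Int]) extends Expr
final case class MapOp(in: Expr, f: Int => Int) extends Expr
final case class FilterOp(in: Expr, p: Int => Boolean) extends Expr

object ToyOptimizer {
  // Rewrite rule, applied bottom-up: map(map(xs, f), g) ==> map(xs, f andThen g).
  def fuse(e: Expr): Expr = e match {
    case MapOp(in, g) =>
      fuse(in) match {
        case MapOp(inner, f) => MapOp(inner, f andThen g)
        case other           => MapOp(other, g)
      }
    case FilterOp(in, p) => FilterOp(fuse(in), p)
    case s: Source       => s
  }

  // Local interpreter standing in for a dataflow engine.
  def eval(e: Expr): Seq[Int] = e match {
    case Source(data)    => data
    case MapOp(in, f)    => eval(in).map(f)
    case FilterOp(in, p) => eval(in).filter(p)
  }

  // Number of map operators, i.e. separate passes over the data.
  def countMaps(e: Expr): Int = e match {
    case Source(_)       => 0
    case MapOp(in, _)    => 1 + countMaps(in)
    case FilterOp(in, _) => countMaps(in)
  }

  def main(args: Array[String]): Unit = {
    val plan = MapOp(FilterOp(MapOp(MapOp(Source(1 to 5), _ + 1), _ * 2), _ > 6), _ - 1)
    val optimized = fuse(plan)
    println(countMaps(plan))                // 3
    println(countMaps(optimized))           // 2
    println(eval(plan) == eval(optimized))  // true
  }
}
```

Because the rule only sees the term structure, it can be applied without running the program; this is the kind of whole-program rewrite that quotation makes available, as opposed to optimizing one library call at a time.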
Examples
The emma-examples module contains examples from various fields.
- Graph Analysis
  - Connected Components
  - Triangle Enumeration
  - Transitive Closure
- Supervised Learning
  - Naive Bayes Classification
- Unsupervised Learning
  - k-Means Clustering
- Text Processing
  - Word Count
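For a flavour of the declarative style these examples rely on, the following sketch implements a word count with plain Scala collections. It is not the emma-examples implementation; the object and method names are made up for illustration, and a comparable Emma program would presumably operate on a DataBag[String] rather than a Seq[String].

```scala
object WordCountSketch {
  // Splits lines into lowercase words and counts the occurrences of each word.
  def wordCount(lines: Seq[String]): Map[String, Int] = {
    val words = for {
      line <- lines
      word <- line.toLowerCase.split("\\W+").toSeq
      if word.nonEmpty // drop the empty token produced by leading punctuation
    } yield word
    words.groupBy(identity).map { case (w, ws) => w -> ws.size }
  }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("To be or not to be", "that is the question"))
    println(counts("to")) // 2
    println(counts("be")) // 2
  }
}
```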
Learn More
Check emma-language.org for further information.
Build
- JDK 7+ (preferably JDK 8)
- Maven 3
Run

    mvn clean package -DskipTests

to build Emma without running any tests.
For more advanced build options, including integration tests for the target runtimes, please see the "Building Emma" section in the Wiki.