OSPBench: Open Stream Processing Benchmark

This repository contains the code of the Open Stream Processing Benchmark (OSPBench).

All documentation can be found in our wiki.

The repository includes:

benchmark: benchmark pipeline implementations (docs).
data-stream-generator: data stream generator to generate input streams locally or on a DC/OS cluster (docs).
output-consumer: consumes the output of the processing job and of the metrics-exporter from Kafka and stores it on S3 (docs).
evaluator: computes performance metrics on the output of the output consumer (docs).
result analysis: Jupyter notebooks to visualize the results (docs).
deployment: deployment scripts to run the benchmark on a DC/OS setup on AWS (docs).
kafka-cluster-tools: Kafka scripts to start a cluster and read from a topic for local development (docs).
metrics-exporter: exports metrics from JMX and cAdvisor and writes them to Kafka (docs).
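
To give a feel for the kind of performance metric the evaluator derives from the output consumer's data, here is a minimal Python sketch of a latency-percentile computation. It is an illustration only, not the evaluator's actual code: the record fields `input_ts_ms` and `output_ts_ms` and both function names are hypothetical, and the sketch assumes each output record carries the time its input was published and the time the result was written.

```python
import math


def latencies_ms(records):
    """Per-record latency: output time minus input time, in milliseconds.

    The field names are hypothetical and not taken from the repository.
    """
    return [r["output_ts_ms"] - r["input_ts_ms"] for r in records]


def percentile(values, p):
    """Nearest-rank percentile of `values` for p in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Nearest-rank method: smallest value with at least p% of the data at or below it.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"input_ts_ms": 0, "output_ts_ms": 10},
        {"input_ts_ms": 5, "output_ts_ms": 25},
        {"input_ts_ms": 10, "output_ts_ms": 40},
        {"input_ts_ms": 15, "output_ts_ms": 55},
    ]
    lat = latencies_ms(sample)
    print("median latency (ms):", percentile(lat, 50))
    print("p99 latency (ms):", percentile(lat, 99))
```

The nearest-rank method is used here because it always returns an observed value; the actual evaluator may compute its metrics differently.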

Currently, the benchmark includes Apache Spark (Spark Streaming and Structured Streaming), Apache Flink, and Kafka Streams.

References, Publications and Talks

Are you having issues with anything related to the project? Do you wish to use this project or extend it? The fastest way to contact me is through: