image:https://travis-ci.org/ChronixDB/chronix.spark.svg?branch=master["Build Status", link="https://travis-ci.org/ChronixDB/chronix.spark"] image:https://coveralls.io/repos/github/ChronixDB/chronix.spark/badge.svg?branch=master["Coverage",link="https://coveralls.io/github/ChronixDB/chronix.spark?branch=master"] image:http://img.shields.io/badge/license-ASF2-blue.svg["Apache License 2",link="https://github.com/ChronixDB/chronix.spark/blob/master/LICENSE"] image:https://sputnik.ci/conf/badge["Sputnik",link="https://sputnik.ci/app#/builds/ChronixDB/chronix.spark"] image:https://badge.waffle.io/ChronixDB/chronix.spark.png?label=ready&title=Ready["Stories in Ready",link="http://waffle.io/ChronixDB/chronix.spark"]
image::logo.png[]

= Chronix Spark

An Apache Spark RDD implementation for time series processing, based on Chronix.

== Usage Guide
- link:chronix-infrastructure-local/README.adoc[Use Chronix Spark within 5mins (on Mac OSX or Linux)]
== Design Principles
- A `ChronixRDD` is a collection of univariate time series. Each time series has its own vector of timestamps; the series are not aligned on one common vector of timestamps.
- Time series are multi-dimensional. Each time series is associated with one or more dimensions. The identity of a time series is the combination of some of its dimension values.
- `ChronixRDD` has its own storage engine based on Solr Cloud and the Chronix format. The time series data is therefore stored space-efficiently, is sharded, and is equipped with low-level queries to perform predicate pushdown.
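
The first two principles can be illustrated with a minimal Java sketch. This is not the Chronix Spark API: the class `TimeSeriesSketch`, the nested `Series` type, and the `identity` method are hypothetical names used only to show a univariate series that carries its own timestamp vector and derives its identity from a chosen subset of its dimensions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TimeSeriesSketch {

    /** Hypothetical univariate time series (illustration only, not a Chronix class). */
    static final class Series {
        final long[] timestamps;              // this series' own timestamp vector
        final double[] values;                // one value per timestamp
        final Map<String, String> dimensions; // e.g. host, metric, process

        Series(long[] timestamps, double[] values, Map<String, String> dimensions) {
            if (timestamps.length != values.length) {
                throw new IllegalArgumentException("timestamps and values must have equal length");
            }
            this.timestamps = timestamps.clone();
            this.values = values.clone();
            this.dimensions = new TreeMap<>(dimensions);
        }

        /** Builds the identity from the values of the chosen identity dimensions, in the given order. */
        String identity(List<String> identityDimensions) {
            return identityDimensions.stream()
                    .map(name -> name + "=" + dimensions.get(name))
                    .collect(Collectors.joining(","));
        }
    }

    public static void main(String[] args) {
        // Two series with different, unaligned timestamp vectors.
        Series cpu = new Series(new long[] {0L, 10L, 20L}, new double[] {0.2, 0.4, 0.3},
                Map.of("host", "a", "metric", "cpu", "unit", "percent"));
        Series heap = new Series(new long[] {3L, 27L}, new double[] {512.0, 640.0},
                Map.of("host", "a", "metric", "heap", "unit", "MB"));

        // Only "host" and "metric" are assumed to form the identity here.
        List<String> identityDimensions = List.of("host", "metric");
        System.out.println(cpu.identity(identityDimensions));
        System.out.println(heap.identity(identityDimensions));
    }
}
```

Which dimensions make up the identity is a modelling decision; the sketch simply takes them as a parameter.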
== FAQ
How does Chronix Spark compare to Spark-TS?
- Spark-TS provides no specific time series storage; it uses the Spark persistence mechanisms instead. This leads to less efficient storage usage and fewer opportunities for performance optimization via predicate pushdown.
- In contrast to Spark-TS, Chronix does not align all time series values on one vector of timestamps. This leads to greater flexibility in time series aggregation.
- Chronix provides multi-dimensional time series, as this is very useful for data warehousing and APM.
- Chronix supports Datasets, as this will be an important Spark API in the near future. However, Chronix currently doesn't support an `IndexedRowMatrix` for SparkML.
- Chronix is written purely in Java. There is no explicit support for Python and Scala yet.
- Chronix does not support a ZonedTime, as this would make things considerably more complicated.
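
To make the point about unaligned timestamps concrete, here is a small plain-Java sketch. It does not use Chronix Spark or Spark-TS; the class `UnalignedAggregation` and the method `meanInRange` are hypothetical. Each series is aggregated over the same time window using its own timestamp vector, so no resampling onto a shared time grid is needed.

```java
public class UnalignedAggregation {

    /** Mean of all values whose timestamp lies in [from, to]; NaN if no point matches. */
    static double meanInRange(long[] timestamps, double[] values, long from, long to) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < timestamps.length; i++) {
            if (timestamps[i] >= from && timestamps[i] <= to) {
                sum += values[i];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // Series A is sampled every 10 time units, series B irregularly.
        long[] timestampsA = {0L, 10L, 20L, 30L};
        double[] valuesA = {1.0, 2.0, 3.0, 4.0};
        long[] timestampsB = {3L, 27L};
        double[] valuesB = {10.0, 20.0};

        // Same window for both series, evaluated independently per series.
        System.out.println(meanInRange(timestampsA, valuesA, 0L, 20L)); // prints 2.0
        System.out.println(meanInRange(timestampsB, valuesB, 0L, 20L)); // prints 10.0
    }
}
```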