chronix.spark icon indicating copy to clipboard operation
chronix.spark copied to clipboard

Time series analysis with Apache Spark based on Chronix |

image:https://travis-ci.org/ChronixDB/chronix.spark.svg?branch=master["Build Status", link="https://travis-ci.org/ChronixDB/chronix.spark"] image:https://coveralls.io/repos/github/ChronixDB/chronix.spark/badge.svg?branch=master["Coverage",link="https://coveralls.io/github/ChronixDB/chronix.spark?branch=master"] image:http://img.shields.io/badge/license-ASF2-blue.svg["Apache License 2",link="https://github.com/ChronixDB/chronix.spark/blob/master/LICENSE")] image:https://sputnik.ci/conf/badge["Sputnik",link="https://sputnik.ci/app#/builds/ChronixDB/chronix.spark")] image:https://badge.waffle.io/ChronixDB/chronix.spark.png?label=ready&title=Ready["Stories in Ready",link="http://waffle.io/ChronixDB/chronix.spark")]

image::logo.png[] = Chronix Spark An Apache Spark RDD implementation for time series processing - based on Chronix.

== Usage Guide

  • link:chronix-infrastructure-local/README.adoc[Use Chronix Spark within 5mins (on Mac OSX or Linux)]

== Design Principles

  • A ChronixRDD is a collection of univariate time series. Each of them has its own vector of timestamps - they are not aligned on one common vector of timestamps.
  • Time series are multi-dimensional. Each time series is associated to one or more dimensions. The identity of a time series is the combination of some of its dimension values.
  • ChronixRDD has its own storage engine based on Solr Cloud and the Chronix format. So the time series data is stored storage-efficient, sharded and with equipped with low-level queries to perform predicate pushdown.

== FAQ

How does Chronix Spark compare to Spark-TS?

  • Spark-TS provides no specific time series storage it uses the Spark persistence mechanisms instead. This leads to a less efficient storage usage and less possibilities to perform performance optimizations via predicate pushdown.

  • In contrast to Spark-TS Chronix does not align all time series values on one vector of timestamps. This leads to greater flexibility in time series aggregation.

  • Chronix provides multi-dimensional time series as this is very useful for data warehousing and APM.

  • Chronix has support for Datasets as this will be an important Spark API in the near future. But Chronix currently doesn't support an IndexedRowMatrix for SparkML.

  • Chronix is purely written in Java. There is no explicit support for Python and Scala yet.

  • Chronix doesn not support a ZonedTime as this makes it way more complicated.