ChronixDB/chronix.spark: Time series analysis with Apache Spark based on Ch...

image:https://travis-ci.org/ChronixDB/chronix.spark.svg?branch=master["Build Status", link="https://travis-ci.org/ChronixDB/chronix.spark"] image:https://coveralls.io/repos/github/ChronixDB/chronix.spark/badge.svg?branch=master["Coverage",link="https://coveralls.io/github/ChronixDB/chronix.spark?branch=master"] image:http://img.shields.io/badge/license-ASF2-blue.svg["Apache License 2",link="https://github.com/ChronixDB/chronix.spark/blob/master/LICENSE")] image:https://sputnik.ci/conf/badge["Sputnik",link="https://sputnik.ci/app#/builds/ChronixDB/chronix.spark")] image:https://badge.waffle.io/ChronixDB/chronix.spark.png?label=ready&title=Ready["Stories in Ready",link="http://waffle.io/ChronixDB/chronix.spark")]

image::logo.png[] = Chronix Spark An Apache Spark RDD implementation for time series processing - based on Chronix.

== Usage Guide

link:chronix-infrastructure-local/README.adoc[Use Chronix Spark within 5mins (on Mac OSX or Linux)]

== Design Principles

A ChronixRDD is a collection of univariate time series. Each of them has its own vector of timestamps - they are not aligned on one common vector of timestamps.
Time series are multi-dimensional. Each time series is associated to one or more dimensions. The identity of a time series is the combination of some of its dimension values.
ChronixRDD has its own storage engine based on Solr Cloud and the Chronix format. So the time series data is stored storage-efficient, sharded and with equipped with low-level queries to perform predicate pushdown.

== FAQ

How does Chronix Spark compare to Spark-TS?

Spark-TS provides no specific time series storage it uses the Spark persistence mechanisms instead. This leads to a less efficient storage usage and less possibilities to perform performance optimizations via predicate pushdown.
In contrast to Spark-TS Chronix does not align all time series values on one vector of timestamps. This leads to greater flexibility in time series aggregation.
Chronix provides multi-dimensional time series as this is very useful for data warehousing and APM.
Chronix has support for Datasets as this will be an important Spark API in the near future. But Chronix currently doesn't support an IndexedRowMatrix for SparkML.
Chronix is purely written in Java. There is no explicit support for Python and Scala yet.
Chronix doesn not support a ZonedTime as this makes it way more complicated.

chronix.spark
chronix.spark copied to clipboard

Metadata

← Metadata

Owner

Metadata

chronix.spark chronix.spark copied to clipboard

Metadata

← Metadata

Owner

Metadata

chronix.spark
chronix.spark copied to clipboard