spark-timeseries
ARIMA Documentation
Hi guys,
can someone provide a spark-ts example using the ARIMA model for ML? Would be nice.
Hi @jomach, are there any particular tasks you're interested in using the ARIMA model for? I.e. are you looking for an example that simply fits an ARIMA model to the data? Or you'd like to use it for forecasting?
Hi Cloudera, for example an ML model that lets us forecast with ARIMA, with data like:
03.01.15;22:30;236,25
03.01.15;22:15;240
04.01.15;16:00;243,775
Create an ARIMA model and predict on test data. That would already be enough, and it would help people getting started.
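As a starting point (not spark-ts specific), here's a minimal sketch of parsing lines in that `dd.MM.yy;HH:mm;value` format, where the value uses a European decimal comma; the class and method names are my own:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ParseSample {
    // Converts a value with a European decimal comma (e.g. "236,25") to a double.
    static double parseValue(String field) {
        return Double.parseDouble(field.replace(',', '.'));
    }

    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("dd.MM.yy HH:mm");
        String line = "03.01.15;22:30;236,25";
        String[] parts = line.split(";");
        // Join the date and time fields and parse them together
        LocalDateTime ts = LocalDateTime.parse(parts[0] + " " + parts[1], fmt);
        double value = parseValue(parts[2]);
        System.out.println(ts + " -> " + value); // 2015-01-03T22:30 -> 236.25
    }
}
```

From there the values could be collected into a double array and wrapped in a vector for fitting.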
Or updating http://blog.cloudera.com/blog/2015/12/spark-ts-a-new-library-for-analyzing-time-series-data-with-apache-spark/ …
Thanks for the quick reply !
I wrote up a super basic ARIMA example over here: https://github.com/sryza/spark-ts-examples/blob/master/jvm/src/main/scala/com/cloudera/tsexamples/SingleSeriesARIMA.scala
This is using the ARIMA implementation in a single-node context, because there's nothing inherently distributed about it, but you can also easily stick it in a map function to apply it to every series in a TimeSeriesRDD.
Hi @sryza, sorry for the late reply. Thanks for the ARIMA example. I tried it with your test data; now I will try it with my data. There are some bugs in the example. Error:
From my console :
scala> val arimaModel = ARIMA.fitModel(1, 0, 1, ts)
Fix :
import org.apache.spark.mllib.linalg.Vectors
val ts = Vectors.dense(lines.map(_.toDouble).toArray)
ARIMA.fitModel wants an MLlib vector, not a Breeze one.
And another thing: the stocks example does not work in my console (I didn't debug it), but:
scala> ZonedDateTime.parse("2015-09-22")
java.time.format.DateTimeParseException: Text '2015-09-22' could not be parsed at index 10
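For what it's worth, that parse failure is standard `java.time` behavior: `ZonedDateTime.parse` needs a time and a zone offset, so a date-only string fails. A sketch of one way around it, parsing as a `LocalDate` first and then attaching a zone (class and method names are my own):

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DateOnlyParse {
    // Turns a date-only ISO string into a ZonedDateTime at midnight in the given zone.
    static ZonedDateTime parseDateOnly(String s, ZoneId zone) {
        return LocalDate.parse(s).atStartOfDay(zone);
    }

    public static void main(String[] args) {
        ZonedDateTime zdt = parseDateOnly("2015-09-22", ZoneId.of("Z"));
        System.out.println(zdt); // 2015-09-22T00:00Z
    }
}
```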
The issue you're hitting with ARIMA is due to the fact that, in the master branch of spark-ts, I've recently switched the statistical functionality from Breeze to MLlib. The examples sit on the 0.2.0 release, which still uses Breeze for ARIMA. When I publish an 0.3.0 release, I'll update the examples to use MLlib as well.
Thanks for catching that issue with the Stocks example. I've pushed a fix to the repo. Let me know if there's still an issue?
@sryza could you please help me with the same issue. I'm trying to fit ARIMA model to my dataset
DenseVector dv = new DenseVector(total);
ARIMAModel fitModel = ARIMA.fitModel(1, 0, 1, (Vector<Object>) dv, true, "css-cgd", null);
The code is in Java. According to your code a dense vector should be provided, but the signature of fitModel asks for a Vector object. I tried to cast it, but now I get the following error:
Exception in thread "main" java.lang.ClassCastException: org.apache.spark.mllib.linalg.DenseVector cannot be cast to breeze.linalg.Vector
Could you please tell me how to solve it?
@anshulemc :
import org.apache.spark.mllib.linalg.Vectors
val ts = Vectors.dense(lines.map(_.toDouble).toArray)
Hi @jomach
Yeah, I have understood the same, but it's written in Scala and I'm trying to go ahead with Java, as our previous code also runs on Java. It would be great if you could tell me a solution to the above problem in Java.
I'm no Java expert, but: have you tried not casting it and instead creating the dv with the MLlib DenseVector? From your code I read that you are already creating a DenseVector; have you tried removing the cast and just passing dv into fitModel?
@jomach I tried going ahead with the dense vector only, not casting it, and I get the error below:
The method fitModel(int, int, int, Vector<Object>, boolean, String, double[]) in the type ARIMA is not applicable for the arguments (int, int, int, DenseVector, boolean, String, null)
I pushed out an 0.3.0 release of the library today, and updated the examples. In 0.3.0, only MLlib vectors are used in public APIs, which should make things simpler. Here's the link again in case you need it: https://github.com/sryza/spark-ts-examples/blob/master/jvm/src/main/scala/com/cloudera/tsexamples/SingleSeriesARIMA.scala. Let me know if that works?
How do I save an ARIMAModel to HDFS for future reference (for forecasting)? I have been trying to save it and to figure out a way in the Cloudera API, but failed. Since ARIMAModel is not serializable, I can't save it using the Java API:
ObjectOutputStream modelsave = new ObjectOutputStream(new FileOutputStream("<PATH>"));
modelsave.writeObject(model);
Is there any way to save model on hdfs? Please suggest.
I guess you can simply save the parameters of the model instead?
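For instance, if you can get at the model's coefficient array (check the ARIMAModel API for the actual accessor; the names below are placeholders), a plain-Java sketch of persisting just the parameters:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SaveParams {
    // Writes an array of model coefficients to a file: length, then each value.
    static void save(String path, double[] coeffs) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(path))) {
            out.writeInt(coeffs.length);
            for (double c : coeffs) out.writeDouble(c);
        }
    }

    // Reads the coefficients back in the same order.
    static double[] load(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            double[] coeffs = new double[in.readInt()];
            for (int i = 0; i < coeffs.length; i++) coeffs[i] = in.readDouble();
            return coeffs;
        }
    }

    public static void main(String[] args) throws IOException {
        double[] coeffs = {0.5, -0.1, 0.02}; // e.g. intercept, AR, MA terms
        save("arima-params.bin", coeffs);
        double[] back = load("arima-params.bin");
        System.out.println(java.util.Arrays.equals(coeffs, back)); // true
    }
}
```

For HDFS specifically, the same bytes could be written through Hadoop's FileSystem API (an FSDataOutputStream) instead of a local FileOutputStream.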
@sryza Hey, can you write me a Java program that fits an ARIMA model? I have a CSV of the form:
time | count
1492001401000 | 29
1492001402000 | 43
1492001403000 | 22
etc.
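Not a full program, but a sketch of the Java-side prep: pulling the count column out of lines in that shape into a double array, which could then be wrapped in an MLlib DenseVector and passed to ARIMA.fitModel. The pipe separator and header row are assumptions read off the sample above:

```java
import java.util.ArrayList;
import java.util.List;

public class CsvCounts {
    // Extracts the count column from "time | count" lines, skipping the header.
    static double[] counts(List<String> lines) {
        List<Double> vals = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] parts = line.split("\\|");
            vals.add(Double.parseDouble(parts[1].trim()));
        }
        double[] out = new double[vals.size()];
        for (int i = 0; i < out.length; i++) out[i] = vals.get(i);
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "time | count",
            "1492001401000 | 29",
            "1492001402000 | 43",
            "1492001403000 | 22");
        double[] series = counts(lines);
        System.out.println(series.length); // 3
    }
}
```

From there, with spark-ts 0.3.0, `Vectors.dense(series)` gives the MLlib vector that fitModel expects, as in the Scala example linked above.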
Thanks.
Can someone help me fix this problem, please? I am a learner. I am running the simplest ARIMA model from @sryza: https://github.com/sryza/spark-ts-examples/blob/master/jvm/src/main/scala/com/cloudera/tsexamples/SingleSeriesARIMA.scala
Below is the error message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/cloudera/sparkts/models/ARIMA$
	at com.cloudera.tsexamples.SingleSeriesARIMA$.main(SingleSeriesARIMA.scala:41)
	at com.cloudera.tsexamples.SingleSeriesARIMA.main(SingleSeriesARIMA.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.cloudera.sparkts.models.ARIMA$
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
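A NoClassDefFoundError for com.cloudera.sparkts.models.ARIMA$ usually means the spark-ts jar isn't on the classpath at runtime, even though the code compiled. A sketch of two common fixes (the jar paths and version are placeholders, not the actual paths on your machine):

```shell
# Option 1: ship the spark-ts jar explicitly alongside the application jar
spark-submit \
  --class com.cloudera.tsexamples.SingleSeriesARIMA \
  --jars /path/to/sparkts-0.3.0.jar \
  target/spark-ts-examples.jar

# Option 2: build a fat/assembly jar so the spark-ts dependency is bundled
# (e.g. Maven with the shade plugin, or sbt assembly), then spark-submit that jar
```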