spark-timeseries
ARIMA Documentation
Hi guys,
can someone provide a spark-ts example using the ARIMA model for ML? Would be nice.
Hi @jomach, are there any particular tasks you're interested in using the ARIMA model for? I.e. are you looking for an example that simply fits an ARIMA model to the data? Or you'd like to use it for forecasting?
Hi Cloudera, for example an ML model that lets us forecast with ARIMA, with data like:
03.01.15;22:30;236,25
03.01.15;22:15;240
04.01.15;16:00;243,775
Create an ARIMA model and predict on test data. That would already be enough, and it would help people getting started.
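As a starting point (not spark-ts specific), here's a minimal sketch of parsing lines in that `dd.MM.yy;HH:mm;value` format, where the value uses a European decimal comma; the class and method names are my own:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ParseSample {
    // Converts a value with a European decimal comma (e.g. "236,25") to a double.
    static double parseValue(String field) {
        return Double.parseDouble(field.replace(',', '.'));
    }

    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("dd.MM.yy HH:mm");
        String line = "03.01.15;22:30;236,25";
        String[] parts = line.split(";");
        // Join the date and time fields and parse them together
        LocalDateTime ts = LocalDateTime.parse(parts[0] + " " + parts[1], fmt);
        double value = parseValue(parts[2]);
        System.out.println(ts + " -> " + value); // 2015-01-03T22:30 -> 236.25
    }
}
```

From there the values could be collected into a double array and wrapped in a vector for fitting.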
Or updating http://blog.cloudera.com/blog/2015/12/spark-ts-a-new-library-for-analyzing-time-series-data-with-apache-spark/ …
Thanks for the quick reply !
I wrote up a super basic ARIMA example over here: https://github.com/sryza/spark-ts-examples/blob/master/jvm/src/main/scala/com/cloudera/tsexamples/SingleSeriesARIMA.scala
This is using the ARIMA implementation in a single-node context, because there's nothing inherently distributed about it, but you can also easily stick it in a map function to apply it to every series in a TimeSeriesRDD.
Hi @sryza, sorry for the late reply. Thanks for the ARIMA example. I tried it with your test data; now I will try it with my data. There are some bugs in the example. Error:
From my console :
scala> val arimaModel = ARIMA.fitModel(1, 0, 1, ts)
Fix :
import org.apache.spark.mllib.linalg.Vectors
val ts = Vectors.dense(lines.map(_.toDouble).toArray)
ARIMA.fitModel wants an MLlib vector, not a Breeze one.
And another thing: the stocks example does not work in my console (I didn't debug it), but:
scala> ZonedDateTime.parse("2015-09-22")
java.time.format.DateTimeParseException: Text '2015-09-22' could not be parsed at index 10
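For what it's worth, that parse failure is standard `java.time` behavior: `ZonedDateTime.parse` needs a time and a zone offset, so a date-only string fails. A sketch of one way around it, parsing as a `LocalDate` first and then attaching a zone (class and method names are my own):

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DateOnlyParse {
    // Turns a date-only ISO string into a ZonedDateTime at midnight in the given zone.
    static ZonedDateTime parseDateOnly(String s, ZoneId zone) {
        return LocalDate.parse(s).atStartOfDay(zone);
    }

    public static void main(String[] args) {
        ZonedDateTime zdt = parseDateOnly("2015-09-22", ZoneId.of("Z"));
        System.out.println(zdt); // 2015-09-22T00:00Z
    }
}
```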
The issue you're hitting with ARIMA is due to the fact that, in the master branch of spark-ts, I've recently switched the statistical functionality from Breeze to MLlib. The examples sit on the 0.2.0 release, which still uses Breeze for ARIMA. When I publish an 0.3.0 release, I'll update the examples to use MLlib as well.
Thanks for catching that issue with the Stocks example. I've pushed a fix to the repo. Let me know if there's still an issue?
@sryza could you please help me with the same issue. I'm trying to fit ARIMA model to my dataset
DenseVector dv = new DenseVector(total);
ARIMAModel fitModel = ARIMA.fitModel(1, 0, 1, (Vector<Object>) dv, true, "css-cgd", null);
The code is in Java. According to your code a dense vector should be provided, but the signature of fitModel asks for a Vector object. I tried to cast it, but now I get the following error:
Exception in thread "main" java.lang.ClassCastException: org.apache.spark.mllib.linalg.DenseVector cannot be cast to breeze.linalg.Vector
Could you please tell me how to solve it?
@anshulemc :
import org.apache.spark.mllib.linalg.Vectors
val ts = Vectors.dense(lines.map(_.toDouble).toArray)
Hi @jomach
Yeah, I have understood the same, but it's written in Scala and I'm trying to go ahead with Java, as our previous code also runs on Java. It would be great if you could tell me a solution to the above problem in Java.
I'm no Java expert, but: have you tried not casting it and instead creating the dv with the MLlib DenseVector? From your code I read that you are already creating a DenseVector; have you tried removing the cast and just passing dv into fitModel?
@jomach I tried going ahead with the dense vector only, not casting it, and I get the error below:
The method fitModel(int, int, int, Vector<Object>, boolean, String, double[]) in the type ARIMA is not applicable for the arguments (int, int, int, DenseVector, boolean, String, null)
I pushed out an 0.3.0 release of the library today, and updated the examples. In 0.3.0, only MLlib vectors are used in public APIs, which should make things simpler. Here's the link again in case you need it: https://github.com/sryza/spark-ts-examples/blob/master/jvm/src/main/scala/com/cloudera/tsexamples/SingleSeriesARIMA.scala. Let me know if that works?
How do I save an ARIMAModel to HDFS for future reference (for forecasting)? I have been trying to save it and to figure out a way in the Cloudera API, but failed. Since ARIMAModel is not serializable, I can't save it using the Java API:
ObjectOutputStream modelsave = new ObjectOutputStream(new FileOutputStream("<PATH>"));
modelsave.writeObject(model);
Is there any way to save model on hdfs? Please suggest.
I guess you can simply save the parameters of the model instead?
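For instance, if you can get at the model's coefficient array (check the ARIMAModel API for the actual accessor; the names below are placeholders), a plain-Java sketch of persisting just the parameters:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SaveParams {
    // Writes an array of model coefficients to a file: length, then each value.
    static void save(String path, double[] coeffs) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(path))) {
            out.writeInt(coeffs.length);
            for (double c : coeffs) out.writeDouble(c);
        }
    }

    // Reads the coefficients back in the same order.
    static double[] load(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            double[] coeffs = new double[in.readInt()];
            for (int i = 0; i < coeffs.length; i++) coeffs[i] = in.readDouble();
            return coeffs;
        }
    }

    public static void main(String[] args) throws IOException {
        double[] coeffs = {0.5, -0.1, 0.02}; // e.g. intercept, AR, MA terms
        save("arima-params.bin", coeffs);
        double[] back = load("arima-params.bin");
        System.out.println(java.util.Arrays.equals(coeffs, back)); // true
    }
}
```

For HDFS specifically, the same bytes could be written through Hadoop's FileSystem API (an FSDataOutputStream) instead of a local FileOutputStream.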
@sryza Hey, can you write me a Java program that fits an ARIMA model? I have a CSV of the form:
time | count
1492001401000 | 29
1492001402000 | 43
1492001403000 | 22
etc.
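Not a full program, but a sketch of the Java-side prep: pulling the count column out of lines in that shape into a double array, which could then be wrapped in an MLlib DenseVector and passed to ARIMA.fitModel. The pipe separator and header row are assumptions read off the sample above:

```java
import java.util.ArrayList;
import java.util.List;

public class CsvCounts {
    // Extracts the count column from "time | count" lines, skipping the header.
    static double[] counts(List<String> lines) {
        List<Double> vals = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] parts = line.split("\\|");
            vals.add(Double.parseDouble(parts[1].trim()));
        }
        double[] out = new double[vals.size()];
        for (int i = 0; i < out.length; i++) out[i] = vals.get(i);
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "time | count",
            "1492001401000 | 29",
            "1492001402000 | 43",
            "1492001403000 | 22");
        double[] series = counts(lines);
        System.out.println(series.length); // 3
    }
}
```

From there, with spark-ts 0.3.0, `Vectors.dense(series)` gives the MLlib vector that fitModel expects, as in the Scala example linked above.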
Thanks.
Can someone help me fix this problem, please? I am a learner. I am running the simplest ARIMA model from @sryza: https://github.com/sryza/spark-ts-examples/blob/master/jvm/src/main/scala/com/cloudera/tsexamples/SingleSeriesARIMA.scala
Below is the error message:
Exception in thread "main" java.lang.NoClassDefFoundError: com/cloudera/sparkts/models/ARIMA$
	at com.cloudera.tsexamples.SingleSeriesARIMA$.main(SingleSeriesARIMA.scala:41)
	at com.cloudera.tsexamples.SingleSeriesARIMA.main(SingleSeriesARIMA.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.cloudera.sparkts.models.ARIMA$
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
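A NoClassDefFoundError for com.cloudera.sparkts.models.ARIMA$ usually means the spark-ts jar isn't on the classpath at runtime, even though the code compiled. A sketch of two common fixes (the jar paths and version are placeholders, not the actual paths on your machine):

```shell
# Option 1: ship the spark-ts jar explicitly alongside the application jar
spark-submit \
  --class com.cloudera.tsexamples.SingleSeriesARIMA \
  --jars /path/to/sparkts-0.3.0.jar \
  target/spark-ts-examples.jar

# Option 2: build a fat/assembly jar so the spark-ts dependency is bundled
# (e.g. Maven with the shade plugin, or sbt assembly), then spark-submit that jar
```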