spark-timeseries

Regression with Auto Regressive Residuals

Open mbaddar2 opened this issue 9 years ago • 7 comments

Based on https://www.otexts.org/fpp/9/1, in this issue we will implement the model

$$Y_t = A + \sum_i B_i X_{i,t} + n_t,$$

where the residuals $n_t$ are assumed to follow an autoregressive process of a given order q, AR(q). The steps are:

1. Estimate an OLS regression model for the given regressors $X_t$.
2. Estimate the parameters of the AR(q) model from the residuals of step 1, then use them to update the model coefficients from step 1.
3. Iterate between steps 1 and 2 until convergence.
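To make steps 1 to 3 concrete, here is a minimal Python sketch of the classic Cochrane-Orcutt iteration. It keeps the algebra closed-form by assuming a single regressor and AR(1) residuals (q = 1). The function names `ols_simple`, `ar1_coef`, and `cochrane_orcutt` are hypothetical and are not part of the sparkts API. In step 2 the estimated AR coefficient φ quasi-differences the data, $y^*_t = y_t - \phi y_{t-1}$ and $x^*_t = x_t - \phi x_{t-1}$. OLS on the transformed data then gives B directly, and it gives $A(1-\phi)$ as the intercept.

```python
def ols_simple(x, y):
    """Closed-form OLS fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def ar1_coef(e):
    """Least-squares AR(1) coefficient: regress e_t on e_{t-1} without intercept."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Alternate OLS and AR(1) fitting until phi stops changing; returns (a, b, phi).

    Assumes one regressor and AR(1) errors (a simplification of the AR(q) case).
    """
    a, b = ols_simple(x, y)  # step 1: plain OLS, ignoring autocorrelation
    phi = 0.0
    for _ in range(max_iter):
        # Step 2: fit AR(1) to the residuals of the current regression.
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        new_phi = ar1_coef(resid)
        # Quasi-difference the data so the transformed errors are approximately white.
        ys = [y[t] - new_phi * y[t - 1] for t in range(1, len(y))]
        xs = [x[t] - new_phi * x[t - 1] for t in range(1, len(x))]
        a_star, b = ols_simple(xs, ys)
        a = a_star / (1.0 - new_phi)  # transformed intercept equals A * (1 - phi)
        converged = abs(new_phi - phi) < tol
        phi = new_phi
        if converged:  # step 3: stop once phi is stable
            break
    return a, b, phi
```

Dropping the first observation, as this sketch does, is the standard Cochrane-Orcutt choice. Variants such as Prais-Winsten keep it with a scaled transform.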

@sryza, comments?

mbaddar2, Dec 22 '15

@mbaddar2 sorry for the delay here, but this looks like a good strategy to me.

sryza, Jan 06 '16

A note about the Durbin-Watson test for implementing Cochrane-Orcutt: the current implementation, com.cloudera.sparkts.stats.TimeSeriesStatisticalTests#dwtest, just returns the value of the statistic without computing the critical values $d_{L,\alpha}$ and $d_{U,\alpha}$, as described in

https://en.wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic

To calculate the critical values, we have two options:

1. Use a table of precomputed values, e.g. from https://www3.nd.edu/~wevans1/econ30331/Durbin_Watson_tables.pdf
2. Compute the p-value of the DW test algorithmically, as described in the Details section of the documentation for the dwtest function in the R lmtest package: http://www.inside-r.org/packages/cran/lmtest/docs/dwtest

I think an alternative implementation of dwtest will need a separate issue. To keep things simple, I will use the current implementation with the following heuristic:

- dw → 0: positive autocorrelation
- dw → 4: negative autocorrelation
- dw → 2: no autocorrelation
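For reference, the statistic is $d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$ for regression residuals $e_t$. It is approximately $2(1 - \hat r_1)$, where $\hat r_1$ is the lag-1 residual autocorrelation, which explains why values near 0, 2, and 4 indicate positive, no, and negative autocorrelation. The hypothetical sketch below computes d and applies the heuristic. Its `band` cut-off is an arbitrary assumption used only for illustration. It is not a substitute for the critical values $d_{L,\alpha}$ and $d_{U,\alpha}$, and the sketch does not claim to reproduce sparkts's dwtest exactly.

```python
def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def dw_heuristic(d, band=0.5):
    """Rough reading of d around 2; `band` is an arbitrary, hypothetical cut-off."""
    if d < 2.0 - band:
        return "positive autocorrelation"
    if d > 2.0 + band:
        return "negative autocorrelation"
    return "no clear autocorrelation"
```

Residuals that alternate in sign, for example, push d toward 4. Residuals that stay on the same side of zero for long stretches push it toward 0.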

@sryza, comments?

mbaddar1, Jan 16 '16

I agree that it would definitely be useful to report p-values for Durbin-Watson.

Regarding the options, my preference would be to compute the dwtest p-value algorithmically, given that the tables of precomputed values are huge. If this is too difficult, though, I could be open to using the tables.

sryza, Jan 21 '16

@sryza, check #117.

mbaddar1, Jan 24 '16

@sryza, I will now work on extending the Cochrane-Orcutt implementation from #117 to the ARMA(p,q) case. I will start by trying the method outlined in Brockwell's book (http://www.springer.com/gb/book/9780387953519), Chapter 6, Section 6.

Note that I will start with the case where p and q are known; once that works, we can automate the estimation of p and q. Any further suggestions?

mbaddar1, Feb 28 '16

Awesome. How does the method outlined in Brockwell compare with the current ARIMA implementation in https://github.com/sryza/spark-timeseries/blob/master/src/main/scala/com/cloudera/sparkts/models/ARIMA.scala? I'm not super familiar with the best way to fit ARIMA + regression models. I'm guessing it's not as easy as just training the regression and ARIMA parts separately? Does MLE with conjugate gradient descent or BOBYQA work?
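Instead of alternating between the regression and the error model, the parameters can also be estimated jointly. Conditional on the first observation, one minimizes the sum of squared innovations of the quasi-differenced model over (A, B, φ). Under Gaussian innovations this is conditional maximum likelihood, so a general optimizer such as the conjugate gradient or BOBYQA options mentioned above could minimize it directly. The sketch below takes a simpler route and profiles the objective with a grid search over φ. This is the Hildreth-Lu procedure, a different classical technique from the Cochrane-Orcutt iteration discussed in this thread. It again assumes one regressor and AR(1) errors, and all names are hypothetical.

```python
def ols_simple(x, y):
    """Closed-form OLS fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def hildreth_lu(x, y, grid_step=0.01):
    """Grid-search phi in (-1, 1), minimizing the conditional SSE of the
    quasi-differenced regression; returns (a, b, phi, sse)."""
    best = None
    steps = int(round(2.0 / grid_step))
    for k in range(1, steps):  # excludes the endpoints -1 and 1
        phi = -1.0 + k * grid_step
        ys = [y[t] - phi * y[t - 1] for t in range(1, len(y))]
        xs = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
        a_star, b = ols_simple(xs, ys)  # best (A(1 - phi), B) for this phi
        sse = sum((yi - a_star - b * xi) ** 2 for xi, yi in zip(xs, ys))
        if best is None or sse < best[3]:
            best = (a_star / (1.0 - phi), b, phi, sse)
    return best
```

The grid makes the joint objective easy to inspect, but its cost grows with the number of AR parameters. That growth is where a proper optimizer (or the iterative scheme) becomes necessary for AR(q) or ARMA(p,q) errors.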

sryza, Feb 29 '16

I will do more reading on both methods to get a better understanding, then I will write up the differences so we can discuss them.

mbaddar1, Feb 29 '16