ML
Native Time Series and Forecasting Support (Sequence Learning)
Time series analysis is a popular machine learning technique for forecasting trends of time-dependent variables such as stock price, GDP, and quarterly sales. Given the popularity (https://github.com/RubixML/RubixML/issues/35, https://github.com/RubixML/RubixML/issues/38, https://github.com/RubixML/RubixML/issues/40) and current lack of tooling within the PHP ecosystem, I propose adding native time series support as well as a new type of estimator class for forecasting time series datasets. This includes the following ...
- A data structure extending Dataset for time series datasets that includes an additional index for timestamps
- An additional estimator type "Forecaster" to predict the next k values in a series
There should be no need to modify any of the public interfaces to integrate these features into the current architecture.
Proposed initial Forecaster implementations:
- ARIMA - AutoRegressive Integrated Moving Average (univariate)
- VARMAX - Vector AutoRegressive Moving Average with eXogenous regressors (multivariate)
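To make the "AutoRegressive" part of ARIMA concrete, here is a minimal sketch of an AR(1) model fitted by ordinary least squares and rolled forward k steps. It is only an illustration under simplifying assumptions (no differencing, no moving-average terms), and the function names fitAr1() and forecastAr1() are hypothetical, not part of Rubix ML.

```php
<?php

/**
 * Fit x[t] = c + phi * x[t-1] by ordinary least squares.
 * Hypothetical helper for illustration only.
 *
 * @param list<float> $series observed values in time order
 * @return array{0: float, 1: float} the pair [c, phi]
 */
function fitAr1(array $series): array
{
    $n = count($series) - 1; // number of (previous, current) pairs

    if ($n < 2) {
        throw new InvalidArgumentException('Need at least 3 observations.');
    }

    $prev = array_slice($series, 0, $n);
    $curr = array_slice($series, 1);

    $meanPrev = array_sum($prev) / $n;
    $meanCurr = array_sum($curr) / $n;

    $cov = $var = 0.0;

    for ($i = 0; $i < $n; $i++) {
        $cov += ($prev[$i] - $meanPrev) * ($curr[$i] - $meanCurr);
        $var += ($prev[$i] - $meanPrev) ** 2;
    }

    if ($var == 0.0) {
        throw new InvalidArgumentException('Series must not be constant.');
    }

    $phi = $cov / $var;
    $c = $meanCurr - $phi * $meanPrev;

    return [$c, $phi];
}

/**
 * Forecast the next $k values by feeding each estimate back into the model.
 *
 * @param list<float> $series
 * @return list<float>
 */
function forecastAr1(array $series, int $k): array
{
    [$c, $phi] = fitAr1($series);

    $value = $series[count($series) - 1];
    $forecast = [];

    for ($h = 0; $h < $k; $h++) {
        $value = $c + $phi * $value;
        $forecast[] = $value;
    }

    return $forecast;
}

// Synthetic series generated by x[t] = 1 + 0.5 * x[t-1], starting at 0.
$series = [0.0, 1.0, 1.5, 1.75, 1.875];

[$c, $phi] = fitAr1($series);
$next = forecastAr1($series, 2);
```

On this noise-free series the fit should recover c close to 1 and phi close to 0.5. A full ARIMA implementation would add differencing and moving-average terms on top of this.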
Open to comments
Yes, I would very much like those additions to the library. Thank you!
Thanks for the input @BasvanH
Expanding on the aforementioned design outline ...
The TimeSeries dataset object will have additional sorting, filtering, etc. methods that operate on the timestamp column. These will be similar to how Labeled provides additional methods that operate on labels. The timestamp column will allow either homogeneous integer or DateTime object elements.
Since time series estimation often diverges between the univariate and multivariate cases, the TimeSeries dataset object will handle both cases simply by keeping track of the number of target variables (as already accomplished using the numColumns() method on the Dataset class). For example, a univariate TimeSeries dataset object has a single column, whereas a multivariate one has more than one column. It will be the responsibility of the estimator to check whether the incoming dataset is compatible.
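As a rough sketch of the design described above, the following standalone class pairs each sample with an integer timestamp, exposes numColumns() to distinguish univariate from multivariate series, and sorts on the timestamp column. The class name TimeSeriesSketch, its methods other than numColumns(), and the requireUnivariate() helper are assumptions made for illustration. None of this is the actual Rubix ML Dataset API, and DateTime timestamps are left out for brevity.

```php
<?php

// Hypothetical sketch only: this does not extend the real Rubix ML Dataset class.
final class TimeSeriesSketch
{
    /** @var list<list<int|float>> one row of target values per time step */
    private array $samples;

    /** @var list<int> one integer timestamp per row */
    private array $timestamps;

    public function __construct(array $samples, array $timestamps)
    {
        if (count($samples) !== count($timestamps)) {
            throw new InvalidArgumentException('Each sample needs exactly one timestamp.');
        }

        $this->samples = array_values($samples);
        $this->timestamps = array_values($timestamps);
    }

    // Number of target variables: 1 means univariate, more than 1 multivariate.
    public function numColumns(): int
    {
        return isset($this->samples[0]) ? count($this->samples[0]) : 0;
    }

    public function samples(): array
    {
        return $this->samples;
    }

    public function timestamps(): array
    {
        return $this->timestamps;
    }

    // Return a new series ordered by ascending timestamp.
    public function sortByTimestamp(): self
    {
        $order = array_keys($this->timestamps);

        usort($order, fn (int $a, int $b): int => $this->timestamps[$a] <=> $this->timestamps[$b]);

        $samples = $timestamps = [];

        foreach ($order as $i) {
            $samples[] = $this->samples[$i];
            $timestamps[] = $this->timestamps[$i];
        }

        return new self($samples, $timestamps);
    }
}

// Estimator-side compatibility check for a univariate-only forecaster.
function requireUnivariate(TimeSeriesSketch $dataset): void
{
    if ($dataset->numColumns() !== 1) {
        throw new InvalidArgumentException('This forecaster only supports univariate series.');
    }
}

$series = new TimeSeriesSketch([[3.0], [1.0], [2.0]], [30, 10, 20]);
$sorted = $series->sortByTimestamp();

requireUnivariate($sorted); // passes: the series has a single column
```

Returning a new object from sortByTimestamp() keeps the original series untouched, though whether the real implementation should sort in place is an open design choice.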
As mentioned previously, the public Estimator API will not change with the introduction of the new estimator type. In the case of forecasters, the output of the predict() method will be the estimation of the next value given the last value in a series. The interpretation of the dataset is therefore slightly different at inference than during training, in which the dataset is interpreted as both contiguous and atomic. During inference, each sample will be considered independently and the value will be interpreted as either the empirical or theoretical last value of a time series the user would like to start inferring from. Since forecasters are estimators at heart, they benefit from all the additional tooling such as meta-estimators and the cross validation framework.
In addition, we will add the Forecaster interface allowing estimators to implement the forecast() method which, unlike predict(), will estimate the next k values starting at a given offset. It is assumed that most forecaster types will implement the Forecaster interface as prediction (as defined above) is only a special case of forecasting where k=1. There are currently two prototypes for the forecast() method signature to consider. The first is borrowing the idea of start and end from the statsmodels library (see their predict API). The second idea is to use the timestamp of the TimeSeries dataset object as the start and then output the next k subsequent values. The differences look like this ...
public function forecast(TimeSeries $dataset, $start, $end) : array
vs.
public function forecast(TimeSeries $dataset, int $k) : array
So far I personally prefer the latter case.
As with the Learner, Probabilistic, and Ranking interfaces, the Forecaster interface will also include the forecastSample() method to handle inference on single samples at a time.
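To illustrate the second forecast() prototype, here is a minimal sketch with a simple drift baseline as the implementation. Everything in it is hypothetical: the ForecasterSketch interface takes a plain array instead of a TimeSeries object so the example runs on its own, and DriftForecaster and predictNext() are names invented for this sketch rather than proposed API.

```php
<?php

// Hypothetical interface loosely mirroring forecast(TimeSeries $dataset, int $k).
interface ForecasterSketch
{
    /**
     * @param list<float> $series observed values in time order
     * @return list<float> the next $k estimated values
     */
    public function forecast(array $series, int $k): array;
}

// Drift baseline: extend the series along the line through its first and last points.
final class DriftForecaster implements ForecasterSketch
{
    public function forecast(array $series, int $k): array
    {
        $n = count($series);

        if ($n < 2 || $k < 1) {
            throw new InvalidArgumentException('Need at least 2 observations and k >= 1.');
        }

        $last = $series[$n - 1];
        $slope = ($last - $series[0]) / ($n - 1);

        $forecast = [];

        for ($h = 1; $h <= $k; $h++) {
            $forecast[] = $last + $h * $slope;
        }

        return $forecast;
    }

    // Prediction as the special case of forecasting where k = 1.
    public function predictNext(array $series): float
    {
        return $this->forecast($series, 1)[0];
    }
}

$estimator = new DriftForecaster();

$next3 = $estimator->forecast([10.0, 12.0, 14.0], 3);
$next1 = $estimator->predictNext([10.0, 12.0, 14.0]);
```

With the k-based signature the caller never has to compute an end index, which is part of why it reads more simply than the start/end variant.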
Open to comments
Update:
Since we are in a feature freeze, this enhancement will be moved over to the Extras package for the time being and may be integrated into the main package afterward.
Hi! Sorry for commenting on a closed issue.
The comment said that it's moved to the Extras package, understandably. However, is it that the idea will be moved there, or is it already there?
Regardless, I much appreciate all the hard work that has been put into RubixML, just curious. 😄
Hello, I would also like to know the status here. I would like to test forecasting for an idea on my side.
Thank you.
Hello @LasseRafn and @Rello thanks for commenting, I'll give an update and we'll reopen this issue to keep the discussion going.
We haven't got around to implementing time series in ML or Extras yet. Although we have plenty of research planned in regards to sequence learning, we have no immediate plans to implement features at this time. That said, we're seeing an uptick in contributions, so it's possible that someone from the community can take on this effort.
Could a simpler sequence implementation be faster to implement first?
For example, dataset:
[0,1,1,1,0,0,0,0,0,0,1,1,1,1,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,1,1,1,1,0,1,0,0,0,0,0,0,0,0,1,1,1,1,1,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,1,1,1,1,1,1,1,1,0,1,0,0,0,0]
I see in this data that 1 is more likely to be followed by 1, and 0 is more likely to be followed by 0. The more 1s or 0s there are in a row, the more likely the next value is to be the same. Maybe there are other patterns too. If a human can see this pattern, maybe ML could too (and state the confidence).
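The first pattern described above (a value tends to be followed by the same value) can be checked by counting adjacent pairs, which amounts to estimating a first-order Markov chain. Below is a minimal sketch. The function name transitionProbabilities() is hypothetical, and the example uses a short made-up sequence rather than the full dataset from this comment.

```php
<?php

/**
 * Estimate first-order transition probabilities P(next | current)
 * from a sequence of discrete symbols by counting adjacent pairs.
 *
 * @param list<int> $sequence
 * @return array<int, array<int, float>> probabilities indexed as [current][next]
 */
function transitionProbabilities(array $sequence): array
{
    $counts = [];

    for ($i = 0, $n = count($sequence) - 1; $i < $n; $i++) {
        $current = $sequence[$i];
        $next = $sequence[$i + 1];

        $counts[$current][$next] = ($counts[$current][$next] ?? 0) + 1;
    }

    $probabilities = [];

    foreach ($counts as $current => $row) {
        $total = array_sum($row);

        foreach ($row as $next => $count) {
            $probabilities[$current][$next] = $count / $total;
        }
    }

    return $probabilities;
}

// Short illustrative sequence, not the dataset quoted above.
$p = transitionProbabilities([0, 0, 0, 1, 1, 0, 0, 1, 1, 1]);

// $p[0][0] is the estimated probability that a 0 is followed by another 0,
// and $p[1][1] the probability that a 1 is followed by another 1.
```

The estimated probability doubles as a confidence for the next value. A first-order chain only looks at the previous value, so capturing the "longer runs are stickier" effect would need a higher-order model or run length as an extra feature.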
Hi guys! Any news about this feature?
Thank you!
Hi @itrack. There's still talk about implementing VAR (vector autoregression) and LSTM. Nothing material has come about yet though. It's not that there's not enough want for sequence learning but that we really don't have the resources right now. Hopefully, we can attract more interest from the community.
Are there any new developments here in the meantime? I would also be interested in a time series forecast.