AutoTS
Question: Multi-Objective Forecasting
Problem: A dataset with multiple data columns that may or may not be temporally coupled, and also multiple output sets. e.g. market sectors and wellbeing index as columns, see if one index is tied to the rest of the indices.
I don't entirely understand what you are asking for. Could you explain a bit more? If you are trying to figure out whether multiple features are correlated (pandas correlation) or whether there is causality (Granger causality), you don't need AutoTS for that. You might want to look at statsmodels: you can run something like a VAR or VECM and check the significance of the coefficients.
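As a rough sketch of the correlation and Granger checks mentioned above (not part of the original reply): the synthetic data, the column names `sector_a` and `wellbeing`, and the helper names `lagged_corr` and `granger_pvalues` are all made up for illustration. The Granger helper assumes statsmodels is installed and imports it only when called.

```python
import numpy as np
import pandas as pd

# Synthetic example: "wellbeing" follows "sector_a" with a 2-day delay plus noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2022-01-01", periods=300, freq="D")
driver = pd.Series(rng.normal(size=300), index=idx)
follower = driver.shift(2) + 0.1 * rng.normal(size=300)
df = pd.DataFrame({"sector_a": driver, "wellbeing": follower}).dropna()

# Plain same-day correlation between all columns.
corr_matrix = df.corr()


def lagged_corr(frame, leader, target, max_lag=5):
    """Correlation of target with leader shifted back by 0..max_lag steps."""
    return pd.Series(
        {lag: frame[target].corr(frame[leader].shift(lag)) for lag in range(max_lag + 1)}
    )


lag_profile = lagged_corr(df, "sector_a", "wellbeing")
best_lag = int(lag_profile.abs().idxmax())


def granger_pvalues(frame, leader, target, max_lag=5):
    """p-values (SSR F-test) for 'leader Granger-causes target', per lag."""
    from statsmodels.tsa.stattools import grangercausalitytests

    # statsmodels tests whether the SECOND column helps predict the FIRST.
    results = grangercausalitytests(frame[[target, leader]], maxlag=max_lag)
    return {lag: res[0]["ssr_ftest"][1] for lag, res in results.items()}
```

On real data, differencing or otherwise making the series stationary before these checks is usually advisable; a fitted VAR or VECM gives a fuller picture than pairwise tests.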
@winedarksea what if you have multiple items to forecast based on a TimeSeries, and not just a single item? Other than correlation or causality, are there algorithms possible to do this, and thus AutoTS?
Yes, you can predict multiple items at the same time. Usually we use the word 'multivariate' forecasting or in the more general case 'multioutput' regressions. There are two basic ways of inputting the information:
- your input dataframe, which contains all history for 1 to many input variables. You can adjust `weights` to show which series you care about forecasting (set 0 for series that are input only, with no desired forecast output). All example datasets are multivariate so you can follow those examples.
- then there are `future_regressors`, which are features you will know about in advance (for example, in sales forecasting, how many hours the store will be open on future days, how many employees are scheduled to work, etc.) or a forward lagged version of other features.
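A minimal sketch of the two input routes described above, under stated assumptions: the data is synthetic, the column names and the `store_hours` regressor are invented, and `fit_and_forecast` is a hypothetical helper. The AutoTS arguments used (`weights` and `future_regressor` on `fit`, `future_regressor` on `predict`) are my reading of the library's interface, so check them against your installed version.

```python
import numpy as np
import pandas as pd

# Wide dataframe: DatetimeIndex, one column per series (synthetic random walks).
rng = np.random.default_rng(1)
idx = pd.date_range("2021-01-01", periods=200, freq="D")
df = pd.DataFrame(
    rng.normal(size=(200, 3)).cumsum(axis=0),
    index=idx,
    columns=["sector_tech", "sector_energy", "wellbeing_index"],
)

# Weight 1 for series we want forecast, 0 for input-only series.
targets = ["wellbeing_index"]
weights = {col: (1 if col in targets else 0) for col in df.columns}

# A regressor known in advance: history covers the training index,
# the future part covers exactly the forecast horizon.
forecast_length = 14
future_idx = pd.date_range(idx[-1] + pd.Timedelta(days=1), periods=forecast_length, freq="D")
regressor_hist = pd.DataFrame({"store_hours": np.where(idx.dayofweek < 5, 10.0, 6.0)}, index=idx)
regressor_future = pd.DataFrame(
    {"store_hours": np.where(future_idx.dayofweek < 5, 10.0, 6.0)}, index=future_idx
)


def fit_and_forecast(df, weights, regressor_hist, regressor_future, forecast_length):
    """Hypothetical helper; imports AutoTS lazily so the data prep runs without it."""
    from autots import AutoTS

    model = AutoTS(
        forecast_length=forecast_length,
        frequency="infer",
        model_list="superfast",
        max_generations=2,
        num_validations=1,
    )
    model = model.fit(df, weights=weights, future_regressor=regressor_hist)
    prediction = model.predict(future_regressor=regressor_future)
    return prediction.forecast
```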
You will run into challenges if your multiple items don't align very well. You might need to resample them to the same frequency. Not all models use the multivariate information (see the table at the end of the extended_tutorial.md).
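For the alignment problem, one common approach is to bring everything to the coarser frequency with pandas before combining. The sketch below uses invented `traffic` (daily) and `sales` (weekly) series; whether to aggregate with a mean, sum, or last value depends on what each series measures.

```python
import numpy as np
import pandas as pd

# Daily series: 4 full weeks starting on a Monday.
daily = pd.Series(
    np.arange(28, dtype=float),
    index=pd.date_range("2023-01-02", periods=28, freq="D"),
    name="traffic",
)

# Weekly series stamped on Sundays.
weekly = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.date_range("2023-01-08", periods=4, freq="W-SUN"),
    name="sales",
)

# Downsample the daily series to weeks ending on Sunday, then join on the index.
daily_as_weekly = daily.resample("W-SUN").mean()
aligned = pd.concat([daily_as_weekly, weekly], axis=1)
```

The resulting `aligned` frame has one row per week and one column per series, which is the wide shape AutoTS expects as input.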
Probably the simplest thing to do is to follow through and examine an example like production_example.py here and see how it handles the data.
> You will run into challenges if your multiple items don't align very well.
Apart from this problem with some data (e.g. matching weekly vs. daily data, or data with large gaps), it is really useful (e.g. for stock data or live weather/traffic data).
I would assume that the example is this, right? https://github.com/winedarksea/AutoTS/blob/master/production_example.py
Further Q:
- How do training, testing, and validation work if the dataset is long and either full-memory backtesting or sliding windows are possible? Also, how do the gaps between the testing and validation windows compare?
- Are there any other built-in APIs in `load_live_daily`, e.g. for Google Trends or crypto? https://makersportal.com/blog/2020/1/19/google-trends-x-yahoo-finance
That production example is my approach to it, although not the only possible way to do things.
- Validation works on windows of length equal to the forecast length. How it takes those windows is based on the `validation_method`, which has backwards (progressively backwards slices), similarity (fancier, windows chosen based on similarity of a distance metric), even (like slices of a pie) and seasonal (backwards with specified spacing). There's also an option to pass a list of custom indices to use for validation. Probably worth looking at the `model.validation_train_indexes` and `model.validation_test_indexes` for your runs to see how it has made the splits.
- pytrends is built in for Google Trends, and yfinance is there too (same Yahoo Finance data, but a different package than you link). You will need to make sure to install those packages, then specify `trends_list` and `tickers` respectively as arguments. Also install fredapi and pass a `fred_key`, as those series are often helpful. Let me know if you find any other live free data sources that you would like to add, I would be happy to add them. Running `help(load_live_daily)` should print some of the args for that.
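To make the two points above more concrete, here is a hedged sketch. `backwards_splits` is a hypothetical helper that only illustrates the general idea of progressively backwards slices; it is not AutoTS's actual splitting code. The two functions after it import AutoTS lazily; the ticker and trend term are placeholders, and the `long=False` argument is an assumption based on the production example.

```python
import pandas as pd


def backwards_splits(index, forecast_length, num_validations):
    """Illustrative only: each test window is forecast_length long,
    stepping backwards from the end of the history."""
    splits = []
    for i in range(num_validations + 1):
        end = len(index) - i * forecast_length
        start = end - forecast_length
        if start <= 0:
            break  # not enough history left for another window
        splits.append((index[:start], index[start:end]))
    return splits


idx = pd.date_range("2023-01-01", periods=100, freq="D")
splits = backwards_splits(idx, forecast_length=10, num_validations=2)
train_lengths = [len(train) for train, _ in splits]


def inspect_autots_splits(df, forecast_length=10):
    """Fit a small search and return the splits AutoTS actually used."""
    from autots import AutoTS

    model = AutoTS(
        forecast_length=forecast_length,
        model_list="superfast",
        max_generations=1,
        num_validations=2,
        validation_method="backwards",
    )
    model = model.fit(df)
    return model.validation_train_indexes, model.validation_test_indexes


def fetch_live(fred_key=None):
    """Needs network access plus yfinance and pytrends (and fredapi if a key is given)."""
    from autots import load_live_daily

    return load_live_daily(
        long=False,
        fred_key=fred_key,
        tickers=["MSFT"],            # placeholder ticker
        trends_list=["forecasting"],  # placeholder search term
    )
```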
Currently reading through https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html#models-1 and the warning in https://winedarksea.github.io/AutoTS/build/html/source/tutorial.html#model-lists-1 How do I exclude the two models considered "too slow"? Also is it possible to note on which ones are generally fast to do convergence or grokking?
It really depends, so it is hard to give an exact answer. Some models are slow with lots of historical data, other models are slow with many multivariate series. It also depends on your computational resources. The models in 'parallel' scale linearly with the number of CPU cores available, but will be slow if you have many series and only a few cores. And a few, pytorch-forecasting and gluonts, will be affected by available GPU resources (they run fine on CPU; gluonts is actually faster on a good CPU).
The 'superfast' model list is naive models, which with all the preprocessing can deliver pretty good results. I generally recommend testing with that. Try 'fast' next, and then 'fast_parallel'. And of course a custom list is always an option: `model_list = ['AverageValueNaive', 'SeasonalNaive', 'UnivariateMotif', 'ARCH']`, etc.
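One way to exclude specific models, sketched under assumptions: build a custom list and filter out the names you don't want. The `too_slow` set below is purely illustrative (the thread does not name the slow models), `builtin_list_without` and `build_model` are hypothetical helpers, and the `autots.models.model_list.model_lists` import path is my assumption about where the named lists live.

```python
# Candidate models taken from the reply above.
candidates = ["AverageValueNaive", "SeasonalNaive", "UnivariateMotif", "ARCH"]

# Illustrative only: substitute whichever models are too slow on your data.
too_slow = {"ARCH"}
model_list = [name for name in candidates if name not in too_slow]

# Suggested order for trying the named lists, fastest first.
escalation = ["superfast", "fast", "fast_parallel"]


def builtin_list_without(name, exclude):
    """Filter a named built-in list (import path is an assumption)."""
    from autots.models.model_list import model_lists

    return [m for m in model_lists[name] if m not in exclude]


def build_model(model_list, forecast_length=14):
    """Create an AutoTS search restricted to the given models."""
    from autots import AutoTS

    return AutoTS(
        forecast_length=forecast_length,
        model_list=model_list,
        max_generations=2,
        num_validations=1,
    )
```

Timing a short run with each list on a sample of your own data is probably the most reliable way to see which models are slow in your setup.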