pyFTS
how to use hyperparams
I am struggling to find guidance on how to use the hyperparam module, such as grid search or evolutionary. Can anyone share?
thank you
Hi @ramdhan1989
Thanks for your interest in our tool, and forgive me for the long delay.
First of all, before hyperparameter optimization (hereafter called hyperopt), you should perform time series analysis (such as ACF/PACF plots, tests of stationarity and heteroscedasticity, etc.). Hyperopt does not remove the need to understand how your time-series data behaves.
The hyperparameter optimization of FTS is described here, and is called DEHO - Distributed Evolutionary Hyperparameter Optimization, but there are other methods than evolutionary in the library. The method returns a dictionary with the best parameters found for forecasting the dataset using the selected FTS method (given in the fts_method parameter).
Below is a list of the implemented methods:
- Grid Search (GS) is very accurate but also very computationally expensive.
import numpy as np
from pyFTS.hyperparam import GridSearch
from pyFTS.models import hofts
from pyFTS.data import TAIEX
datasetname = 'TAIEX'
dataset = TAIEX.get_data()
#The list of hyperparameters search spaces
hyperparams = {
'order': [1, 2, 3],
'partitions': np.arange(10,100,3),
'partitioner': [1, 2], #1 = Grid partitioner, 2 = Entropy partitioner, ...
'mf': [1, 2, 3], #1 = Triangular, 2 = Trapezoidal, 3 = Gaussian
'lags': np.arange(2, 7, 1), # The lag indexes
'alpha': np.arange(.0, .5, .05) #Alpha Cut
}
GridSearch.execute(
hyperparams, #A dictionary containing the search spaces for each hyperparameter
datasetname, #Just the name of your dataset
dataset, #Your time series data (list or np.ndarray 1d)
fts_method=hofts.WeightedHighOrderFTS, # the FTS method you want to optimize [only univariate methods]
window_size=10000, #The length of the data window for the Sliding Window Cross Validation method
train_rate=.9, #The proportion of the data window that will be used for training, the remaining will be used for test
increment_rate=.3, #The sliding increment of the Sliding Window Cross Validation method
database_file='hyperopt.db' #A sqlite database that will contain the log of the hyperopt process
)
There is no GridSearch implementation yet for multivariate methods.
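The window_size, train_rate and increment_rate parameters above control the Sliding Window Cross Validation scheme. A rough plain-Python sketch (illustrative only, not the library's internals) of how they carve the series into train/test windows:

```python
# Illustrative sketch of Sliding Window Cross Validation slicing
# (NOT pyFTS's implementation).
def sliding_windows(n, window_size, train_rate=.9, increment_rate=.3):
    increment = int(window_size * increment_rate)
    windows = []
    start = 0
    while start + window_size <= n:
        split = start + int(window_size * train_rate)
        # (train start, train/test split point, window end)
        windows.append((start, split, start + window_size))
        start += increment
    return windows
```

Under this reading, with window_size=10000, train_rate=.9 and increment_rate=.3, each window trains on the first 9000 points, tests on the next 1000, and then slides forward 3000 points.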
- Random Search (RS) is computationally cheap but may not converge correctly, so it is not the most accurate method. Currently, RS is implemented only for MVFTS.
import pandas as pd
from pyFTS.hyperparam import mvfts as deho_mv
from pyFTS.models.multivariate import mvfts, wmvfts
from pyFTS.models.seasonal.common import DateTime
from pyFTS.data import Malaysia
datasetname = 'Malaysia'
dataset = Malaysia.get_dataframe()
dataset['time'] = pd.to_datetime(dataset['time'], format='%m/%d/%y %I:%M %p')
explanatory_variables = [
{'name': 'Temperature', 'data_label': 'temperature', 'type': 'common'},
{'name': 'Daily', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.minute_of_day, 'npart': 24 },
{'name': 'Weekly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_week, 'npart': 7 },
{'name': 'Monthly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_month, 'npart': 4 },
{'name': 'Yearly', 'data_label': 'time', 'type': 'seasonal', 'seasonality': DateTime.day_of_year, 'npart': 12 }
]
target_variable = {'name': 'Load', 'data_label': 'load', 'type': 'common'}
deho_mv.random_search(
datasetname, #Just the name of your dataset
dataset, #Your time series data (pd.DataFrame)
npop=200, #Size of population of the RS
mgen=70, #Number of iterations of the RS
fts_method=wmvfts.WeightedMVFTS, #The multivariate FTS method to optimize
variables=explanatory_variables, #The list of exogenous/explanatory variables
target_variable=target_variable, #The endogenous/target variable
window_size=10000, #The length of the data window for the Sliding Window Cross Validation method
train_rate=.9, #The proportion of the data window that will be used for training, the remaining will be used for test
increment_rate=.3, #The sliding increment of the Sliding Window Cross Validation method
)
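The 'seasonal' variable specs above index each timestamp by a seasonal cycle; for instance, DateTime.minute_of_day maps a timestamp to its minute within the day, which npart=24 then partitions into 24 fuzzy sets (roughly one per hour). A plain-Python illustration of that index (not pyFTS's implementation):

```python
from datetime import datetime

# Illustration of the minute_of_day seasonal index used above
# (NOT pyFTS's implementation).
def minute_of_day(dt):
    return dt.hour * 60 + dt.minute
```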
- Genetic Algorithm (GA) sits between GS and RS, in both accuracy and computational cost.
from pyFTS.hyperparam import Evolutionary
from pyFTS.models import hofts
from pyFTS.data import TAIEX
datasetname = 'TAIEX'
dataset = TAIEX.get_data()
ret = Evolutionary.execute(
datasetname, #Just the name of your dataset
dataset, #Your time series data (list or np.ndarray 1d)
fts_method=hofts.WeightedHighOrderFTS, # the FTS method you want to optimize [only univariate methods]
ngen=30, #Number of generations, the number of iterations of the GA
npop=20, #The size of population of the GA
psel=0.6, #Probability of selection of the GA
pcross=.5, #Probability of crossover of the GA
pmut=.3, #Probability of mutation of the GA
window_size=10000, #The length of the data window for the Sliding Window Cross Validation method
train_rate=.9, #The proportion of the data window that will be used for training, the remaining will be used for test
increment_rate=.3, #The sliding increment of the Sliding Window Cross Validation method
experiments=1, #Number of hyperopt experiments to perform
database_file='hyperopt.db' #A sqlite database that will contain the log of the hyperopt process
)
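The GA-specific parameters above (npop, ngen, psel, pcross, pmut) can be pictured with a toy generation step. This is an illustrative sketch only, not DEHO's implementation:

```python
import random

# Toy sketch of one GA generation (illustrative only, NOT DEHO's
# implementation), showing the roles of psel, pcross and pmut.
def ga_generation(population, fitness, psel=.6, pcross=.5, pmut=.3, rng=None):
    rng = rng or random.Random(0)
    npop = len(population)
    # Selection: keep the best psel fraction (lower fitness is better)
    survivors = sorted(population, key=fitness)[:max(2, int(npop * psel))]
    children = list(survivors)
    while len(children) < npop:
        a, b = rng.sample(survivors, 2)
        if rng.random() < pcross:
            # Crossover: mix genes from the two parents
            child = [x if rng.random() < .5 else y for x, y in zip(a, b)]
        else:
            child = list(a)
        if rng.random() < pmut:
            # Mutation: perturb one gene
            i = rng.randrange(len(child))
            child[i] += rng.choice([-1, 1])
        children.append(child)
    return children
```

Running this for ngen iterations mirrors the GA loop: the best candidates survive, recombine, and mutate until the budget is exhausted or no improvement is seen.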
Please, do not hesitate to get in touch if you have any questions.
Best regards
Thanks, all three methods work!
After executing the hyperparameter optimization, is the model automatically fitted using the best params, or do we need to take the values from the output dict and fit the model ourselves?
Would you mind elaborating more on the dict? I am confused about which values belong to which parameter. From your code using GA:
Experiment 0
Evaluating initial population 1600098526.9596627
GENERATION 0 1600098526.9596627
WITHOUT IMPROVEMENT 1
GENERATION 1 1600098526.9606583
WITHOUT IMPROVEMENT 2
GENERATION 2 1600098526.9626496
WITHOUT IMPROVEMENT 3
GENERATION 3 1600098526.963645
WITHOUT IMPROVEMENT 4
GENERATION 4 1600098526.9656367
WITHOUT IMPROVEMENT 5
GENERATION 5 1600098526.9666321
WITHOUT IMPROVEMENT 6
GENERATION 6 1600098526.9686234
WITHOUT IMPROVEMENT 7
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'rmse', inf)
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'size', inf)
('TAIEX', 'Evolutive', 'hofts', None, 1, 3, 2, 40, 0.5, '[2, 6, 7]', 'time', 0.010952949523925781)
Below is the returned dict:
{'alpha': 0.5, 'f1': inf, 'f2': inf, 'lags': [2, 6, 7], 'mf': 1, 'npart': 40, 'order': 3, 'partitioner': 2, 'rmse': inf, 'size': inf, 'time': 0.010952949523925781}
Hi @ramdhan1989
Using this dictionary, you can build a model with this code:
from pyFTS.hyperparam import Evolutionary
model = Evolutionary.phenotype(
dictionary, #the result of the hyperparameter method
train, #The train dataset
fts_method #the FTS method
)
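For reference, the integer codes in the result dict correspond to the options listed in the search-space comments of the GridSearch example (e.g. partitioner 2 = Entropy, mf 1 = Triangular). A small helper (illustrative, not part of pyFTS) to make the dict human-readable:

```python
# Illustrative decoder for the hyperopt result dict (NOT part of pyFTS).
# The code-to-name mappings follow the search-space comments above;
# verify them against the library's documentation.
PARTITIONERS = {1: 'Grid', 2: 'Entropy'}
MEMBERSHIP_FUNCS = {1: 'Triangular', 2: 'Trapezoidal', 3: 'Gaussian'}

def decode(result):
    readable = dict(result)
    readable['partitioner'] = PARTITIONERS.get(result['partitioner'], result['partitioner'])
    readable['mf'] = MEMBERSHIP_FUNCS.get(result['mf'], result['mf'])
    return readable

best = {'alpha': 0.5, 'lags': [2, 6, 7], 'mf': 1, 'npart': 40,
        'order': 3, 'partitioner': 2}
```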
Best regards
Well, thanks a lot @petroniocandido. Does the hyperparameter optimization search for the best data transformation as well? For example, how many lags for differencing, or which kind of transformation is best for the problem?
thank you
Hi @petroniocandido, how can I get stable predictions using GA? Every time I run it, it produces different values. Do you have any suggestions?
Hi @petroniocandido, I came back to try using this package. Just want to clarify several things:
- How do I use the Differential transformation in hyperparam optimization?
- Using evolutionary, I got RMSE "nan". Is that good?
- Is it possible to use other eval metrics, such as RMSLE (root mean squared log error)?
Appreciate your answers
thank you