neuralforecast
[Model] Auto versions of the models are not working in databricks
What happened + What you expected to happen
- In Databricks, the Auto versions of the models do not seem to work. This might be related to this issue: ray tune not working inside databricks
- Some regular models (e.g. TimesNet) worked fine, but the Auto models got stuck.
Versions / Dependencies
1.6.4
Reproduction script
Issue Severity
None
Hi @iamyihwa! Have you tried using the optuna backend for Auto models?
Hi @cchallu, thanks for the idea! I just tried it, but it fails with the error "NHITS is not attached to a `Trainer`."
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS, AutoLSTM, AutoRNN
from neuralforecast.losses.pytorch import RMSE
from neuralforecast.models import TimesNet, Informer, Autoformer, FEDformer, PatchTST

horizon = 52

nf = NeuralForecast(
    models=[
        # Auto model using the optuna backend instead of ray
        AutoNHITS(h=horizon, config=None, loss=RMSE(), backend='optuna'),
        TimesNet(
            input_size=2 * horizon,       # 4*time_len
            top_k=3,                      # Number of periods (for FFT)
            num_kernels=3,                # Number of kernels for Inception module
            batch_size=2,                 # Number of time series per batch
            windows_batch_size=32,        # Number of windows per batch
            learning_rate=0.001,
            h=horizon,                    # Horizon
            loss=RMSE(),
            start_padding_enabled=True,
            scaler_type='robust',
            # futr_exog_list=['Holiday_Flag', 'Temperature', 'Fuel_Price', 'CPI', 'Unemployment'],  # Future exogenous variables
        ),
    ],
    freq='W-SAT',
)

# train_df: long-format DataFrame with unique_id, ds and y columns
nf.fit(train_df)
Error message:
[W 2024-01-10 17:05:45,289] Trial 0 failed with parameters: {'n_pool_kernel_size': [8, 4, 1], 'n_freq_downsample': [1, 1, 1], 'learning_rate': 0.0017397234192811407, 'scaler_type': 'robust', 'max_steps': 700.0, 'batch_size': 128, 'windows_batch_size': 256, 'random_seed': 17, 'input_size': 104, 'step_size': 52} because of the following error: RuntimeError('NHITS is not attached to a `Trainer`.').
Traceback (most recent call last):
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
value_or_values = func(trial)
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/neuralforecast/common/_base_auto.py", line 319, in objective
return fitted_model.trainer.callback_metrics["valid_loss"].item()
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 209, in trainer
raise RuntimeError(f"{self.__class__.__qualname__} is not attached to a `Trainer`.")
RuntimeError: NHITS is not attached to a `Trainer`.
[W 2024-01-10 17:05:45,290] Trial 0 failed with value None.
RuntimeError: NHITS is not attached to a `Trainer`.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
File <command-2274526657341609>, line 1
----> 1 nf.fit(train_df)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/neuralforecast/core.py:274, in NeuralForecast.fit(self, df, static_df, val_size, sort_df, use_init_models, verbose)
271 print("WARNING: Deleting previously fitted models.")
273 for model in self.models:
--> 274 model.fit(self.dataset, val_size=val_size)
276 self._fitted = True
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/neuralforecast/common/_base_auto.py:375, in BaseAuto.fit(self, dataset, val_size, test_size, random_seed)
373 best_config = results.get_best_result().config
374 else:
--> 375 results = self._optuna_tune_model(
376 cls_model=self.cls_model,
377 dataset=dataset,
378 val_size=val_size,
379 test_size=test_size,
380 verbose=self.verbose,
381 num_samples=self.num_samples,
382 search_alg=search_alg,
383 config=self.config,
384 )
385 best_config = results.best_trial.user_attrs["ALL_PARAMS"]
386 self.model = self._fit_model(
387 cls_model=self.cls_model,
388 config=best_config,
(...)
391 test_size=test_size,
392 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/neuralforecast/common/_base_auto.py:327, in BaseAuto._optuna_tune_model(self, cls_model, dataset, val_size, test_size, verbose, num_samples, search_alg, config)
324 sampler = None
326 study = optuna.create_study(sampler=sampler, direction="minimize")
--> 327 study.optimize(
328 objective,
329 n_trials=num_samples,
330 show_progress_bar=verbose,
331 )
332 return study
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/study.py:451, in Study.optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
348 def optimize(
349 self,
350 func: ObjectiveFuncType,
(...)
357 show_progress_bar: bool = False,
358 ) -> None:
359 """Optimize an objective function.
360
361 Optimization is done by choosing a suitable set of hyperparameter values from a given
(...)
449 If nested invocation of this method occurs.
450 """
--> 451 _optimize(
452 study=self,
453 func=func,
454 n_trials=n_trials,
455 timeout=timeout,
456 n_jobs=n_jobs,
457 catch=tuple(catch) if isinstance(catch, Iterable) else (catch,),
458 callbacks=callbacks,
459 gc_after_trial=gc_after_trial,
460 show_progress_bar=show_progress_bar,
461 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/_optimize.py:66, in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
64 try:
65 if n_jobs == 1:
---> 66 _optimize_sequential(
67 study,
68 func,
69 n_trials,
70 timeout,
71 catch,
72 callbacks,
73 gc_after_trial,
74 reseed_sampler_rng=False,
75 time_start=None,
76 progress_bar=progress_bar,
77 )
78 else:
79 if n_jobs == -1:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/_optimize.py:163, in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
160 break
162 try:
--> 163 frozen_trial = _run_trial(study, func, catch)
164 finally:
165 # The following line mitigates memory problems that can be occurred in some
166 # environments (e.g., services that use computing containers such as GitHub Actions).
167 # Please refer to the following PR for further details:
168 # https://github.com/optuna/optuna/pull/325.
169 if gc_after_trial:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/_optimize.py:251, in _run_trial(study, func, catch)
244 assert False, "Should not reach."
246 if (
247 frozen_trial.state == TrialState.FAIL
248 and func_err is not None
249 and not isinstance(func_err, catch)
250 ):
--> 251 raise func_err
252 return frozen_trial
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/_optimize.py:200, in _run_trial(study, func, catch)
198 with get_heartbeat_thread(trial._trial_id, study._storage):
199 try:
--> 200 value_or_values = func(trial)
201 except exceptions.TrialPruned as e:
202 # TODO(mamu): Handle multi-objective cases.
203 state = TrialState.PRUNED
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/neuralforecast/common/_base_auto.py:319, in BaseAuto._optuna_tune_model.<locals>.objective(trial)
311 fitted_model = self._fit_model(
312 cls_model=cls_model,
313 config=cfg,
(...)
316 test_size=test_size,
317 )
318 trial.set_user_attr("ALL_PARAMS", cfg)
--> 319 return fitted_model.trainer.callback_metrics["valid_loss"].item()
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/pytorch_lightning/core/module.py:209, in LightningModule.trainer(self)
207 return _TrainerFabricShim(fabric=self._fabric) # type: ignore[return-value]
208 if not self._jit_is_scripting and self._trainer is None:
--> 209 raise RuntimeError(f"{self.__class__.__qualname__} is not attached to a `Trainer`.")
210 return self._trainer
RuntimeError: NHITS is not attached to a `Trainer`.
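The last two frames of the traceback show where the message comes from: the optuna objective reads `fitted_model.trainer`, and Lightning's `trainer` property raises when no trainer is attached to the module. The sketch below is a simplified pure-Python stand-in for that property, not the real pytorch_lightning class; `FakeModule` is a hypothetical name used only for illustration. It does not explain why the trainer is missing in this environment.

```python
class FakeModule:
    """Simplified stand-in for a LightningModule (illustration only)."""

    def __init__(self):
        # No trainer has been attached yet.
        self._trainer = None

    @property
    def trainer(self):
        # Mirrors the check shown in the traceback (module.py, lines 208-209).
        if self._trainer is None:
            raise RuntimeError(
                f"{self.__class__.__qualname__} is not attached to a `Trainer`."
            )
        return self._trainer


model = FakeModule()
try:
    # Same access pattern as fitted_model.trainer.callback_metrics[...]
    model.trainer
except RuntimeError as err:
    message = str(err)
    print(message)
```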
Among the models I have tested, only TimesNet is working so far.
# from ray import tune
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS, AutoLSTM, AutoRNN
from neuralforecast.losses.pytorch import RMSE
from neuralforecast.models import TimesNet, Informer, Autoformer, FEDformer, PatchTST

horizon = 52

nf = NeuralForecast(
    models=[
        # AutoNHITS(h=horizon, config=None, loss=RMSE(), num_samples=5, backend='optuna'),
        TimesNet(
            input_size=2 * horizon,       # 4*time_len
            top_k=3,                      # Number of periods (for FFT)
            num_kernels=3,                # Number of kernels for Inception module
            batch_size=2,                 # Number of time series per batch
            windows_batch_size=32,        # Number of windows per batch
            learning_rate=0.001,
            h=horizon,                    # Horizon
            loss=RMSE(),
            start_padding_enabled=True,
            scaler_type='robust',
        ),
    ],
    freq='W-SAT',
)

# train_df: long-format DataFrame with unique_id, ds and y columns
nf.fit(train_df)
This works perfectly!
The Auto models don't give any error message; they just get stuck.
The Informer, PatchTST, and Autoformer models fail with an error (process 0 terminated with signal SIGSEGV).
Hello @cchallu,
I was wondering if there is any news on this topic.
Also, is there detailed documentation on how to use 'optuna', or is just setting backend='optuna' sufficient? I was wondering whether I did it correctly.
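On the optuna question: to the best of my understanding of the 1.6.x API, with backend='optuna' the `config` argument is expected to be a function that takes an optuna trial and returns a dict of hyperparameters (rather than a ray.tune search-space dict), and config=None falls back to the model's default search space. Below is a minimal sketch of such a function. The name `nhits_config`, the search ranges, and the `FirstChoiceTrial` stand-in are assumptions made for illustration; the hyperparameter names are taken from the failed-trial log above.

```python
def nhits_config(trial, horizon=52):
    """Hypothetical search space for AutoNHITS with backend='optuna'."""
    return {
        "input_size": trial.suggest_categorical("input_size", [horizon, 2 * horizon]),
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
        "max_steps": trial.suggest_categorical("max_steps", [500, 700]),
        "scaler_type": trial.suggest_categorical("scaler_type", ["robust", "standard"]),
        "batch_size": trial.suggest_categorical("batch_size", [32, 128]),
        "windows_batch_size": trial.suggest_categorical("windows_batch_size", [128, 256]),
        "random_seed": trial.suggest_int("random_seed", 1, 20),
    }


class FirstChoiceTrial:
    """Stand-in for an optuna trial, so the dict can be inspected without optuna.

    It always returns the first choice or the lower bound (illustration only).
    """

    def suggest_categorical(self, name, choices):
        return choices[0]

    def suggest_float(self, name, low, high, log=False):
        return low

    def suggest_int(self, name, low, high):
        return low


# Dry run: build one configuration without starting a study.
preview = nhits_config(FirstChoiceTrial())
print(preview)
```

Under the same assumption, the function would then be passed as AutoNHITS(h=horizon, config=nhits_config, loss=RMSE(), num_samples=5, backend='optuna'). This only shows how a search space is supplied; it is not a fix for the `Trainer` error reported above.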
Hi @cchallu,
I have found a working example for the Databricks + PyTorch Lightning + Ray combination. This code runs fine without getting stuck when calling Ray's tuner.fit().
I guess it is the 'cluster' rather than 'Databricks' that matters here. I found that one has to specify a 'scaling_configs' dictionary, for example, which is not present in _base_auto.py of neuralforecast.
Do you think there is any way I can pass such configs to the neuralforecast library?