neuralforecast [Model] Auto versions of the models are not working in databricks

What happened + What you expected to happen

On Databricks, the Auto versions of the models do not seem to work. This might be related to this issue: ray tune not working inside databricks
Some models worked fine (TimesNet), but the Auto models got stuck.

Versions / Dependencies

1.6.4

Reproduction script

From getting started example

Issue Severity

None

Jan 09 '24 22:01 iamyihwa

Hi @iamyihwa! Have you tried using the optuna backend for Auto models?

Jan 10 '24 16:01 cchallu

Hi @cchallu, thanks for the idea! I just tried it, but it fails with an error:

NHITS is not attached to a Trainer.

from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS, AutoLSTM, AutoRNN
from neuralforecast.losses.pytorch import RMSE
from neuralforecast.models import TimesNet, Informer, Autoformer, FEDformer, PatchTST

horizon = 52

nf = NeuralForecast(
    models=[
        AutoNHITS(h=horizon, config=None, loss=RMSE(), backend='optuna'),
        TimesNet(
            input_size=2 * horizon,     # 4*time_len,
            top_k=3,                    # Number of periods (for FFT).
            num_kernels=3,              # Number of kernels for Inception module
            batch_size=2,               # Number of time series per batch
            windows_batch_size=32,      # Number of windows per batch
            learning_rate=0.001,
            h=horizon,                  # Horizon
            loss=RMSE(),
            start_padding_enabled=True,
            scaler_type='robust',
            # futr_exog_list=['Holiday_Flag', 'Temperature', 'Fuel_Price', 'CPI', 'Unemployment'],  # Future exogenous variables
        ),
    ],
    freq='W-SAT',
)
nf.fit(train_df)  # train_df: the training DataFrame (not shown)

Error message:

[W 2024-01-10 17:05:45,289] Trial 0 failed with parameters: {'n_pool_kernel_size': [8, 4, 1], 'n_freq_downsample': [1, 1, 1], 'learning_rate': 0.0017397234192811407, 'scaler_type': 'robust', 'max_steps': 700.0, 'batch_size': 128, 'windows_batch_size': 256, 'random_seed': 17, 'input_size': 104, 'step_size': 52} because of the following error: RuntimeError('NHITS is not attached to a `Trainer`.').
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
    value_or_values = func(trial)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/neuralforecast/common/_base_auto.py", line 319, in objective
    return fitted_model.trainer.callback_metrics["valid_loss"].item()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 209, in trainer
    raise RuntimeError(f"{self.__class__.__qualname__} is not attached to a `Trainer`.")
RuntimeError: NHITS is not attached to a `Trainer`.
[W 2024-01-10 17:05:45,290] Trial 0 failed with value None.
RuntimeError: NHITS is not attached to a `Trainer`.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File <command-2274526657341609>, line 1
----> 1 nf.fit(train_df) 

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/neuralforecast/core.py:274, in NeuralForecast.fit(self, df, static_df, val_size, sort_df, use_init_models, verbose)
    271         print("WARNING: Deleting previously fitted models.")
    273 for model in self.models:
--> 274     model.fit(self.dataset, val_size=val_size)
    276 self._fitted = True

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/neuralforecast/common/_base_auto.py:375, in BaseAuto.fit(self, dataset, val_size, test_size, random_seed)
    373     best_config = results.get_best_result().config
    374 else:
--> 375     results = self._optuna_tune_model(
    376         cls_model=self.cls_model,
    377         dataset=dataset,
    378         val_size=val_size,
    379         test_size=test_size,
    380         verbose=self.verbose,
    381         num_samples=self.num_samples,
    382         search_alg=search_alg,
    383         config=self.config,
    384     )
    385     best_config = results.best_trial.user_attrs["ALL_PARAMS"]
    386 self.model = self._fit_model(
    387     cls_model=self.cls_model,
    388     config=best_config,
   (...)
    391     test_size=test_size,
    392 )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/neuralforecast/common/_base_auto.py:327, in BaseAuto._optuna_tune_model(self, cls_model, dataset, val_size, test_size, verbose, num_samples, search_alg, config)
    324     sampler = None
    326 study = optuna.create_study(sampler=sampler, direction="minimize")
--> 327 study.optimize(
    328     objective,
    329     n_trials=num_samples,
    330     show_progress_bar=verbose,
    331 )
    332 return study

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/study.py:451, in Study.optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
    348 def optimize(
    349     self,
    350     func: ObjectiveFuncType,
   (...)
    357     show_progress_bar: bool = False,
    358 ) -> None:
    359     """Optimize an objective function.
    360 
    361     Optimization is done by choosing a suitable set of hyperparameter values from a given
   (...)
    449             If nested invocation of this method occurs.
    450     """
--> 451     _optimize(
    452         study=self,
    453         func=func,
    454         n_trials=n_trials,
    455         timeout=timeout,
    456         n_jobs=n_jobs,
    457         catch=tuple(catch) if isinstance(catch, Iterable) else (catch,),
    458         callbacks=callbacks,
    459         gc_after_trial=gc_after_trial,
    460         show_progress_bar=show_progress_bar,
    461     )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/_optimize.py:66, in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
     64 try:
     65     if n_jobs == 1:
---> 66         _optimize_sequential(
     67             study,
     68             func,
     69             n_trials,
     70             timeout,
     71             catch,
     72             callbacks,
     73             gc_after_trial,
     74             reseed_sampler_rng=False,
     75             time_start=None,
     76             progress_bar=progress_bar,
     77         )
     78     else:
     79         if n_jobs == -1:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/_optimize.py:163, in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
    160         break
    162 try:
--> 163     frozen_trial = _run_trial(study, func, catch)
    164 finally:
    165     # The following line mitigates memory problems that can be occurred in some
    166     # environments (e.g., services that use computing containers such as GitHub Actions).
    167     # Please refer to the following PR for further details:
    168     # https://github.com/optuna/optuna/pull/325.
    169     if gc_after_trial:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/_optimize.py:251, in _run_trial(study, func, catch)
    244         assert False, "Should not reach."
    246 if (
    247     frozen_trial.state == TrialState.FAIL
    248     and func_err is not None
    249     and not isinstance(func_err, catch)
    250 ):
--> 251     raise func_err
    252 return frozen_trial

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/optuna/study/_optimize.py:200, in _run_trial(study, func, catch)
    198 with get_heartbeat_thread(trial._trial_id, study._storage):
    199     try:
--> 200         value_or_values = func(trial)
    201     except exceptions.TrialPruned as e:
    202         # TODO(mamu): Handle multi-objective cases.
    203         state = TrialState.PRUNED

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/neuralforecast/common/_base_auto.py:319, in BaseAuto._optuna_tune_model.<locals>.objective(trial)
    311 fitted_model = self._fit_model(
    312     cls_model=cls_model,
    313     config=cfg,
   (...)
    316     test_size=test_size,
    317 )
    318 trial.set_user_attr("ALL_PARAMS", cfg)
--> 319 return fitted_model.trainer.callback_metrics["valid_loss"].item()

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-1d6e89c7-64a2-4eee-b2c8-ebdc4de69a9e/lib/python3.10/site-packages/pytorch_lightning/core/module.py:209, in LightningModule.trainer(self)
    207     return _TrainerFabricShim(fabric=self._fabric)  # type: ignore[return-value]
    208 if not self._jit_is_scripting and self._trainer is None:
--> 209     raise RuntimeError(f"{self.__class__.__qualname__} is not attached to a `Trainer`.")
    210 return self._trainer

RuntimeError: NHITS is not attached to a `Trainer`.

Among the models I have tested, only TimesNet is working so far.

# from ray import tune
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoNHITS, AutoLSTM, AutoRNN
from neuralforecast.losses.pytorch import RMSE
from neuralforecast.models import TimesNet, Informer, Autoformer, FEDformer, PatchTST

horizon = 52

nf = NeuralForecast(
    models=[
        # AutoNHITS(h=horizon, config=None, loss=RMSE(), num_samples=5, backend='optuna'),
        TimesNet(
            input_size=2 * horizon,     # 4*time_len,
            top_k=3,                    # Number of periods (for FFT).
            num_kernels=3,              # Number of kernels for Inception module
            batch_size=2,               # Number of time series per batch
            windows_batch_size=32,      # Number of windows per batch
            learning_rate=0.001,
            h=horizon,                  # Horizon
            loss=RMSE(),
            start_padding_enabled=True,
            scaler_type='robust',
        ),
    ],
    freq='W-SAT',
)

nf.fit(train_df)  # train_df: the training DataFrame (not shown)

This works perfectly!

The Auto models don't give any error message; they just get stuck.

The Informer, PatchTST, and Autoformer models fail with an error (process 0 terminated with signal SIGSEGV).

Jan 10 '24 17:01 iamyihwa

Hello @cchallu,
I was wondering if there is any news on this topic. Also, is there detailed documentation on how to use 'optuna', or is just setting backend='optuna' sufficient? I wanted to check whether I did it correctly.
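
On whether setting backend='optuna' is enough: as far as I understand the neuralforecast docs, config=None falls back to the default search space, while a custom search space for the optuna backend is passed as a function that takes an optuna trial and returns a dict of model arguments (instead of a dict of ray.tune samplers). Below is a minimal sketch under that assumption. The names `nhits_config` and `build_forecaster` are made up for illustration, and the ranges are arbitrary.

```python
def nhits_config(trial):
    """Hypothetical search space for AutoNHITS with backend='optuna'.

    Takes an optuna Trial and returns a plain dict of NHITS arguments.
    The ranges below are arbitrary examples, not recommendations.
    """
    return {
        "input_size": trial.suggest_categorical("input_size", [52, 104]),
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
        "max_steps": trial.suggest_categorical("max_steps", [200, 500]),
        "random_seed": trial.suggest_int("random_seed", 1, 20),
        "scaler_type": "robust",  # fixed value, not searched
    }


def build_forecaster(horizon=52, num_samples=5):
    """Wire the search space into AutoNHITS (requires neuralforecast and optuna)."""
    # Imported lazily so the config function can be inspected without the libraries.
    from neuralforecast import NeuralForecast
    from neuralforecast.auto import AutoNHITS
    from neuralforecast.losses.pytorch import RMSE

    model = AutoNHITS(
        h=horizon,
        loss=RMSE(),
        config=nhits_config,   # a callable, because backend='optuna'
        backend="optuna",
        num_samples=num_samples,
    )
    return NeuralForecast(models=[model], freq="W-SAT")
```

Usage would then be `build_forecaster().fit(train_df)`. This only shows the expected shape of the config; it would not by itself address the "not attached to a `Trainer`" error above.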

Jan 18 '24 15:01 iamyihwa

Hi @cchallu, I have found a working example of the Databricks + PyTorch Lightning + Ray combination. That code runs fine, without getting stuck, when calling Ray's tuner.fit().

I guess it is the 'cluster' rather than 'databricks' that matters here. I found that one has to specify a 'scaling_configs' dictionary, for example, which isn't present in neuralforecast's _base_auto.py.

Do you think there is any way I can pass in such configs to the neuralforecast library?
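
I don't know of a way to pass a Ray scaling config through the Auto classes. As far as I can tell, the closest knobs are the `cpus` and `gpus` constructor arguments of the Auto models, which are forwarded to Ray Tune as per-trial resources. If the defaults request more than the Ray cluster actually exposes, trials can stay pending indefinitely, which would look like being stuck. Whether that is the cause here is an assumption. A minimal sketch follows; `per_trial_resources` and `build_auto_nhits` are hypothetical helpers, not part of neuralforecast.

```python
def per_trial_resources(available_cpus, available_gpus, concurrent_trials=1):
    """Divide the resources Ray can actually see among concurrent trials.

    Hypothetical helper: returns the integer cpus/gpus to request per trial.
    """
    if concurrent_trials < 1:
        raise ValueError("concurrent_trials must be >= 1")
    return {
        "cpus": max(1, available_cpus // concurrent_trials),  # at least one CPU
        "gpus": available_gpus // concurrent_trials,          # may be zero
    }


def build_auto_nhits(horizon, available_cpus, available_gpus, num_samples=5):
    """Build an AutoNHITS (Ray backend) with explicit per-trial resources."""
    # Imported lazily so the helper above works without neuralforecast installed.
    from neuralforecast.auto import AutoNHITS
    from neuralforecast.losses.pytorch import RMSE

    resources = per_trial_resources(available_cpus, available_gpus)
    return AutoNHITS(
        h=horizon,
        loss=RMSE(),
        config=None,             # default search space
        num_samples=num_samples,
        cpus=resources["cpus"],  # CPUs requested per trial
        gpus=resources["gpus"],  # GPUs requested per trial
    )
```

For example, `build_auto_nhits(52, available_cpus=4, available_gpus=0)` would request 4 CPUs and no GPU per trial; the numbers should match what the cluster's Ray resources report.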

Jan 22 '24 11:01 iamyihwa
