pytorch-forecasting
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn. When refitting with updated data.
- PyTorch-Forecasting version: 1.0.0
- PyTorch version: 2.1.0.dev20230703
- Python version: 3.10.11
- Operating System: macOS Ventura 13.1
My goal is to create a 14-day forecast of demand using my own data. The demand is grouped by Machine and Dish. I followed the tutorial and created a TimeSeriesDataSet with my own data; I've successfully fit and predicted with it and got good results.
Expected behavior
Adding newer data and refitting the model should lead to a better-tuned model; the pipeline would stay the same and handle the additional demand observations.
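For context, the refit step just rebuilds the dataset whenever new observations arrive, roughly like this (a sketch only; croston_simple_updated_df is a placeholder name for the old data concatenated with the new observations):

# Hypothetical refit sketch: reuse the configuration and encoders of the original
# training dataset instead of re-specifying every argument.
croston_simple_refit = TimeSeriesDataSet.from_dataset(
    croston_simple_training,        # dataset defined below, with all its parameters
    croston_simple_updated_df,      # old + newly collected demand observations
    stop_randomization=False,
)
refit_dataloader = croston_simple_refit.to_dataloader(train=True, batch_size=42, num_workers=0)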
Actual behavior
The first time I added new data I got the error in the title. I have no idea why. The only thing I knew was that the problem was in the new data. I managed to find the Machine/Dish combination that was causing it, removed it, and the problem was fixed.
The second time I added data I had no problems.
The third time I added data the problem returned, and I don't understand the mechanism behind why it happens or how to automatically identify the problematic Machine/Dish combinations so I can remove them from the data.
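The best I have so far is a rough screen for suspicious combinations, assuming (and it is only an assumption) that the failure comes from degenerate series, e.g. groups with too few observations or constant/all-zero demand:

# Hypothetical diagnostic: flag Machine/Dish combinations whose series look degenerate.
# This assumes such series are what trigger the missing-grad_fn error; that is an
# assumption, not a confirmed cause.
suspect_groups = (
    croston_simple_training_df
    .groupby(["Machine", "Dish"])["Demand"]
    .agg(n_obs="count", demand_std="std", demand_max="max")
    .query("n_obs < 15 or demand_std == 0 or demand_max == 0")  # 14-step horizon + 1 encoder step
)
print(suspect_groups)

# drop the flagged combinations before building the TimeSeriesDataSet
mask = croston_simple_training_df.set_index(["Machine", "Dish"]).index.isin(suspect_groups.index)
croston_simple_training_df = croston_simple_training_df[~mask]

Dropping the flagged groups would just automate the manual removal described above.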
Code to reproduce the problem
I can't share the data since it's private, but I can share the code I used. Note that the prediction length is 14.
# imports used below (not shown in the original snippet)
import lightning.pytorch as pl
from lightning.pytorch.callbacks import EarlyStopping, LearningRateMonitor
from lightning.pytorch.loggers import TensorBoardLogger
from pytorch_forecasting import TemporalFusionTransformer, TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer, NaNLabelEncoder
from pytorch_forecasting.metrics import QuantileLoss

# longest encoder window = full span of the history in days
max_encoder_croston_simple = (croston_simple_covariates["Date"].max() - croston_simple_covariates["Date"].min()).days

croston_simple_training = TimeSeriesDataSet(
    croston_simple_training_df,
    time_idx="time_idx",
    target="Demand",
    group_ids=["Machine", "Dish"],
    min_encoder_length=1,  # to allow for cold starts
    max_encoder_length=max_encoder_croston_simple,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,  # 14 days
    min_prediction_idx=1,
    static_categoricals=static_categoricals,
    static_reals=static_reals,
    time_varying_known_categoricals=time_varying_known_categoricals,
    variable_groups=variable_groups,  # group of categorical variables can be treated as one variable
    time_varying_known_reals=time_varying_known_reals,
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=time_varying_unknown_reals,
    target_normalizer=GroupNormalizer(
        groups=["Machine", "Dish"], transformation="softplus"
    ),  # use softplus and normalize by group
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
    allow_missing_timesteps=True,
    categorical_encoders={
        "Dish": NaNLabelEncoder(add_nan=True),
        "Machine": NaNLabelEncoder(add_nan=True),
        "holidays": NaNLabelEncoder(add_nan=True),
        "Month": NaNLabelEncoder(add_nan=True),
        "Category": NaNLabelEncoder(add_nan=True),
        "dish_temperature": NaNLabelEncoder(add_nan=True),
        "DayOfWeek": NaNLabelEncoder(add_nan=True),
        "machineActivity": NaNLabelEncoder(add_nan=True),
    },
)
batch_size = 42  # set this between 32 and 128
train_dataloader_croston_simple = croston_simple_training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader_croston_simple = croston_simple_validation.to_dataloader(train=False, batch_size=batch_size * 10, num_workers=0)

early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=5, verbose=True, mode="min")
lr_logger = LearningRateMonitor()  # log the learning rate
croston_simple_logger = TensorBoardLogger(save_dir="lightning_logs", name="croston_exploration")

croston_simple_trainer = pl.Trainer(
    max_epochs=100,
    accelerator="cpu",
    enable_model_summary=True,
    gradient_clip_val=croston_simple_study.best_trial.params["gradient_clip_val"],
    limit_train_batches=50,
    # fast_dev_run=True,
    callbacks=[lr_logger, early_stop_callback],
    logger=croston_simple_logger,
)

croston_simple_tft = TemporalFusionTransformer.from_dataset(
    croston_simple_training,
    learning_rate=croston_simple_study.best_trial.params["learning_rate"],
    hidden_size=croston_simple_study.best_trial.params["hidden_size"],
    attention_head_size=croston_simple_study.best_trial.params["attention_head_size"],
    dropout=croston_simple_study.best_trial.params["dropout"],
    hidden_continuous_size=croston_simple_study.best_trial.params["hidden_continuous_size"],
    loss=QuantileLoss(),
    log_interval=5,  # set to e.g. 10 for logging every 10 batches
    optimizer="Ranger",
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {croston_simple_tft.size()/1e3:.1f}k")

# fit network
croston_simple_trainer.fit(
    croston_simple_tft,
    train_dataloaders=train_dataloader_croston_simple,
    val_dataloaders=val_dataloader_croston_simple,
)
RuntimeError Traceback (most recent call last)
Cell In[15], line 2
1 # fit network
----> 2 croston_simple_trainer.fit(
3 croston_simple_tft,
4 train_dataloaders=train_dataloader_croston_simple,
5 val_dataloaders=val_dataloader_croston_simple,
6 )
File /opt/homebrew/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:531, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
529 model = _maybe_unwrap_optimized(model)
530 self.strategy._lightning_module = model
--> 531 call._call_and_handle_interrupt(
532 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
533 )
File /opt/homebrew/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py:42, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
40 if trainer.strategy.launcher is not None:
41 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 42 return trainer_fn(*args, **kwargs)
44 except _TunerExitException:
45 _call_teardown_hook(trainer)
File /opt/homebrew/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:570, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
...
--> 204 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
205 tensors, grad_tensors_, retain_graph, create_graph, inputs,
206 allow_unreachable=True, accumulate_grad=True)
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Hey, try changing the gradient clipping parameter, as well as playing with the learning rate.
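For concreteness, those two knobs are the gradient_clip_val passed to the Trainer and the learning_rate passed to from_dataset; a sketch with hand-set placeholder values instead of the Optuna-tuned ones:

# placeholder values, just to show where the two knobs live
croston_simple_trainer = pl.Trainer(
    max_epochs=100,
    accelerator="cpu",
    gradient_clip_val=0.1,  # try larger/smaller clipping values
    callbacks=[lr_logger, early_stop_callback],
    logger=croston_simple_logger,
)
croston_simple_tft = TemporalFusionTransformer.from_dataset(
    croston_simple_training,
    learning_rate=0.03,  # try a different learning rate
    loss=QuantileLoss(),
)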
I've tried this for a couple of hours but it doesn't seem to get me anywhere. I didn't have this problem for two weeks straight, but then it showed up again.
I'm trying to use the optimize_hyperparameters function to find the best parameters instead of guessing, but the problem persists inside the Optuna optimization as well. Could this be a version problem? I'm stumped.
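Roughly what that call looks like, with illustrative ranges rather than the exact ones I used:

from pytorch_forecasting.models.temporal_fusion_transformer.tuning import optimize_hyperparameters

# illustrative ranges only; the same RuntimeError shows up during the search
croston_simple_study = optimize_hyperparameters(
    train_dataloader_croston_simple,
    val_dataloader_croston_simple,
    model_path="optuna_croston_simple",
    n_trials=50,
    max_epochs=20,
    gradient_clip_val_range=(0.01, 1.0),
    hidden_size_range=(8, 128),
    hidden_continuous_size_range=(8, 128),
    attention_head_size_range=(1, 4),
    learning_rate_range=(0.001, 0.1),
    dropout_range=(0.1, 0.3),
    trainer_kwargs=dict(limit_train_batches=30, accelerator="cpu"),
    reduce_on_plateau_patience=4,
    use_learning_rate_finder=False,
)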
Any updates on the error? I have the same one on a similar task.
Unfortunately, I have stopped using this package; there were too many issues going on at the same time. A shame, as it was working really well for a while.