
Max gradient norm in TFT

Open msachin93 opened this issue 3 years ago • 1 comment

The temporal fusion transformer paper reports a max gradient norm range of 0.01 to 100. How do I decide the optimal value when training a model on a specific dataset?

msachin93 avatar Apr 06 '22 06:04 msachin93
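In the paper this is treated as a tuned hyperparameter rather than a fixed constant, so the usual answer is to search the 0.01–100 range against validation loss (pytorch-forecasting ships an `optimize_hyperparameters` helper that accepts a `gradient_clip_val_range` for exactly this). Since the range spans four orders of magnitude, values are typically sampled on a log scale. A minimal sketch of such a sampler (the helper name `sample_gradient_clip_val` is mine, not part of the library):

```python
import math
import random

def sample_gradient_clip_val(low=0.01, high=100.0, rng=random):
    """Draw a candidate clipping threshold log-uniformly from [low, high].

    Log-uniform sampling gives equal weight to each order of magnitude,
    which matters when the search range is 0.01 to 100.
    """
    return math.exp(rng.uniform(math.log(low), math.log(high)))
```

Each sampled value would then be used for one training trial, keeping whichever threshold gives the best validation loss.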

The original TFT implementation lists hyperparameters for each dataset (https://github.com/google-research/google-research/blob/master/tft/data_formatters/favorita.py), where `max_gradient_norm` is set to 100 for the retail dataset. When using the pytorch-forecasting TFT model, is there a way to pass this information to the trainer/encoder?

```python
model_params = {
    'dropout_rate': 0.1,
    'hidden_layer_size': 240,
    'learning_rate': 0.001,
    'minibatch_size': 128,
    'max_gradient_norm': 100.,
    'num_heads': 4,
    'stack_size': 1,
}
```

gkdivya avatar Apr 13 '22 11:04 gkdivya
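For what it's worth: pytorch-forecasting trains the TFT with a PyTorch Lightning `Trainer`, so the clipping threshold is not a model parameter but a trainer argument, e.g. `pl.Trainer(gradient_clip_val=100.0)`. What that value controls is global gradient-norm clipping: if the L2 norm of all gradients combined exceeds the threshold, every gradient is rescaled by the same factor. A dependency-free sketch of that operation (plain Python stand-in, not the library's actual code):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their global L2 norm
    does not exceed max_norm; gradients below the threshold pass through."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```

For example, gradients `[3.0, 4.0]` have global norm 5, so with `max_norm=1.0` they are scaled down to roughly `[0.6, 0.8]`, while gradients already inside the threshold are returned unchanged.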