pytorch-forecasting
Max gradient norm in TFT
The Temporal Fusion Transformer paper gives a search range for the max gradient norm from 0.01 to 100. How do I decide the optimal value when training a model on a specific dataset?
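For context on what the parameter controls: max gradient norm clipping rescales the gradient vector whenever its global L2 norm exceeds the threshold, which limits the size of any single optimizer step (in PyTorch this is what `torch.nn.utils.clip_grad_norm_` does). A minimal sketch of the math in plain Python, with a hypothetical helper name:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm."""
    # Global L2 norm across all gradient values
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale down only if the norm exceeds the threshold;
    # small gradients pass through unchanged.
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# A [3, 4] gradient has norm 5, so with max_norm=1 it is
# rescaled to approximately [0.6, 0.8] (norm 1).
clipped = clip_grad_norm([3.0, 4.0], max_norm=1.0)
```

A large threshold like 100 effectively disables clipping for most steps, while a small one like 0.01 forces very conservative updates; which end of the range helps depends on how noisy the loss landscape of the dataset is, which is why the paper treats it as a tuned hyperparameter.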
In the original TFT implementation, hyperparameters are listed per dataset (https://github.com/google-research/google-research/blob/master/tft/data_formatters/favorita.py), and max_gradient_norm is set to 100 for retail. When using the pytorch-forecasting TFT model, is there a way to pass this information to the trainer/encoder?

```python
model_params = {
    'dropout_rate': 0.1,
    'hidden_layer_size': 240,
    'learning_rate': 0.001,
    'minibatch_size': 128,
    'max_gradient_norm': 100.,
    'num_heads': 4,
    'stack_size': 1,
}
```
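One way this mapping can work, as a sketch rather than a definitive answer: pytorch-forecasting trains TFT through PyTorch Lightning, so max_gradient_norm corresponds to the Trainer's `gradient_clip_val` argument, not a model parameter, while the remaining entries map onto arguments of `TemporalFusionTransformer.from_dataset`. The argument names below reflect my reading of the pytorch-forecasting API and should be checked against the installed version; `training` and `train_dataloader` are assumed to be an existing `TimeSeriesDataSet` and its dataloader.

```python
import pytorch_lightning as pl
from pytorch_forecasting import TemporalFusionTransformer

# Gradient clipping is a Trainer-level setting in PyTorch Lightning,
# playing the role of max_gradient_norm in the TF reference code.
trainer = pl.Trainer(
    max_epochs=30,
    gradient_clip_val=100.0,   # max_gradient_norm
)

# Model hyperparameters are passed to the model itself
# (names as in pytorch-forecasting; verify for your version).
tft = TemporalFusionTransformer.from_dataset(
    training,                  # assumed: your TimeSeriesDataSet
    hidden_size=240,           # hidden_layer_size
    dropout=0.1,               # dropout_rate
    attention_head_size=4,     # num_heads
    lstm_layers=1,             # stack_size
    learning_rate=0.001,
)

# minibatch_size belongs to the DataLoader (batch_size=128),
# not to the model or the trainer.
trainer.fit(tft, train_dataloaders=train_dataloader)
```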