pytorch-forecasting
Max gradient norm in TFT
The Temporal Fusion Transformer paper gives a search range for the max gradient norm from 0.01 to 100. How do I decide the optimal value when training a model on a specific dataset?
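For context on what the parameter controls: max gradient norm clipping rescales the gradient vector whenever its global L2 norm exceeds the threshold, which limits the size of any single optimizer step (in PyTorch this is what `torch.nn.utils.clip_grad_norm_` does). A minimal sketch of the math in plain Python, with a hypothetical helper name:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm."""
    # Global L2 norm across all gradient values
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale down only if the norm exceeds the threshold;
    # small gradients pass through unchanged.
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# A [3, 4] gradient has norm 5, so with max_norm=1 it is
# rescaled to approximately [0.6, 0.8] (norm 1).
clipped = clip_grad_norm([3.0, 4.0], max_norm=1.0)
```

A large threshold like 100 effectively disables clipping for most steps, while a small one like 0.01 forces very conservative updates; which end of the range helps depends on how noisy the loss landscape of the dataset is, which is why the paper treats it as a tuned hyperparameter.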
In the original TFT implementation, hyperparameters are listed per dataset (https://github.com/google-research/google-research/blob/master/tft/data_formatters/favorita.py), and max_gradient_norm is set to 100 for retail. When using the pytorch-forecasting TFT model, is there a way to pass this information to the trainer/encoder?

```python
model_params = {
    'dropout_rate': 0.1,
    'hidden_layer_size': 240,
    'learning_rate': 0.001,
    'minibatch_size': 128,
    'max_gradient_norm': 100.,
    'num_heads': 4,
    'stack_size': 1,
}
```
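One way this mapping can work, as a sketch rather than a definitive answer: pytorch-forecasting trains TFT through PyTorch Lightning, so max_gradient_norm corresponds to the Trainer's `gradient_clip_val` argument, not a model parameter, while the remaining entries map onto arguments of `TemporalFusionTransformer.from_dataset`. The argument names below reflect my reading of the pytorch-forecasting API and should be checked against the installed version; `training` and `train_dataloader` are assumed to be an existing `TimeSeriesDataSet` and its dataloader.

```python
import pytorch_lightning as pl
from pytorch_forecasting import TemporalFusionTransformer

# Gradient clipping is a Trainer-level setting in PyTorch Lightning,
# playing the role of max_gradient_norm in the TF reference code.
trainer = pl.Trainer(
    max_epochs=30,
    gradient_clip_val=100.0,   # max_gradient_norm
)

# Model hyperparameters are passed to the model itself
# (names as in pytorch-forecasting; verify for your version).
tft = TemporalFusionTransformer.from_dataset(
    training,                  # assumed: your TimeSeriesDataSet
    hidden_size=240,           # hidden_layer_size
    dropout=0.1,               # dropout_rate
    attention_head_size=4,     # num_heads
    lstm_layers=1,             # stack_size
    learning_rate=0.001,
)

# minibatch_size belongs to the DataLoader (batch_size=128),
# not to the model or the trainer.
trainer.fit(tft, train_dataloaders=train_dataloader)
```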