
Slow Convergence for CUDNN RNN


The network being used is an LSTM -> Linear -> Sigmoid network, trained on the Mackey-Glass time series.

    net_cfg.add_layer(LayerConfig::new(
        // Layer name is only used internally - can be changed to anything
        "LSTMInitial",
        RnnConfig {
            hidden_size: 5,
            num_layers: 1,
            dropout_seed: 123,
            dropout_probability: 0.5,
            rnn_type: RnnNetworkMode::LSTM,
            input_mode: RnnInputMode::LinearInput,
            direction_mode: DirectionMode::UniDirectional,
        },
    ));
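    // Project the LSTM output (hidden_size = 5) down to a single prediction.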
    net_cfg.add_layer(LayerConfig::new("linear1", LinearConfig { output_size: 1 }));
    net_cfg.add_layer(LayerConfig::new("sigmoid", LayerType::Sigmoid));

Using batch size 12 and a learning rate of 0.1, the network begins to converge successfully. During training it was observed that the RNN -> Linear stage produced very similar predictions despite noticeable differences in the RNN outputs, which is believed to point to an issue with the weight initialisation.
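
For illustration, here is a standalone sketch (plain Rust, not juice's API; the hidden states and weights are made-up values) of that suspected failure mode: if the Linear layer's weights are initialised at far too small a scale, clearly different LSTM hidden states collapse to almost the same sigmoid output.

    // Standalone sketch (not juice code) of the suspected failure mode: with the
    // Linear layer's weights initialised at too small a scale, two clearly
    // different hidden states map to nearly identical predictions.
    fn linear_sigmoid(hidden: &[f32], weights: &[f32], bias: f32) -> f32 {
        let z: f32 = hidden.iter().zip(weights).map(|(h, w)| h * w).sum::<f32>() + bias;
        1.0 / (1.0 + (-z).exp())
    }

    fn main() {
        // Two distinct hidden states from the hidden_size = 5 LSTM (made-up values).
        let h1 = [0.8_f32, -0.3, 0.5, 0.1, -0.6];
        let h2 = [-0.4_f32, 0.7, -0.2, 0.9, 0.3];
        // Weights drawn at a far too small scale (~1e-3).
        let w = [0.001_f32, -0.002, 0.001, 0.003, -0.001];
        let b = 0.0;
        println!("pred(h1) = {:.4}", linear_sigmoid(&h1, &w, b)); // ~0.5007
        println!("pred(h2) = {:.4}", linear_sigmoid(&h2, &w, b)); // ~0.5001
    }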

Ideally this network could be trained at a higher batch size within a single epoch to an MSE of 0.05, as the function being approximated is fairly simple.

Current theories

  • SGD is not suitable for this problem; RMSProp may be (see the sketch after this list)
  • Weight initialisation is done incorrectly somewhere, or Glorot is unsuitable for the LSTM we're using
  • The LSTM is set up improperly and is causing a performance issue.
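
For reference, the difference between the two optimisers in the first theory, as a standalone sketch in plain Rust (not juice's solver API; gradient values are illustrative only): RMSProp normalises each step by a running average of squared gradients, so a fixed learning rate of 0.1 behaves very differently than under plain SGD.

    // Minimal sketch (not juice code) contrasting SGD and RMSProp updates on a
    // single parameter.
    fn sgd_step(w: f32, grad: f32, lr: f32) -> f32 {
        w - lr * grad
    }

    fn rmsprop_step(w: f32, grad: f32, lr: f32, avg_sq_grad: &mut f32) -> f32 {
        // Exponential moving average of squared gradients; the step is then
        // normalised by its square root, so parameters with persistently large
        // gradients take smaller effective steps.
        const DECAY: f32 = 0.9;
        const EPS: f32 = 1e-8;
        *avg_sq_grad = DECAY * *avg_sq_grad + (1.0 - DECAY) * grad * grad;
        w - lr * grad / (avg_sq_grad.sqrt() + EPS)
    }

    fn main() {
        let (mut w_sgd, mut w_rms, mut cache) = (0.5_f32, 0.5_f32, 0.0_f32);
        for grad in [4.0_f32, 3.5, 4.2, 0.1] {
            w_sgd = sgd_step(w_sgd, grad, 0.1);
            w_rms = rmsprop_step(w_rms, grad, 0.1, &mut cache);
        }
        println!("SGD: {w_sgd:.3}, RMSProp: {w_rms:.3}");
    }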

lissahyacinth · Jun 24 '20

@lissahyacinth could you attach a gist with the data you used? The links posted in chat have unfortunately already expired.

drahnr · Oct 21 '20