Slow Convergence for CUDNN RNN
The network being used is an LSTM -> Linear -> Sigmoid network trained on the Mackey-Glass series.
net_cfg.add_layer(LayerConfig::new(
    // Layer name is only used internally - can be changed to anything
    "LSTMInitial",
    RnnConfig {
        hidden_size: 5,
        num_layers: 1,
        dropout_seed: 123,
        dropout_probability: 0.5,
        rnn_type: RnnNetworkMode::LSTM,
        input_mode: RnnInputMode::LinearInput,
        direction_mode: DirectionMode::UniDirectional,
    },
));
net_cfg.add_layer(LayerConfig::new("linear1", LinearConfig { output_size: 1 }));
net_cfg.add_layer(LayerConfig::new("sigmoid", LayerType::Sigmoid));
Using a batch size of 12 and a learning rate of 0.1, the network begins to converge successfully. During training it was observed that the RNN -> Linear stage produced similar predictions despite differences in the RNN outputs, which was believed to be an issue with the weight initialisation.
Ideally this network could be trained at a higher batch size across a single epoch and reach an MSE of 0.05, as the function being approximated is fairly simple.
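For clarity, the 0.05 target refers to the plain mean squared error over the scalar predictions; a minimal standalone helper (not part of juice, shown only to pin down the metric) would be:

// Mean squared error over a batch of scalar predictions.
fn mse(predictions: &[f32], targets: &[f32]) -> f32 {
    assert_eq!(predictions.len(), targets.len());
    predictions
        .iter()
        .zip(targets)
        .map(|(p, t)| (p - t).powi(2))
        .sum::<f32>()
        / predictions.len() as f32
}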
Current theories
- SGD is not suitable for this problem - RMSProp may be
- Weight initialisation is done incorrectly somewhere, or Glorot is unsuitable for the LSTM we're using (see the sketch after this list)
- The LSTM is improperly set up and is causing a performance issue.
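To make the second theory concrete, Glorot (Xavier) uniform initialisation draws weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The sketch below is a plain-Rust illustration of that rule (using the rand crate); whether juice applies this correctly to the CUDNN RNN weights is exactly what is in question here.

use rand::Rng;

// Glorot (Xavier) uniform initialisation for a fan_in x fan_out weight matrix.
fn glorot_uniform(fan_in: usize, fan_out: usize) -> Vec<f32> {
    let limit = (6.0_f32 / (fan_in + fan_out) as f32).sqrt();
    let mut rng = rand::thread_rng();
    (0..fan_in * fan_out)
        .map(|_| rng.gen_range(-limit..limit))
        .collect()
}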
@lissahyacinth could you attach the gist with the data you used? The links posted in chat have unfortunately already expired.