Slow Convergence for CUDNN RNN
The network being used is an LSTM -> Linear -> Sigmoid network trained on the Mackey-Glass series.
net_cfg.add_layer(LayerConfig::new(
    // Layer name is only used internally - can be changed to anything
    "LSTMInitial",
    RnnConfig {
        hidden_size: 5,
        num_layers: 1,
        dropout_seed: 123,
        dropout_probability: 0.5,
        rnn_type: RnnNetworkMode::LSTM,
        input_mode: RnnInputMode::LinearInput,
        direction_mode: DirectionMode::UniDirectional,
    },
));
net_cfg.add_layer(LayerConfig::new("linear1", LinearConfig { output_size: 1 }));
net_cfg.add_layer(LayerConfig::new("sigmoid", LayerType::Sigmoid));
Using a batch size of 12 and a learning rate of 0.1, the network begins to converge successfully. During training it was observed that the RNN -> Linear stage produced similar predictions despite differences in the RNN outputs, which was believed to be an issue with the weight initialisation.
Ideally this network could be trained at a higher batch size across a single epoch and reach an MSE of 0.05, as the function being approximated is fairly simple.
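For clarity, the 0.05 target refers to the plain mean squared error over the scalar predictions; a minimal standalone helper (not part of juice, shown only to pin down the metric) would be:

// Mean squared error over a batch of scalar predictions.
fn mse(predictions: &[f32], targets: &[f32]) -> f32 {
    assert_eq!(predictions.len(), targets.len());
    predictions
        .iter()
        .zip(targets)
        .map(|(p, t)| (p - t).powi(2))
        .sum::<f32>()
        / predictions.len() as f32
}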
Current theories
- SGD is not suitable for this problem - RMSProp may be
- Weight initialisation is done incorrectly somewhere, or Glorot is unsuitable for the LSTM we're using (see the sketch after this list)
- The LSTM is improperly set up and is causing a performance issue.
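To make the second theory concrete, Glorot (Xavier) uniform initialisation draws weights from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The sketch below is a plain-Rust illustration of that rule (using the rand crate); whether juice applies this correctly to the CUDNN RNN weights is exactly what is in question here.

use rand::Rng;

// Glorot (Xavier) uniform initialisation for a fan_in x fan_out weight matrix.
fn glorot_uniform(fan_in: usize, fan_out: usize) -> Vec<f32> {
    let limit = (6.0_f32 / (fan_in + fan_out) as f32).sqrt();
    let mut rng = rand::thread_rng();
    (0..fan_in * fan_out)
        .map(|_| rng.gen_range(-limit..limit))
        .collect()
}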
@lissahyacinth could you attach the gist with the data you used? The links posted in chat have unfortunately already expired.