amazon-dsstne
Training error with deep autoencoder
Hi, I am running a deep autoencoder using DSSTNE but am having issues with training. The first epoch reports a training error greater than 0, and the second epoch reports 0.
NNNetwork::Train: Epoch 1, average error 42600148992.000000, average training error 16.304724, average regularization error 42600148992.000000, elapsed time 461.341889s
NNNetwork::Train: Epoch 2, average error 42742050816.000000, average training error 0.000000, average regularization error 42742050816.000000, elapsed time 462.598641s
Any clues as to why this might be happening? One of the configs I tried is:
{ "Version" : 0.8, "Name" : "AE", "Kind" : "FeedForward",
"SparsenessPenalty" : {
"p" : 0.5,
"beta" : 2.0
},
"ShuffleIndices" : true,
"Denoising" : {
"p" : 0.2
},
"ScaledMarginalCrossEntropy" : {
"oneTarget" : 1.0,
"zeroTarget" : 0.0,
"oneScale" : 1.0,
"zeroScale" : 1.0
},
"Layers" : [
{ "Name" : "Input", "Kind" : "Input", "N" : "auto", "DataSet" : "gl_input", "Sparse" : true },
{ "Name" : "Hidden1", "Kind" : "Hidden", "Type" : "FullyConnected", "N" :512 , "Activation" : "ScaledExponentialLinear", "Sparse" : false},
{ "Name" : "Hidden2", "Kind" : "Hidden", "Type" : "FullyConnected", "N" :512 , "Activation" : "ScaledExponentialLinear", "Sparse" : false },
{ "Name" : "Hidden3", "Kind" : "Hidden", "Type" : "FullyConnected", "N" :1024 , "Activation" : "ScaledExponentialLinear", "Sparse" : false,"pDropout" :0.8 },
{ "Name" : "Hidden4", "Kind" : "Hidden", "Type" : "FullyConnected", "N" :512 , "Activation" : "ScaledExponentialLinear", "Sparse" : false },
{ "Name" : "Hidden5", "Kind" : "Hidden", "Type" : "FullyConnected", "N" :512 , "Activation" : "ScaledExponentialLinear", "Sparse" : false },
{ "Name" : "Output", "Kind" : "Output", "Type" : "FullyConnected", "DataSet" : "gl_output", "N" : "auto", "Activation" : "Sigmoid", "Sparse" : true }
],
"ErrorFunction" : "ScaledMarginalCrossEntropy"
}
Actually, I have seen something like this recently while adding Batch Norm to DSSTNE. I had the learning rate sign backwards (positive instead of negative) when using the results from cuDNN. Flipping the sign on the learning rate then caused it to converge. Until I did that, I had results just like yours.
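For context, a correct SGD step moves the weights against the gradient, so the bug amounts to negating the learning rate. A minimal sketch in plain C++ (illustrative only, not DSSTNE's actual kernel code) of the two sign conventions:

    // Correct SGD step: move against the gradient so the loss decreases.
    void sgd_step(float* w, const float* grad, int n, float lr) {
        for (int i = 0; i < n; ++i)
            w[i] -= lr * grad[i];   // w <- w - lr * dL/dw
    }

    // The buggy version: adding instead of subtracting (equivalent to a
    // negated learning rate) climbs the loss surface, so the weights grow
    // without bound instead of converging.
    void sgd_step_buggy(float* w, const float* grad, int n, float lr) {
        for (int i = 0; i < n; ++i)
            w[i] += lr * grad[i];
    }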
So, I'm guessing that it is something similar: the error grows without bound, the values cap out and can no longer change, so the average training error drops to zero (i.e., because the weights cannot change). How are you invoking DSSTNE? Do you have your own main.cpp, or are you using the encoder built from main.cpp in utils?
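One way the "values can no longer change" behavior can show up in single precision: once a weight or activation saturates near the float maximum, any finite update is smaller than the rounding granularity (ULP) at that magnitude and is silently lost. A tiny standalone illustration, not taken from DSSTNE:

    #include <cstdio>
    #include <limits>

    int main() {
        // Near FLT_MAX (~3.4e38) the spacing between adjacent floats is
        // about 2e31, so even a "huge" update of 1e30 rounds away.
        float w = std::numeric_limits<float>::max();
        float before = w;
        w -= 1e30f;                               // a large but sub-ULP update...
        std::printf("delta = %g\n", w - before);  // ...prints 0: the weight is stuck
        return 0;
    }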
I would turn on verbose mode and watch per-minibatch training error to see when it blows up.
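To make that concrete, a hypothetical sketch of per-minibatch monitoring (the trainBatch callback here is a placeholder, not DSSTNE's real API):

    #include <cmath>
    #include <cstdio>
    #include <functional>

    // Hypothetical verbose training loop: trainBatch stands in for whatever
    // runs one minibatch and returns its error; DSSTNE's entry point differs.
    void trainVerbose(int numEpochs, int numBatches,
                      const std::function<float(int)>& trainBatch) {
        for (int epoch = 1; epoch <= numEpochs; ++epoch) {
            for (int b = 0; b < numBatches; ++b) {
                float err = trainBatch(b);
                std::printf("epoch %d batch %d error %f\n", epoch, b, err);
                // Stop as soon as the error goes non-finite so the
                // offending minibatch is easy to identify.
                if (!std::isfinite(err)) {
                    std::printf("error blew up at epoch %d, batch %d\n", epoch, b);
                    return;
                }
            }
        }
    }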
Hello @scottlegrand,
I would like to know whether I can use other error functions, such as RMSE, in the config.json file; I couldn't find any documentation about the error function here. Also, could you please tell me where I can enable verbose mode?