Training error with deep autoencoder

sinaakhtar opened this issue 7 years ago • 3 comments

Hi, I am running a deep autoencoder using DSSTNE but am having issues with training. The first epoch returns a training error greater than 0, and the second epoch returns a training error of 0.

NNNetwork::Train: Epoch 1, average error 42600148992.000000, average training error 16.304724, average regularization error 42600148992.000000, elapsed time 461.341889s
NNNetwork::Train: Epoch 2, average error 42742050816.000000, average training error 0.000000, average regularization error 42742050816.000000, elapsed time 462.598641s

Any clues as to why this might be happening? One of the configs I tried is:

{ "Version" : 0.8, "Name" : "AE", "Kind" : "FeedForward",

"SparsenessPenalty" : {
    "p" : 0.5,
    "beta" : 2.0
},

"ShuffleIndices" : true,

"Denoising" : {
    "p" : 0.2
},

"ScaledMarginalCrossEntropy" : {
    "oneTarget" : 1.0,
    "zeroTarget" : 0.0,
    "oneScale" : 1.0,
    "zeroScale" : 1.0
},
"Layers" : [
    { "Name" : "Input", "Kind" : "Input", "N" : "auto", "DataSet" : "gl_input", "Sparse" : true },
    { "Name" : "Hidden1", "Kind" : "Hidden", "Type" : "FullyConnected", "N" :512 , "Activation" : "ScaledExponentialLinear", "Sparse" : false},
	{ "Name" : "Hidden2", "Kind" : "Hidden", "Type" : "FullyConnected", "N" :512 , "Activation" : "ScaledExponentialLinear", "Sparse" : false },
	{ "Name" : "Hidden3", "Kind" : "Hidden", "Type" : "FullyConnected", "N" :1024 , "Activation" : "ScaledExponentialLinear", "Sparse" : false,"pDropout" :0.8 },
	{ "Name" : "Hidden4", "Kind" : "Hidden", "Type" : "FullyConnected", "N" :512 , "Activation" : "ScaledExponentialLinear", "Sparse" : false },
	{ "Name" : "Hidden5", "Kind" : "Hidden", "Type" : "FullyConnected", "N" :512 , "Activation" : "ScaledExponentialLinear", "Sparse" : false },
    { "Name" : "Output", "Kind" : "Output", "Type" : "FullyConnected", "DataSet" : "gl_output", "N" : "auto", "Activation" : "Sigmoid", "Sparse" : true }
],

"ErrorFunction" : "ScaledMarginalCrossEntropy"

}

sinaakhtar avatar Feb 21 '18 08:02 sinaakhtar

Actually, I have seen something like this recently while adding Batch Norm to DSSTNE. I had the learning-rate sign backwards (a positive instead of a negative) when using the results from cuDNN. Flipping the sign on the learning rate then caused it to converge. Until I did that, I had results just like yours.

So I'm guessing your case is similar: the weights grow without bound, cap out, and then cannot change any more, so the average training error drops to zero (i.e., because the values can no longer change). How are you invoking DSSTNE? Do you have your own main.cpp, or are you using the encoder built from main.cpp in utils?
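
For illustration, here is a minimal sketch of that failure mode (plain C++, not DSSTNE code; the toy loss f(w) = (w - 3)^2 and the constants are made up for the example). With the correct sign the weight settles at the minimum; with the sign flipped it runs away, matching the "grows without bounds" behavior:

// Minimal sketch (plain C++, not DSSTNE internals): gradient descent on the
// toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3). With the correct
// sign the weight converges to 3; with the sign flipped it diverges.
#include <cstdio>

int main() {
    double wGood = 0.0, wBad = 0.0;
    const double lr = 0.1;

    for (int step = 0; step < 20; ++step) {
        wGood -= lr * 2.0 * (wGood - 3.0);  // correct: step against the gradient
        wBad  += lr * 2.0 * (wBad  - 3.0);  // flipped: step with the gradient
    }
    std::printf("correct sign: w = %f (converges toward 3)\n", wGood);
    std::printf("flipped sign: w = %f (diverges)\n", wBad);
    return 0;
}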

ekandrotA9 avatar May 11 '18 22:05 ekandrotA9

I would turn on verbose mode and watch per-minibatch training error to see when it blows up.
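
As a generic illustration of that approach (a stand-alone C++ sketch, not DSSTNE's actual verbose output; trainStep and the threshold are hypothetical stand-ins): logging the loss after every minibatch makes the exact step where it explodes easy to spot, whereas a per-epoch average hides it.

// Minimal sketch of per-minibatch monitoring (stand-alone C++, not DSSTNE's
// verbose output; trainStep and kExplodeThreshold are hypothetical).
#include <cmath>
#include <cstdio>

// Stand-in for one forward/backward pass; returns the minibatch loss.
// Here it is faked to blow up so the check below has something to catch.
double trainStep(int step) {
    return std::exp(0.5 * step);
}

int main() {
    const double kExplodeThreshold = 1.0e6;
    for (int step = 0; step < 100; ++step) {
        const double loss = trainStep(step);
        std::printf("minibatch %3d: loss %.6f\n", step, loss);
        // A per-epoch average masks where things diverge; checking every
        // minibatch pinpoints the offending step.
        if (!std::isfinite(loss) || loss > kExplodeThreshold) {
            std::printf("loss exploded at minibatch %d\n", step);
            break;
        }
    }
    return 0;
}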

scottlegrand avatar Jun 12 '18 18:06 scottlegrand

Hello @scottlegrand,

I would like to know whether I can use other error functions, such as RMSE, in the config.json file. I couldn't find anything here about the error function. Also, could you please tell me where I can enable verbose mode?

spacelover1 avatar Mar 24 '20 09:03 spacelover1