Deep-Learning-TensorFlow
DBN finetuning issue
Hi, I started exploring the package yesterday and began with the command line DBN example from the documentation (same network and hyperparameters):
python command_line/ --dataset mnist --main_dir dbn-models --model_name my-deeper-dbn --verbose 1 --rbm_layers 512,256 --rbm_learning_rate 0.005 --rbm_num_epochs 15 --rbm_batch_size 25 --finetune_batch_size 25 --finetune_learning_rate 0.001 --finetune_num_epochs 10 --finetune_loss_func softmax_cross_entropy --finetune_dropout 0.7 --finetune_act_func relu
Pretraining worked well, in that the reconstruction error decreased over the epochs:
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Creating stored_models/dbn-models directory to save/restore models
Creating data/dbn-models directory to save model generated data
Creating logs/dbn-models directory to save tensorboard logs
Creating stored_models/dbn-models/rbm-1 directory to save/restore models
Creating data/dbn-models/rbm-1 directory to save model generated data
Creating logs/dbn-models/rbm-1 directory to save tensorboard logs
Creating stored_models/dbn-models/rbm-2 directory to save/restore models
Creating data/dbn-models/rbm-2 directory to save model generated data
Creating logs/dbn-models/rbm-2 directory to save tensorboard logs
Training layer 1...
Tensorboard logs dir for this run is logs/dbn-models/rbm-1/run6
Reconstruction loss at step 0: 0.121678
Reconstruction loss at step 1: 0.103993
Reconstruction loss at step 2: 0.094516
Reconstruction loss at step 3: 0.0880938
Reconstruction loss at step 4: 0.0832195
Reconstruction loss at step 5: 0.0794863
Reconstruction loss at step 6: 0.0764901
Reconstruction loss at step 7: 0.0738373
Reconstruction loss at step 8: 0.0716205
Reconstruction loss at step 9: 0.0696231
Reconstruction loss at step 10: 0.0680134
Reconstruction loss at step 11: 0.0665949
Reconstruction loss at step 12: 0.0651861
Reconstruction loss at step 13: 0.0640741
Reconstruction loss at step 14: 0.0630116
Training layer 2...
Tensorboard logs dir for this run is logs/dbn-models/rbm-2/run5
Reconstruction loss at step 0: 0.172686
Reconstruction loss at step 1: 0.144182
Reconstruction loss at step 2: 0.129113
Reconstruction loss at step 3: 0.119446
Reconstruction loss at step 4: 0.112444
Reconstruction loss at step 5: 0.107379
Reconstruction loss at step 6: 0.103563
Reconstruction loss at step 7: 0.100636
Reconstruction loss at step 8: 0.0982092
Reconstruction loss at step 9: 0.0960361
Reconstruction loss at step 10: 0.0942256
Reconstruction loss at step 11: 0.0926428
Reconstruction loss at step 12: 0.0913576
Reconstruction loss at step 13: 0.0902729
Reconstruction loss at step 14: 0.0891995
But when it came to fine-tuning, the accuracy stayed the same across epochs (i.e. the network was not learning):
Start deep belief net finetuning...
Tensorboard logs dir for this run is logs/dbn-models/run5
Accuracy at step 0: 0.0958
Accuracy at step 1: 0.0958
Accuracy at step 2: 0.0958
Accuracy at step 3: 0.0958
Accuracy at step 4: 0.0958
Accuracy at step 5: 0.0958
Accuracy at step 6: 0.0958
Accuracy at step 7: 0.0958
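An accuracy frozen at about 0.1 on a 10-class problem is roughly chance level, which is what you would see if the network predicted the same class for every input. The NumPy sketch below shows one way this can happen: if the network outputs are NaN, `np.argmax` returns index 0 for every row, so accuracy collapses to the frequency of class 0 and never moves. The labels are random toy data (not MNIST), the `accuracy` helper is hypothetical, and whether yadlt's TensorFlow graph behaves the same way is an assumption, not something confirmed in this thread.

```python
import numpy as np


def accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of rows whose argmax matches the integer label."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))


# Hypothetical toy labels for a 10-class problem (not real MNIST data).
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=5000)

# If every forward pass produces NaN, np.argmax returns the index of the
# first NaN (0) for each row, so the "prediction" is always class 0.
nan_logits = np.full((labels.size, 10), np.nan)
acc = accuracy(nan_logits, labels)

# The accuracy then equals the frequency of class 0 and cannot change
# from epoch to epoch, no matter what the optimizer does.
freq_class0 = float(np.mean(labels == 0))
print(acc, freq_class0)
```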
I also tried other parameter configurations and datasets, but had the same problem. Then I tried without pretraining, and this time the test accuracy improved as expected:
python command_line/ --dataset mnist --main_dir dbn-models --model_name my-deeper-dbn --verbose 1 --rbm_layers 512,256 --rbm_learning_rate 0.005 --rbm_num_epochs 15 --rbm_batch_size 25 --finetune_batch_size 25 --finetune_learning_rate 0.001 --finetune_num_epochs 10 --finetune_loss_func softmax_cross_entropy --finetune_dropout 0.7 --finetune_act_func relu --do_pretrain False
Accuracy at step 0: 0.5808
Accuracy at step 1: 0.7334
Accuracy at step 2: 0.7884
Accuracy at step 3: 0.8222
Accuracy at step 4: 0.8392
Accuracy at step 5: 0.8544
Accuracy at step 6: 0.8642
Accuracy at step 7: 0.8722
Accuracy at step 8: 0.8782
Did anybody else get the DBN with pretraining to work from the command line? I know the parameters in the documentation are not the best ones, but I still expect the network to train. I will now look into the code to find the issue, but any pointers would be appreciated. Thanks!
Hi @PrefMiner, I found the same issue. With pretraining, the RBM training is fine, but finetuning doesn't work. Without pretraining, it works well. I don't know the reason for this issue yet; any help would be highly appreciated. Thanks!
Hi, I think the argument '--finetune_act_func' should be 'sigmoid', not 'relu', because the hidden-layer units of an RBM define conditional probabilities. With this argument the accuracy does grow, but very slowly, and I don't know how to speed it up.
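For reference, in a Bernoulli-Bernoulli RBM each hidden unit's activation is the conditional probability p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij), which is why pretrained weights are naturally read through a sigmoid. The minimal NumPy sketch below uses random toy weights (not yadlt's code; `hidden_probs` is a hypothetical helper) and compares the sigmoid with a ReLU applied to the same pre-activation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hidden_probs(v, W, b_h):
    """p(h_j = 1 | v) for a Bernoulli-Bernoulli RBM: sigmoid(v @ W + b_h)."""
    return sigmoid(v @ W + b_h)


rng = np.random.default_rng(0)
n_visible, n_hidden = 784, 512            # 28x28 MNIST inputs, first RBM layer of 512 units
W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
b_h = np.zeros(n_hidden)
v = rng.random((25, n_visible))           # toy batch of 25 inputs in [0, 1]

p = hidden_probs(v, W, b_h)               # valid probabilities, strictly inside (0, 1)
r = np.maximum(0.0, v @ W + b_h)          # ReLU of the same pre-activation: not a probability
```

The point of the comparison is only that the two activations interpret the same weights differently; it does not by itself show which one fine-tunes better.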
@twovillage I guess you are right: this sigmoid function should be in the last layer, and that explains why the accuracy was not growing.
Just changed the default activation function to sigmoid. The accuracy is growing as expected :+1:
Hi @blackecho @twovillage, I replaced 'relu' with 'sigmoid' as you suggested, but the accuracy is still bad. Can you tell me which arguments are incorrect? Thanks.
python --dataset mnist --main_dir dbn-models --model_name my-deeper-dbn --verbose 1 --rbm_layers 512,256 --rbm_learning_rate 0.005 --rbm_num_epochs 15 --rbm_batch_size 25 --finetune_batch_size 25 --finetune_learning_rate 0.001 --finetune_num_epochs 10 --finetune_loss_func softmax_cross_entropy --finetune_dropout 0.7 --finetune_act_func sigmoid
Start deep belief net finetuning...
Tensorboard logs dir for this run is logs/dbn-models/run19
Accuracy at step 0: 0.071
Accuracy at step 1: 0.0662
Accuracy at step 2: 0.0682
Accuracy at step 3: 0.0706
Accuracy at step 4: 0.0748
Accuracy at step 5: 0.077
Accuracy at step 6: 0.0792
Accuracy at step 7: 0.0838
Accuracy at step 8: 0.084
Accuracy at step 9: 0.0872
Test set accuracy: 0.0808999985456
Even after increasing --finetune_num_epochs to 1000, the accuracy is still low:
Accuracy at step 990: 0.1132
Accuracy at step 991: 0.116
Accuracy at step 992: 0.1094
Accuracy at step 993: 0.1092
Accuracy at step 994: 0.1014
Accuracy at step 995: 0.1098
Accuracy at step 996: 0.1126
Accuracy at step 997: 0.1144
Accuracy at step 998: 0.112
Accuracy at step 999: 0.1132
Test set accuracy: 0.109600000083
@blackecho any idea why the accuracy is so poor? I have set the finetune activation function to "sigmoid", but the test accuracy is still 0.106899999082.
Hi! I don't use the model from the command line; I import it. When I use the DBN with pretraining, should I call the method 'pretrain' before 'fit'? Here is my code:
import tensorflow as tf
from yadlt.models.boltzmann import dbn
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("data/MNIST_data/", one_hot=True)
trX, trY, teX, teY = (mnist.train.images, mnist.train.labels,
                      mnist.test.images, mnist.test.labels)
with tf.Session() as sess:
    # Named `model` so it does not shadow the imported `dbn` module.
    model = dbn.DeepBeliefNetwork(rbm_layers=[512, 256], name='dbn', do_pretrain=True, rbm_num_epochs=[10], rbm_gibbs_k=[10], rbm_gauss_visible=False,
        rbm_stddev=0.1, rbm_batch_size=[10], rbm_learning_rate=[0.01], finetune_dropout=1, finetune_loss_func='softmax_cross_entropy',
        finetune_act_func=tf.nn.relu, finetune_opt='sgd', finetune_learning_rate=0.001, finetune_num_epochs=10, finetune_batch_size=20, momentum=0.5)
    # print(data_train)
    model.pretrain(trX)            # greedy unsupervised pretraining of the RBM stack
    model.fit(trX, trY, teX, teY)  # supervised fine-tuning, using the test split for validation
If you set do_pretrain=False and you don't call pretrain(), then what you'll have is a standard MLP. With do_pretrain=True and a call to pretrain(), you'll perform greedy unsupervised learning of a stack of RBMs before unrolling them into an MLP.
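To make the pretrain-then-unroll idea concrete, here is a minimal NumPy sketch, assuming Bernoulli RBMs trained with CD-1 (one Gibbs step per update, which we assume corresponds to rbm_gibbs_k=1). It is a textbook version, not yadlt's implementation; `train_rbm`, `greedy_pretrain` and `mlp_forward` are hypothetical names, and the toy binary data and layer sizes stand in for MNIST and rbm_layers=[512, 256].

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=5, lr=0.05, batch_size=25, seed=0):
    """Train a Bernoulli RBM with CD-1 and return (W, b_hidden)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            v0 = data[start:start + batch_size]
            # Positive phase: hidden probabilities and a binary sample.
            ph0 = sigmoid(v0 @ W + b_h)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            # Negative phase: one Gibbs step down to the visibles and back up.
            pv1 = sigmoid(h0 @ W.T + b_v)
            ph1 = sigmoid(pv1 @ W + b_h)
            # CD-1 gradient approximation, averaged over the batch.
            n = len(v0)
            W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
            b_v += lr * (v0 - pv1).mean(axis=0)
            b_h += lr * (ph0 - ph1).mean(axis=0)
    return W, b_h


def greedy_pretrain(data, layer_sizes):
    """Train one RBM per layer; each RBM sees the previous layer's hidden probabilities."""
    params, x = [], data
    for size in layer_sizes:
        W, b_h = train_rbm(x, size)
        params.append((W, b_h))
        x = sigmoid(x @ W + b_h)  # input representation for the next RBM
    return params


def mlp_forward(x, params):
    """Unrolled network: pretrained weights initialise a sigmoid MLP.
    In practice a softmax output layer is added and everything is fine-tuned."""
    for W, b_h in params:
        x = sigmoid(x @ W + b_h)
    return x


rng = np.random.default_rng(1)
toy = (rng.random((200, 64)) < 0.3).astype(float)  # hypothetical binary data, not MNIST
params = greedy_pretrain(toy, [32, 16])
features = mlp_forward(toy, params)
```

Note the unrolled layers use the same sigmoid the RBMs were trained with; swapping in a different activation at fine-tuning time means the pretrained weights are being read through a function they were not trained for.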
@blackecho Many thanks! Excellent work.
@blackecho Another question: when I use an RBM or DBN, should I normalize my input data to zero mean and unit variance? This really confuses me.
Yes, you can either simply scale to [0, 1] or normalize to zero mean and unit variance. I would go with the latter.
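Both options are per-feature transforms; a minimal NumPy sketch on toy data follows (`minmax_scale` and `standardize` are hypothetical helpers, not part of yadlt). As a hedged rule of thumb, [0, 1] inputs match Bernoulli visible units, while zero-mean, unit-variance inputs are usually paired with Gaussian visible units (the rbm_gauss_visible option seen in the snippet earlier in this thread). Whichever you choose, the statistics should come from the training set and be reused for validation and test data.

```python
import numpy as np


def minmax_scale(X, eps=1e-8):
    """Rescale every feature (column) into [0, 1]; eps guards constant columns."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo + eps)


def standardize(X, mean=None, std=None, eps=1e-8):
    """Zero mean, unit variance per feature. Pass the training-set statistics
    when transforming validation/test data so all splits share one scaling."""
    mean = X.mean(axis=0) if mean is None else mean
    std = X.std(axis=0) if std is None else std
    return (X - mean) / (std + eps), mean, std


rng = np.random.default_rng(0)
train = rng.normal(5.0, 3.0, size=(1000, 4))  # toy data, not MNIST
test = rng.normal(5.0, 3.0, size=(200, 4))

train_01 = minmax_scale(train)
train_z, mu, sigma = standardize(train)
test_z, _, _ = standardize(test, mu, sigma)   # reuse the training statistics
```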
@blackecho Thank you!
@blackecho Without pretraining, the accuracy on MNIST is 0.92; with pretraining, it is 0.087. Any idea? Thanks.
Nope, unfortunately I haven't found the cause of this issue yet.
@blackecho Here is another implementation I found, and it works well. However, I cannot find the difference.
@blackecho I guess the problem is the appearance of 'nan'. Here is the output I get with another implementation, which has the same problem; maybe it gives some clue:
Starting pretraining...
Pretraing layer 0 Epoch 0 cost: 155.12645916331894
Pretraing layer 0 Epoch 1 cost: 143.80763538707393
Pretraing layer 0 Epoch 2 cost: 141.32982020984997
Pretraing layer 0 Epoch 3 cost: 51.984577359286234
Pretraing layer 0 Epoch 4 cost: 0.0
Pretraing layer 0 Epoch 5 cost: 0.0
Pretraing layer 0 Epoch 6 cost: 0.0
Pretraing layer 0 Epoch 7 cost: 0.0
Pretraing layer 0 Epoch 8 cost: 0.0
Pretraing layer 0 Epoch 9 cost: 0.0
Pretraing layer 1 Epoch 0 cost: nan
Pretraing layer 1 Epoch 1 cost: nan
Pretraing layer 1 Epoch 2 cost: nan
Pretraing layer 1 Epoch 3 cost: nan
Pretraing layer 1 Epoch 4 cost: nan
Pretraing layer 1 Epoch 5 cost: nan
Pretraing layer 1 Epoch 6 cost: nan
Pretraing layer 1 Epoch 7 cost: nan
Pretraing layer 1 Epoch 8 cost: nan
Pretraing layer 1 Epoch 9 cost: nan
Pretraing layer 2 Epoch 0 cost: nan
Pretraing layer 2 Epoch 1 cost: nan
Pretraing layer 2 Epoch 2 cost: nan
Pretraing layer 2 Epoch 3 cost: nan
Pretraing layer 2 Epoch 4 cost: nan
Pretraing layer 2 Epoch 5 cost: nan
Pretraing layer 2 Epoch 6 cost: nan
Pretraing layer 2 Epoch 7 cost: nan
Pretraing layer 2 Epoch 8 cost: nan
Pretraing layer 2 Epoch 9 cost: nan
The pretraining process ran for 3.4709213713039855 minutes
Start finetuning...
Epoch 0 cost: nan, validation accuacy: 0.0957999974489212
Epoch 1 cost: nan, validation accuacy: 0.0957999974489212
Epoch 2 cost: nan, validation accuacy: 0.0957999974489212
Epoch 3 cost: nan, validation accuacy: 0.0957999974489212
Epoch 4 cost: nan, validation accuacy: 0.0957999974489212
Epoch 5 cost: nan, validation accuacy: 0.0957999974489212
Epoch 6 cost: nan, validation accuacy: 0.0957999974489212
Epoch 7 cost: nan, validation accuacy: 0.0957999974489212
Epoch 8 cost: nan, validation accuacy: 0.0957999974489212
Epoch 9 cost: nan, validation accuacy: 0.0957999974489212
@blackecho There might be some mistakes in the implementation. Here is mine: it works well for the BBRBM (Bernoulli-Bernoulli RBM) but not for the GBRBM (Gaussian-Bernoulli RBM).
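The two variants differ mainly in how the visible layer is reconstructed. A BBRBM passes the down-pass through a sigmoid, so reconstructions stay in [0, 1]; a GBRBM (here assuming unit-variance, i.e. standardized, inputs) uses a linear mean, so reconstructions are unbounded. That is one reason GBRBMs are commonly trained with noticeably smaller learning rates. A minimal NumPy sketch with toy sizes; `sample_visible` is a hypothetical helper, not code from either implementation discussed here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sample_visible(h, W, b_v, gaussian, rng):
    """One down-pass v ~ p(v | h) for an RBM with weights W (n_visible x n_hidden).
    BBRBM: p(v_i = 1 | h) = sigmoid(b_v + W h), sampled as 0/1.
    GBRBM (unit variance assumed): v_i | h ~ Normal(b_v + W h, 1), unbounded."""
    pre = h @ W.T + b_v
    if gaussian:
        mean = pre
        sample = mean + rng.normal(size=mean.shape)
    else:
        mean = sigmoid(pre)
        sample = (rng.random(mean.shape) < mean).astype(float)
    return mean, sample


rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(6, 4))         # toy sizes: 6 visible, 4 hidden units
b_v = np.zeros(6)
h = (rng.random((3, 4)) < 0.5).astype(float)  # a batch of 3 binary hidden vectors

bb_mean, bb_sample = sample_visible(h, W, b_v, gaussian=False, rng=rng)
gb_mean, gb_sample = sample_visible(h, W, b_v, gaussian=True, rng=rng)
```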
@fanyike yes, the problem is probably the appearance of those NaNs, thanks for pointing it out! @xiaohu2015 I will take a look at your implementation and see if I can solve the issue, thanks!
@blackecho Maybe it is the learning_rate, which is too big and makes the pre-activation a negative number.
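One possible route to the pattern in the log above (cost dropping to exactly 0.0, then nan in every later layer) is units saturating at exactly 0 or 1, at which point a cross-entropy cost evaluates 0 * log(0) = nan in floating point. This is a hypothesis, not a confirmed diagnosis of either implementation; an overly large learning rate is one way to push units into saturation. A minimal NumPy sketch, assuming a cross-entropy reconstruction cost; `safe_cross_entropy` and `check_finite` are hypothetical helpers.

```python
import numpy as np


def safe_cross_entropy(v, p, eps=1e-7):
    """Reconstruction cross-entropy between data v in [0, 1] and reconstruction
    probabilities p. Clipping p keeps log() finite when a unit saturates."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(np.sum(v * np.log(p) + (1 - v) * np.log(1 - p), axis=1)))


def check_finite(name, value):
    """Fail fast instead of silently continuing to train on NaN/inf."""
    if not np.all(np.isfinite(value)):
        raise FloatingPointError(f"{name} became non-finite: {value}")
    return value


v = np.array([[1.0, 0.0, 1.0]])
p_saturated = np.array([[1.0, 0.0, 0.0]])  # e.g. sigmoid outputs rounded to exactly 0/1

# The unclipped formula hits 0 * log(0) and log(0), producing nan.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = -np.mean(np.sum(v * np.log(p_saturated)
                            + (1 - v) * np.log(1 - p_saturated), axis=1))

safe = check_finite("cost", safe_cross_entropy(v, p_saturated))
```

Clipping only hides the symptom in the cost; if units saturate this early, lowering the learning rate is still worth trying.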
I need a clarification. When training prints "Accuracy: 0.93", does that mean 7% misclassification, or 93% misclassification? I ask because run_summaries says that the mean error is returned.
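If the printed value is the fraction of correct predictions, then 0.93 means 7% misclassification, since error rate = 1 - accuracy. The logs earlier in this thread point that way: a value stuck at 0.0958 on a 10-class problem sits at chance accuracy (about 0.1), whereas an error rate that low would mean a roughly 90%-accurate model, which contradicts the "without pretraining works better" reports. This is an inference from the logs, not a reading of yadlt's run_summaries. A tiny sketch with hypothetical predictions:

```python
import numpy as np

# Hypothetical predictions and labels for 10 examples.
y_true = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
y_pred = np.array([3, 1, 4, 1, 5, 9, 2, 6, 0, 0])

accuracy = float(np.mean(y_pred == y_true))  # fraction correct
error_rate = 1.0 - accuracy                  # fraction misclassified
print(f"accuracy={accuracy:.2f} error={error_rate:.2f}")
```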
Hi. Has this problem been solved? I am now running into the same problem.