DeepTextSpotter
TEST and TRAIN phases
Hey, what are the differences between those 2 phases? I noticed that the TEST phase disables dropout and cropping, and enables a softmax layer that allows extracting CTC probabilities. Are there any more differences? I assume there are, as I get significantly (15%) worse results when using the TEST phase, which is strange. I'm using a modified validation script I wrote based on train.py, because I ran into many problems with the validation / demo files (their pre-processing differs from train.py). Many thanks, Lee
Hi,
TRAIN phase is for training, which means:
- drop-out is enabled (one way to look at it is that you are sampling your network, so you are learning many networks within one network)
- batch normalization is in training mode (it computes the mean and variance from the current batch)
In TEST phase,
- drop-out is disabled, so you are using the full network
- batch normalization is in testing mode (it uses the learned/averaged mean and variance)
So when you are using batch norm or drop-out, accuracy should be worse in the TRAIN phase, not the TEST phase.
If you are getting worse results in the TEST phase, it can be caused by meta-parameters (batch size / batch-norm eps and momentum); see the batch-norm sketch below.
Michal
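To make the batch-norm point concrete, here is a minimal NumPy sketch of the two modes (illustrative only, not Caffe's actual implementation; the momentum/eps names merely mirror Caffe's BatchNorm parameters):

    import numpy as np

    class SimpleBatchNorm:
        """Illustration of batch normalization in TRAIN vs TEST mode."""
        def __init__(self, dim, momentum=0.9, eps=1e-5):
            self.running_mean = np.zeros(dim)
            self.running_var = np.ones(dim)
            self.momentum = momentum
            self.eps = eps

        def forward(self, x, train=True):
            if train:
                # TRAIN phase: use the statistics of the current batch and
                # update the running averages (small batches => noisy stats)
                mean, var = x.mean(axis=0), x.var(axis=0)
                self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mean
                self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
            else:
                # TEST phase: use the learned/averaged statistics
                mean, var = self.running_mean, self.running_var
            return (x - mean) / np.sqrt(var + self.eps)

If the running averages have not converged (too few iterations, tiny batches, or an unusual momentum/eps), the TEST-phase statistics will not match what the network saw during training, which is one way to end up with noticeably worse TEST results.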
A general way to look at dropout is that you are randomly dropping a fixed percentage (= the dropout ratio) of the input edges to every node in the network. Does anyone know how this avoids overfitting?
Original paper is really good: http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf?utm_content=buffer79b43&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
it is worth reading.
Note: dropout does not avoid overfitting, it just makes it harder.
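For intuition, a minimal NumPy sketch of inverted dropout (to my understanding, the same scaling scheme Caffe's Dropout layer uses): during training each unit is kept at random, so every forward pass effectively trains a different thinned sub-network; at test time the full network runs unchanged.

    import numpy as np

    def dropout(x, p_drop=0.5, train=True):
        if not train:
            return x  # TEST phase: no-op, the full network is used
        # TRAIN phase: randomly zero units and rescale the survivors so the
        # expected activation matches the test-time behaviour
        mask = (np.random.rand(*x.shape) >= p_drop) / (1.0 - p_drop)
        return x * mask

Because no unit can rely on specific other units being present, co-adaptation is reduced; the paper linked above frames this as approximately averaging an exponential number of thinned networks.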
Thanks @MichalBusta, the abstract said it all; I will go through the paper. I notice that you are calling Dropout and BatchNorm in model_cz.prototxt. Are they not needed in tiny.prototxt as well, especially dropout?
Also @leetwito, where do you pass the PHASE while retraining? I do not see any option in train.py. I am calling create_solvers_tiny() from models.py, which I modified as follows:
def create_tiny_yolo_solver():
    solver = caffe.get_solver('models/tiny_solver.prototxt')
    #solver.restore('backup/yolo_mobile_iter_357000.solverstate')
    solver.net.copy_from('/home/ayush/DeepTextSpotter/models/tiny.caffemodel')
    return solver

def create_recognizer_solver():
    solver = caffe.get_solver('models/solver_ctc.prototxt')
    #solver.restore('backup/recog_iter_195000.solverstate')
    solver.net.copy_from('/home/ayush/DeepTextSpotter/models/model.caffemodel')
    return solver

def create_solvers_tiny():
    proposal_net = create_tiny_yolo_solver()
    recog = create_recognizer_solver()
    return proposal_net, recog
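For what it is worth, my understanding is that the phase is never passed explicitly in train.py: caffe.get_solver() builds solver.net in the TRAIN phase (the phase comes from the solver/net prototxt, not from the Python code), and a TEST-phase net has to be created separately from the same weights. A rough sketch under that assumption (the temporary snapshot path and the iteration count are placeholders, not from the repo):

    import caffe

    # both solver.net objects below are TRAIN-phase nets
    proposal_net, recog = create_solvers_tiny()

    # placeholder training loop
    for _ in range(100):
        proposal_net.step(1)
        recog.step(1)

    # for evaluation, reload the current detector weights into a TEST-phase net
    proposal_net.net.save('backup/tiny_tmp.caffemodel')
    eval_net = caffe.Net('models/tiny.prototxt',
                         'backup/tiny_tmp.caffemodel',
                         caffe.TEST)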