
TEST and TRAIN phases

Open leetwito opened this issue 6 years ago • 4 comments

Hey, what are the differences between those 2 phases? I noticed that the TEST phase disables dropout and cropping, and enables a softmax layer that allows extracting CTC probabilities. Are there any more differences? I assume there are, as I get significantly (15%) worse results when using the TEST phase, which is weird. I'm using a modified validation script I wrote based on train.py, because I encountered many problems using the validation / demo files (their pre-processing is different from train.py). Many thanks, Lee

leetwito avatar Mar 13 '18 12:03 leetwito

Hi,

TRAIN phase is for training, meaning:

- drop-out is enabled (one way to look at it is that you are sampling your network, so you are learning many networks within one network)

- batch normalization is in training mode (it computes the mean and variance from the current batch)

In TEST phase:

- drop-out is ignored, so you are using the full network

- batch normalization is in testing mode (it uses the learned / averaged mean and variance)

So when you are using batch norm or drop-out, the accuracy should be worse in TRAIN phase.

If you are getting worse results in TEST phase, it can be caused by meta-parameters (batch size / batch-norm eps and momentum).
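A minimal sketch of how the phase is selected in pycaffe (the models/tiny.prototxt and models/tiny.caffemodel paths here are just the ones mentioned in this thread, substitute your own):

```python
import caffe

# TRAIN phase: dropout layers are active and BatchNorm uses per-batch statistics
train_net = caffe.Net('models/tiny.prototxt', 'models/tiny.caffemodel', caffe.TRAIN)

# TEST phase: dropout becomes a pass-through and BatchNorm uses its stored
# running mean/variance, so repeated forward passes give deterministic results
test_net = caffe.Net('models/tiny.prototxt', 'models/tiny.caffemodel', caffe.TEST)
```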

Michal


MichalBusta avatar Mar 13 '18 12:03 MichalBusta

A general way to look at dropout is that you are randomly dropping a fixed percentage (= the dropout ratio) of input edges to every node in the network. Does anyone know how this avoids overfitting?

rohitsaluja22 avatar Mar 17 '18 08:03 rohitsaluja22

The original paper is really good: http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf?utm_content=buffer79b43&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

It is worth reading.

Note: dropout does not avoid overfitting; it just makes it harder.
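A minimal numpy sketch of the idea (illustration only, not code from this repo): at train time a random mask drops units, so each forward pass effectively samples a thinned sub-network; at test time the full network is used and activations are scaled by the keep probability.

```python
import numpy as np

def dropout_forward(x, p_drop=0.5, train=True):
    """Plain (non-inverted) dropout, as in the Srivastava et al. paper."""
    if train:
        # each unit is kept with probability 1 - p_drop; a different mask is
        # drawn every pass, which is the "many networks in one" view above
        mask = (np.random.rand(*x.shape) >= p_drop).astype(x.dtype)
        return x * mask
    # test time: keep every unit, scale so the expected activation matches training
    return x * (1.0 - p_drop)

x = np.ones((2, 4), dtype=np.float32)
print(dropout_forward(x, train=True))   # some units zeroed at random
print(dropout_forward(x, train=False))  # full network, scaled by 0.5
```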

MichalBusta avatar Mar 17 '18 09:03 MichalBusta

Thanks @MichalBusta, the abstract said it all, I will go through the paper. I notice that you are calling dropout and BatchNorm in model_cz.prototxt. Are they not needed in tiny.prototxt as well, especially dropout?

Also @leetwito, where do you pass the PHASE while retraining? I do not see any option in train.py. I am calling create_solvers_tiny() from models.py, which I modified as follows:

```python
def create_tiny_yolo_solver():
    solver = caffe.get_solver('models/tiny_solver.prototxt')
    # solver.restore('backup/yolo_mobile_iter_357000.solverstate')
    solver.net.copy_from('/home/ayush/DeepTextSpotter/models/tiny.caffemodel')
    return solver

def create_recognizer_solver():
    solver = caffe.get_solver('models/solver_ctc.prototxt')
    # solver.restore('backup/recog_iter_195000.solverstate')
    solver.net.copy_from('/home/ayush/DeepTextSpotter/models/model.caffemodel')
    return solver

def create_solvers_tiny():
    proposal_net = create_tiny_yolo_solver()
    recog = create_recognizer_solver()
    return proposal_net, recog
```
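(For reference, a minimal sketch of where the phase comes from in this solver-based setup, assuming the functions above: it is never passed explicitly in train.py; caffe.get_solver() builds its own TRAIN-phase net, and any nets declared via test_net / test_iter in the solver prototxt are created in TEST phase.)

```python
proposal_net, recog = create_solvers_tiny()

# solver.net is the TRAIN-phase net that solver.step() optimizes,
# so dropout and batch-norm run in training mode here
train_net = proposal_net.net

# nets declared in the solver prototxt are instantiated in TEST phase
# and collected here (may be an empty list in this setup)
test_nets = proposal_net.test_nets
print(len(test_nets))
```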

rohitsaluja22 avatar Mar 17 '18 09:03 rohitsaluja22