
Training details about different sizes

Open LucyLu-LX opened this issue 6 years ago • 4 comments

python preprocess.py, resize image to 256*256, 384*384, 512*512, 640*640, 736*736, and train respectively could speed up training process.

I am a bit confused about the meaning of "train respectively". Does this mean training the network in a coarse-to-fine process, which initializes the network at 256x256 and then fine-tunes it on larger sizes? Does this accelerate the convergence of the network compared with training it on size 736x736 directly?

LucyLu-LX avatar Sep 03 '18 07:09 LucyLu-LX

1. Does this mean training the network in a coarse-to-fine process, which initializes the network at 256x256 and then fine-tunes it on larger sizes? Yes.
2. Does this accelerate the convergence of the network compared with training it on size 736x736 directly? Yes, because direct training at size 736 is very slow.

My own training method: set cfg.train_task_id = '2T256', set patience to 5 (anywhere between 2 and 6 works), then run python preprocess.py && python label.py && python advanced_east.py.

When training ends, copy the best saved weights file (.h5) to initialize the training at size 384: set cfg.train_task_id = '2T384', set cfg.initial_epoch to the epoch where the previous stage ended, set cfg.load_weights = True, and continue training.

Then train at 512, and so on. You could try this method; there may be better ways.
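The staged schedule above can be sketched as a small helper. Note this is only an illustration: `stage_configs` is a hypothetical name, not part of the AdvancedEAST repo, and the dictionary keys simply mirror the cfg.py fields mentioned in this thread.

```python
def stage_configs(sizes=(256, 384, 512, 640, 736), prefix="2T"):
    """Build the coarse-to-fine stage list described above.

    Hypothetical helper: each entry mirrors the cfg.py settings from
    the thread (train_task_id, load_weights) plus which earlier stage's
    best .h5 weights should initialize this one.
    """
    stages = []
    for i, size in enumerate(sizes):
        stages.append({
            "train_task_id": f"{prefix}{size}",
            # Only the first (smallest) stage trains from scratch.
            "load_weights": i > 0,
            # Stage that provides the initial weights, None for stage 0.
            "init_from": f"{prefix}{sizes[i - 1]}" if i > 0 else None,
        })
    return stages
```

Each stage would then run preprocess.py, label.py, and advanced_east.py with cfg.py edited to match that stage's entry, as described above.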

huoyijie avatar Sep 04 '18 04:09 huoyijie

@huoyijie Does the network still remember what it learned at 256 while training at 736?

hcnhatnam avatar Mar 15 '19 05:03 hcnhatnam

Hi, I downloaded the Tianchi ICPR dataset, set cfg.train_task_id = '3T256', and ran python3 preprocess.py && python3 label.py && python3 advanced_east.py, but I get the error shown in the output below. How can I fix it? Can you help me? @LucyLu-LX @huoyijie @hcnhatnam

```
Epoch 00008: val_loss improved from 0.43569 to 0.42750, saving model to model/weights_3T256.008-0.427.h5
Epoch 9/24
1125/1125 [==============================] - 157s 139ms/step - loss: 0.2762 - val_loss: 0.4373
Epoch 00009: val_loss did not improve from 0.42750
Epoch 10/24
1125/1125 [==============================] - 156s 139ms/step - loss: 0.2579 - val_loss: 0.4435
Epoch 00010: val_loss did not improve from 0.42750
Epoch 11/24
1125/1125 [==============================] - 156s 139ms/step - loss: 0.2466 - val_loss: 0.4710
Epoch 00011: val_loss did not improve from 0.42750
Epoch 12/24
1125/1125 [==============================] - 156s 139ms/step - loss: 0.2342 - val_loss: 0.4633
Epoch 00012: val_loss did not improve from 0.42750
Epoch 13/24
1125/1125 [==============================] - 156s 139ms/step - loss: 0.2228 - val_loss: 0.4724
Epoch 00013: val_loss did not improve from 0.42750
Epoch 00013: early stopping
```

globalmaster avatar Mar 19 '19 04:03 globalmaster

@globalmaster It isn't an error. Training was stopped early (early stopping) to avoid overfitting. It looks like the model is not converging, which is still my problem too. @globalmaster, can you share the dataset with me via a Google Drive link?
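For reference, the halt in the log above follows Keras-style patience-based early stopping: training stops once val_loss has not improved for `patience` consecutive epochs. This is a minimal re-implementation to illustrate the rule, not the repo's actual callback code.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which patience-based early stopping
    halts training, or None if it never triggers.

    Minimal sketch of the Keras EarlyStopping rule: reset the wait
    counter on every new best val_loss, stop after `patience` epochs
    without improvement.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None
```

In the log above, val_loss last improved at epoch 8 (0.42750) and then failed to improve for five epochs, so stopping at epoch 13 is consistent with a patience of 5.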

hcnhatnam avatar Mar 20 '19 10:03 hcnhatnam