
Training seems broken with TensorFlow 0.11 and Keras 1.1.1

Open gorinars opened this issue 8 years ago • 5 comments

I was checking the latest version, which should support TensorFlow 0.11 and Keras 1.1.1, the versions installed on my system.

I ran the sample training on the full Synth90k corpus (without the GRU option):

python src/launcher.py \
	--phase=train \
	--data-path=90kDICT32px/new_annotation_train.txt \
	--data-base-dir=90kDICT32px \
	--log-path=log_sy.txt \
	--attn-num-hidden 256 \
	--batch-size 32 \
	--model-dir=model_x \
	--initial-learning-rate=1.0 \
	--num-epoch=20000 \
	--gpu-id=0 \
	--target-embedding-size=10

It seems to converge better than the old version. After 70k iterations I get a pretty nice perplexity:

2016-12-08 13:12:07,393 root  INFO     current_step: 78998
2016-12-08 13:12:07,589 root  INFO     step_time: 0.196019, step_loss: 0.080441, step perplexity: 1.083765
2016-12-08 13:12:08,620 root  INFO     current_step: 78999
2016-12-08 13:12:08,884 root  INFO     step_time: 0.264444, step_loss: 0.071078, step perplexity: 1.073665
2016-12-08 13:12:08,885 root  INFO     global step 79000 step-time 0.26 loss 0.094580  perplexity 1.10
2016-12-08 13:12:08,885 root  INFO     Saving model, current_step: 79000

However, when testing on SVT, the performance is very bad:

2016-12-08 13:35:25,947 root  INFO     step_time: 0.049570, loss: 3.750340, step perplexity: 42.535551
2016-12-08 13:35:25,950 root  INFO     62.324373 out of 647 correct
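
A quick sanity check on the numbers in these logs: the logged perplexity values match exp(loss), so the gap between training (about 1.08) and SVT testing (about 42.5) reflects a real difference in loss and not a logging quirk. The sketch below assumes the loss is a natural-log cross-entropy, which the logged pairs agree with; the helper name `perplexity` is illustrative and not taken from the repository.

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity as exp(loss), assuming a natural-log cross-entropy loss."""
    return math.exp(loss)


# (loss, logged perplexity) pairs copied from the log lines in this thread.
logged_pairs = [
    (0.080441, 1.083765),   # training, step 78998
    (3.750340, 42.535551),  # testing on SVT
]

for loss, logged in logged_pairs:
    print(f"loss={loss:.6f}  exp(loss)={perplexity(loss):.6f}  logged={logged:.6f}")

# The SVT score from the log, expressed as a fraction of the 647 test words.
accuracy = 62.324373 / 647
print(f"SVT score: {accuracy:.1%}")
```

Under that assumption, the SVT score reported above is a little under 10% of the 647 test images.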

Interestingly, the model provided by the authors works great with the --old-model option. So this is not a decoder bug but rather some problem in training that does not show up in the training log yet does affect the final performance.

I wonder whether anyone has tried to train/test the new version using TensorFlow 0.11 and Keras 1.1.1? Thanks.

gorinars avatar Dec 08 '16 10:12 gorinars

I have the same problem. Training converges much faster than with the old version, but the performance is extremely bad.

One difference: the model provided by the authors does not work so well for me with the --old-model-version option. Here is the result (without the Distance library):

2016-12-22 15:13:24,881 root  INFO     200.000000 out of 645 correct
2016-12-22 15:13:24,943 root  INFO     step_time: 0.059276, loss: 1.567141, step perplexity: 4.792924
2016-12-22 15:13:24,959 root  INFO     200.000000 out of 646 correct
2016-12-22 15:13:25,029 root  INFO     step_time: 0.067385, loss: 0.174794, step perplexity: 1.191001
2016-12-22 15:13:25,035 root  INFO     201.000000 out of 647 correct

flymark2010 avatar Dec 22 '16 07:12 flymark2010

What result do you get with the Distance library? Can you confirm you are using Keras 1.1.1?

gorinars avatar Dec 22 '16 07:12 gorinars

I use Keras 1.1.1 and TensorFlow 0.10.0. I didn't install the Distance library in my environment, so all the results are without it, and I don't think this library affects the result much.

flymark2010 avatar Dec 22 '16 07:12 flymark2010

Usually it does. Without it you compute word accuracy; with it, the score is closer to character accuracy.
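
To illustrate the difference, here is a minimal sketch of the two scoring styles. It is an assumption about how such scores are typically computed, not the repository's actual code: `word_score` gives 1 only for an exact match, while `char_score` gives partial credit based on edit distance. The Levenshtein function is written out in plain Python so the example does not depend on the Distance library, and all function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_score(pred: str, truth: str) -> float:
    """Word accuracy: all-or-nothing per word."""
    return 1.0 if pred == truth else 0.0


def char_score(pred: str, truth: str) -> float:
    """Character-level credit: 1 minus the normalized edit distance, floored at 0."""
    if not truth:
        return 1.0 if not pred else 0.0
    return 1.0 - min(1.0, levenshtein(pred, truth) / len(truth))


# Made-up (prediction, ground truth) pairs for illustration only.
pairs = [("house", "house"), ("hovse", "house"), ("cat", "house")]
word_total = sum(word_score(p, t) for p, t in pairs)
char_total = sum(char_score(p, t) for p, t in pairs)
print(f"word-level: {word_total:f} out of {len(pairs)} correct")
print(f"char-level: {char_total:f} out of {len(pairs)} correct")
```

A character-level score accumulates fractional credit, which would be consistent with a fractional count such as 62.324373 in one log and whole-number counts such as 200.000000 in the other.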

gorinars avatar Dec 22 '16 08:12 gorinars

How do I install the distance module?

balajiwix avatar May 20 '18 07:05 balajiwix