Attention-OCR
Training seems broken with TensorFlow 0.11 and Keras 1.1.1
I was checking the latest version, which should support TensorFlow 0.11 and Keras 1.1.1, the versions installed on my system.
I ran the sample training on the full Synth90k corpus (without the GRU option):
python src/launcher.py \
--phase=train \
--data-path=90kDICT32px/new_annotation_train.txt \
--data-base-dir=90kDICT32px \
--log-path=log_sy.txt \
--attn-num-hidden 256 \
--batch-size 32 \
--model-dir=model_x \
--initial-learning-rate=1.0 \
--num-epoch=20000 \
--gpu-id=0 \
--target-embedding-size=10
It seems to converge better than the old version. After ~70k iterations I get a pretty nice perplexity:
2016-12-08 13:12:07,393 root INFO current_step: 78998
2016-12-08 13:12:07,589 root INFO step_time: 0.196019, step_loss: 0.080441, step perplexity: 1.083765
2016-12-08 13:12:08,620 root INFO current_step: 78999
2016-12-08 13:12:08,884 root INFO step_time: 0.264444, step_loss: 0.071078, step perplexity: 1.073665
2016-12-08 13:12:08,885 root INFO global step 79000 step-time 0.26 loss 0.094580 perplexity 1.10
2016-12-08 13:12:08,885 root INFO Saving model, current_step: 79000
However, when testing on SVT, I get very bad performance:
2016-12-08 13:35:25,947 root INFO step_time: 0.049570, loss: 3.750340, step perplexity: 42.535551
2016-12-08 13:35:25,950 root INFO 62.324373 out of 647 correct
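As a sanity check on these numbers (inferred from the logged values themselves, not from the code), the reported step perplexity appears to be exp(step_loss), both for the good training steps and for the bad SVT test step:

import math
print(math.exp(0.080441))  # ~1.0838, matches the training step perplexity above
print(math.exp(3.750340))  # ~42.536, matches the SVT test perplexity above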
Interestingly, the model provided by the authors works great with the --old-model option. So it is not a decoder bug, but rather some problem in training that does not show up in the training log yet does affect the final performance.
I wonder if anyone has tried to train/test the new version using TensorFlow 0.11 and Keras 1.1.1? Thanks.
I got the same problem. Training converges much faster than in the old version, but the performance is extremely bad.
Something different: the model provided by the authors is not so great with the --old-model-version
option. Here is the result (without the Distance library):
2016-12-22 15:13:24,881 root INFO 200.000000 out of 645 correct
2016-12-22 15:13:24,943 root INFO step_time: 0.059276, loss: 1.567141, step perplexity: 4.792924
2016-12-22 15:13:24,959 root INFO 200.000000 out of 646 correct
2016-12-22 15:13:25,029 root INFO step_time: 0.067385, loss: 0.174794, step perplexity: 1.191001
2016-12-22 15:13:25,035 root INFO 201.000000 out of 647 correct
Which result do you get with the Distance library? Can you confirm you are using Keras 1.1.1?
I use Keras 1.1.1 and TensorFlow 0.10.0. I didn't install the Distance library in my environment, so all the results are without it; I don't think this library affects the result much.
Usually it does. Without it you compute word accuracy; with it, the metric is closer to character accuracy.
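For reference, a minimal sketch of how the two metrics differ. This is not the repository's evaluation code; it assumes the PyPI Distance package and its distance.levenshtein function, and the project's actual metric may differ in detail:

import distance  # pip package "Distance"

def word_accuracy(predictions, ground_truths):
    # Exact-match: a prediction only counts if every character is right.
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)

def character_level_accuracy(predictions, ground_truths):
    # Edit-distance based: near misses get partial credit, which would also
    # explain a fractional counter like "62.324373 out of 647 correct" above.
    total = 0.0
    for p, g in zip(predictions, ground_truths):
        total += 1.0 - distance.levenshtein(p, g) / max(len(p), len(g), 1)
    return total / len(ground_truths)

preds  = ["hello", "wrold", "test"]
truths = ["hello", "world", "text"]
print(word_accuracy(preds, truths))             # 0.33: only "hello" matches exactly
print(character_level_accuracy(preds, truths))  # ~0.78: partial credit for near misses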
How do I install the distance module?
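Assuming it is the Distance package on PyPI (imported in Python as distance), a typical install and quick check would be:

pip install Distance
python -c "import distance; print(distance.levenshtein('abc', 'abd'))"  # prints 1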