Sequence models aren't learning
When I run the reverse_words and machine_translation examples, the cost does not decrease, and (in the MT example) the generated samples are still gibberish after 80 epochs. The sqrt example works correctly, which is why I suspect it's something to do with the sequence models.
I'm using very recent (yesterday's) git checkouts of blocks-examples, blocks, fuel and theano (installed with pip). SciPy, NumPy, etc. are standard pip installs. I'm using Python 2.7 on CPUs, and I've replicated this behaviour on two different machines. theano.__version__ is '0.7.0.dev-49b554843f47f1b2bc83bb1cbf64dbcbfc70484a'.
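For completeness, this is roughly how I read off those version strings (just the standard __version__ attributes, nothing blocks-specific):

# Print the interpreter and package versions used for this report
# (standard __version__ attributes).
import sys
import numpy
import scipy
import theano

print(sys.version)            # Python 2.7.x
print(numpy.__version__)
print(scipy.__version__)
print(theano.__version__)     # 0.7.0.dev-49b55484...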
Is this a known issue?
How long did you train reverse_words? Did you try to use beam_search mode?
I trained reverse_words for the default 100 iterations. The character log-likelihood fluctuates between 1.7 and 2.5, mostly around 2.0, but there's no discernible downward trend. In the first instance I only ran on one of the billion-word files; I'm now running on the full dataset (using the default fuel wrapper) and so far (iteration 20 or so) I'm seeing the same behaviour: log-likelihood fluctuations but no stable decrease. (Clearly each batch will have a different cost, but I'm expecting a general trend downwards, especially at the beginning.)
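To look for a trend through the per-batch noise I'm smoothing the logged costs with a moving average. This is just a generic sketch, assuming the per-iteration character log-likelihoods have been copied out of the training log into a plain Python list (nothing blocks-specific):

# Smooth noisy per-batch costs so a slow downward trend is visible
# despite the batch-to-batch fluctuation described above.
def moving_average(costs, window=50):
    smoothed = []
    for i in range(len(costs)):
        chunk = costs[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / float(len(chunk)))
    return smoothed

# Example: costs copied from the training log, one value per iteration.
# costs = [2.4, 2.1, 2.3, 1.9, ...]
# print(moving_average(costs)[:10], moving_average(costs)[-10:])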
When I do beam search using the smaller model, I get an error:
$python -m reverse_words beam_search rev_words
[model is loaded]
Enter a sentence
hi
Enter the beam size
3
Encoder input: [42, 7, 8, 43]
Target: [42, 8, 7, 43]
Traceback (most recent call last):
File "/home/sfrank1/.local/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/sfrank1/.local/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/datastore/home/sfrank1/smt/nmt/blocks-examples/reverse_words/__main__.py", line 42, in <module>
main(**vars(args))
File "reverse_words/__init__.py", line 314, in main
batch_size, axis=1))
File "reverse_words/__init__.py", line 274, in generate
ComputationGraph(generated[1]))
ValueError: too many values to unpack
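For what it's worth, the final ValueError is just Python's generic unpacking failure; something along these lines reproduces it (purely illustrative names, I haven't dug into what generate() actually returns here):

# Illustration only (hypothetical values): unpacking raises
# "ValueError: too many values to unpack" when the right-hand side
# yields more items than there are names on the left.
generated = ("states", "outputs", "costs")   # three items returned...
samples, costs = generated                   # ...only two targets -> ValueError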
The error in beam search has just been fixed.
The default number of iterations is 10000, not 100, but I guess this is what you ran.
I will try to run it on my machine and get back to you soon.
Thanks for looking into this! I can confirm beam search works now. The larger model is now at 2000 iterations and the average character log-likelihood does seem to be decreasing (from 2.35 at iteration 10 to 1.44 at iteration 2000). So maybe this is not actually a bug, just mistaken expectations on my part, sorry! I was expecting a very quick drop within the first tens of iterations and then a levelling off, whereas it seems to be much more gradual.
It would be nice to have indications of expected behaviour in the readme, but I suppose the default number of iterations is a hint. How long would one have to run the MT example before seeing something semi-reasonable? The config setting is 1000000 iterations; is this an "optimal performance for WMT" setting or a "bare minimum necessary" kind of setting?
FWIW, beam search for reverse_words at this checkpoint (2000) doesn't do very well yet:
Enter a sentence
the sun is shining
Enter the beam size
2
Encoder input: [42, 19, 7, 4, 41, 18, 20, 13, 41, 8, 18, 41, 18, 7, 8, 13, 8, 13, 6, 43]
Target: [42, 4, 7, 19, 41, 13, 20, 18, 41, 18, 8, 41, 6, 13, 8, 13, 8, 7, 18, 43]
(66.2302337189)<S>eht ssisisis si sehtis si sehtis si sehtis si sehtis si seh
(66.0378270722)<S>eht ssisisis si sehtis si sehtis si sehtis si sehtis si sis
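For reference, the Target line printed above is just the Encoder input with each word's character codes reversed in place; from eyeballing the two index lists, 41 looks like the space code and 42/43 the sentence boundary markers. A quick check under those assumptions:

# Reconstruct the target indices from the encoder input by reversing each
# word in place. 41 is assumed to be the space code and 42/43 the sentence
# boundary markers (inferred from the printed lists, not from the source).
def reverse_words_indices(indices, space=41):
    body = indices[1:-1]                     # strip the boundary markers
    words, word = [], []
    for code in body:
        if code == space:
            words.append(word)
            word = []
        else:
            word.append(code)
    words.append(word)
    reversed_body = []
    for i, w in enumerate(words):
        if i:
            reversed_body.append(space)
        reversed_body.extend(reversed(w))    # reverse the characters of each word
    return indices[:1] + reversed_body + indices[-1:]

encoder_input = [42, 19, 7, 4, 41, 18, 20, 13, 41, 8, 18, 41, 18, 7, 8, 13, 8, 13, 6, 43]
print(reverse_words_indices(encoder_input))
# -> [42, 4, 7, 19, 41, 13, 20, 18, 41, 18, 8, 41, 6, 13, 8, 13, 8, 7, 18, 43]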