Rohan Badlani
A lower learning rate or a lower CTC loss weight should work better. Since it is warmstarting from a pretrained checkpoint, a lower lr like 1e-4 works fine (with both ctc loss...
If you warmstarted with a very high lr, like the lr=10e-3 you mentioned, then the attention module can break, and even removing the prior will not help....
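A minimal sketch of what that warmstart setup might look like, assuming a standard PyTorch training loop; the checkpoint path, loss names, and the 0.1 weight are illustrative placeholders, not the repo's actual config:

```python
import torch
from torch import nn

# Stand-in module; the real TTS model from the repo goes here.
model = nn.Linear(80, 80)

# Warmstart from the pretrained checkpoint (path is illustrative,
# and assumes the checkpoint file exists).
checkpoint = torch.load("pretrained.pt", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"], strict=False)

# Lower lr for warmstarting (1e-4 rather than 10e-3), plus a
# down-weighted CTC alignment loss so the pretrained attention
# module is not disrupted early in fine-tuning.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
ctc_loss_weight = 0.1  # illustrative value; tune for your data

def combined_loss(mel_loss, ctc_loss, gate_loss):
    # Hypothetical loss terms named after those discussed above.
    return mel_loss + ctc_loss_weight * ctc_loss + gate_loss
```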
Hmmm... I think I see the problem -- if you look at the gate loss, there is a spike around 575k (which I'm guessing is the point when you removed the prior?...
Yes... BUMP on this -- there is no generator.py in models/
Adding the implementation for directed graphs as well.
Added tests in tests/tests-TUNGraph.cpp and tests/tests-TNGraph.cpp that follow the same convention as the other tests. They cover basic functionality using a small test graph. SUNet Id: **rbadlani**
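A minimal sketch of what one such directed-graph test might look like, following the gtest style of SNAP's existing test files; the exact graph and assertions in the PR may differ:

```cpp
#include <gtest/gtest.h>
#include "Snap.h"

// Basic-functionality test for the directed graph class TNGraph,
// in the same style as the repo's other tests.
TEST(TNGraph, BasicFunctionality) {
  PNGraph Graph = TNGraph::New();
  // Small test graph: 3 nodes, 2 directed edges (1->2, 2->3).
  Graph->AddNode(1);
  Graph->AddNode(2);
  Graph->AddNode(3);
  Graph->AddEdge(1, 2);
  Graph->AddEdge(2, 3);
  EXPECT_EQ(3, Graph->GetNodes());
  EXPECT_EQ(2, Graph->GetEdges());
  EXPECT_TRUE(Graph->IsEdge(1, 2));
  // Directed graph: the reverse edge was never added.
  EXPECT_FALSE(Graph->IsEdge(2, 1));
}
```

The reverse-edge assertion is what distinguishes this from the TUNGraph (undirected) case, where IsEdge(2, 1) would hold as well.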