Rohan Badlani

6 comments by Rohan Badlani

A lower learning rate or a lower CTC loss weight should work better. Since it is warm-starting from a pretrained checkpoint, a lower LR like 1e-4 works fine (with both ctc loss...
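To make that concrete, here is a minimal sketch of the kind of config change I mean, assuming a dict-style training config; the key names (`warmstart_checkpoint_path`, `learning_rate`, `ctc_loss_weight`) and the CTC weight value are illustrative, not this repo's actual schema:

```python
# Illustrative sketch only -- key names and values are assumptions,
# not this repo's actual config schema.
config = {
    "warmstart_checkpoint_path": "pretrained.ckpt",  # warm-start instead of training from scratch
    "learning_rate": 1e-4,    # lower LR for warm-started fine-tuning
    "ctc_loss_weight": 0.1,   # down-weight the CTC alignment loss
}
```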

If you warm-started with a very high LR like the lr=10e-3 you said you used, then the attention module can be broken, and even removing the prior will not help....

Hmmm... I think I see the problem -- if you look at the gate loss, there is a spike around 575k steps (which I'm guessing is the point when you removed the prior?...
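If it helps to check this yourself, here is a small sketch for pulling the gate-loss curve out of a TensorBoard event file and locating the spike; the log directory, the scalar tag `"gate_loss"`, and the threshold are all assumptions -- substitute whatever your run actually logs:

```python
# Sketch: locate the gate-loss spike in TensorBoard logs.
# The log path, tag name, and threshold below are assumptions.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("logs/my_run")  # path to the run's event files (assumed)
acc.Reload()
for e in acc.Scalars("gate_loss"):     # tag name is an assumption
    if e.step > 550_000 and e.value > 0.5:  # crude spike check, illustrative only
        print(f"step={e.step} gate_loss={e.value:.4f}")
```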

Yes... bump on this -- there is no generator.py in models/.

Adding the implementation for directed graphs as well.

Added tests in tests/tests-TUNGraph.cpp and tests/tests-TNGraph.cpp that follow the same convention as all the other tests. These exercise basic functionality using a small test graph. SUNet ID: **rbadlani**
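For reference, a minimal sketch of the undirected-vs-directed distinction the tests cover, written against the Snap.py bindings rather than the C++ API (the node IDs are just a small example, not the actual test graph):

```python
# Sketch of the TUNGraph vs. TNGraph behavior, via Snap.py bindings.
import snap

# Undirected graph (TUNGraph): an edge has no direction.
UG = snap.TUNGraph.New()
UG.AddNode(1); UG.AddNode(2)
UG.AddEdge(1, 2)
assert UG.IsEdge(2, 1)   # undirected: (1, 2) implies (2, 1)

# Directed graph (TNGraph): an edge is one-way.
DG = snap.TNGraph.New()
DG.AddNode(1); DG.AddNode(2)
DG.AddEdge(1, 2)
assert DG.IsEdge(1, 2) and not DG.IsEdge(2, 1)
```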