sempre
Problems when updating params
Hi, Liang. I used the Sempre 1.0 paraphrase system for testing, with only the alignment and VSM features; none of the other features, such as "Denotation Features", "Formula Features", "Wh- type", or "NER", were used.
When I trained the model, I found that the generated param file was too small, so I tried commenting out some code in Params.java. In the function private void clipUpdate(String f, double update), I commented out the call weights.remove(f): if (currWeight * (currWeight + update) < 0.0) { // weights.remove(f); }
The result: the generated param file was large (2.6M), and precision improved significantly! So, am I overfitting the model by doing this? Are there any other negative effects I haven't noticed?
Hi, what we have is an implementation of stochastic gradient descent with L1 regularization. We clip a weight to zero whenever it changes sign, as described in this paper: http://www.aclweb.org/anthology/P09-1054.
Commenting that out will certainly increase the number of parameters, since you lose the effect of L1 regularization. I don't know why precision goes up, since the update is then wrong, but if it goes up on training and not on dev/test, then yes, that is overfitting.
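To make the clipping rule concrete, here is a minimal sketch of a sparse weight map whose update drops a feature when the update would flip the sign of its weight. The class name L1ClipSketch and the helpers getWeight and size are hypothetical and written only for illustration; this is not the actual Params.java from Sempre, only the sign test and the weights.remove(f) call are taken from the snippet quoted in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; NOT the real Params class from Sempre.
final class L1ClipSketch {
    // Sparse weight vector: a feature missing from the map has weight 0.
    private final Map<String, Double> weights = new HashMap<>();

    double getWeight(String f) {
        return weights.getOrDefault(f, 0.0);
    }

    // Number of features with a nonzero weight (what ends up in a param file).
    int size() {
        return weights.size();
    }

    // Add `update` to the weight of feature f. If the update would make the
    // weight change sign, clip it to zero, which in a sparse map means
    // removing the entry.
    void clipUpdate(String f, double update) {
        double currWeight = getWeight(f);
        double newWeight = currWeight + update;
        if (currWeight * newWeight < 0.0) {
            weights.remove(f);          // sign flip: clip to zero
        } else if (newWeight == 0.0) {
            weights.remove(f);          // exactly zero: keep the map sparse
        } else {
            weights.put(f, newWeight);  // ordinary update
        }
    }
}
```

With this sketch, calling clipUpdate("a", 0.5) and then clipUpdate("a", -0.75) leaves feature "a" out of the map entirely, rather than storing -0.25. That removal is what keeps the weight vector (and so the saved param file) small under L1 regularization.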
On Tue, May 19, 2015 at 2:58 AM, uwittygit wrote the question above (https://github.com/percyliang/sempre/issues/48).