pyMeta
Have you tried using the best CG approximation across iterations?
Since the Hessian is indefinite for CNNs, CG cannot guarantee a monotonic decrease of the error. Have you ever tried using the result of the best step instead of the last one? In my experiments, the residual error r^T r fluctuates from one iteration to the next.
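To make the suggestion concrete, here is a minimal sketch of CG that remembers the iterate with the smallest r^T r and returns that one instead of the last. It is plain Python and not taken from pyMeta: `cg_best_iterate`, `matvec`, and `dot` are hypothetical names, and `matvec` stands in for whatever Hessian-vector product the real code uses.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def cg_best_iterate(matvec, b, max_iters=10, tol=1e-10):
    """Run CG on A x = b and return the iterate with the smallest r^T r.

    matvec(v) must return A v as a list (assumption: A is symmetric,
    but it may be indefinite, so r^T r is not guaranteed to decrease).
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs = dot(r, r)
    best_x, best_rs = list(x), rs

    for _ in range(max_iters):
        if rs <= tol:
            break
        Ap = matvec(p)
        pAp = dot(p, Ap)
        if pAp == 0.0:       # step size undefined, give up
            break
        alpha = rs / pAp     # can be negative when A is indefinite
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < best_rs: # keep the best iterate seen so far
            best_x, best_rs = list(x), rs_new
        beta = rs_new / rs
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new

    return best_x, best_rs
```

With the toy indefinite matrix diag(2, -1) and b = [1, 1], the first CG step raises r^T r from 2 to 18, so stopping after one iteration returns the initial guess rather than the worse last iterate.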
Nope, but I have tried simple alternatives such as plain gradient descent and steepest descent. Steepest descent is also affected by negative alphas, although it performs better than CG in my tests.
I agree that CG does not always work, since some step sizes become negative. I am not sure whether or how the problem is addressed in the original paper, but I am open to suggestions and happy to brainstorm the issue. :)
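As one possible starting point for that brainstorm, the sketch below stops CG as soon as the curvature p^T A p along the search direction is non-positive, which is exactly the case where alpha would turn negative. This is a common safeguard in truncated CG schemes, not something confirmed to be in the original paper or in pyMeta; `cg_stop_on_negative_curvature`, `matvec`, and `dot` are hypothetical names.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def cg_stop_on_negative_curvature(matvec, b, max_iters=10, tol=1e-10):
    """Run CG on A x = b, stopping early if p^T A p <= 0.

    Returns (x, steps, hit_negative_curvature). matvec(v) must return
    A v as a list (assumption: A is symmetric, possibly indefinite).
    """
    x = [0.0] * len(b)
    r = list(b)              # residual for the initial guess x = 0
    p = list(r)
    rs = dot(r, r)
    steps = 0

    for _ in range(max_iters):
        if rs <= tol:
            break
        Ap = matvec(p)
        pAp = dot(p, Ap)
        if pAp <= 0.0:
            # alpha = rs / pAp would be negative (or undefined):
            # return the last iterate obtained with positive step sizes.
            return x, steps, True
        alpha = rs / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
        steps += 1

    return x, steps, False
```

The returned flag lets the caller decide what to do next, for example falling back to a plain gradient step. Note that stopping early does not by itself guarantee a small residual, so it could be combined with keeping the best iterate.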