Angora
Look into alternatives to gradient descent
Section 3.4 of the paper describes using gradient descent with a 2-point method for computing the gradient vector. This basically involves doing O(d) function calls to find the approximate gradient, then doing a small number of calls to move in that direction. There is a long history of derivative-free optimization methods that try to make each function evaluation do some of both. For example, some methods keep an approximation of the function's shape and try calling the function at the minimum of that approximation; the new result is then used to build a better approximation.
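For concreteness, here is a rough Python sketch of that two-point finite-difference scheme. This is only an illustration of the general idea, not Angora's actual implementation in gd.rs; the objective `f`, the step size `eps`, and the learning rate `lr` are placeholders.

```python
import numpy as np

def two_point_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f at x with forward differences.

    Costs d + 1 evaluations of f (one baseline plus one per coordinate),
    which is the O(d) cost mentioned above.
    """
    x = np.asarray(x, dtype=float)
    base = f(x)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - base) / eps
    return grad

def gradient_descent_step(f, x, lr=0.1):
    """One descent step: O(d) calls for the gradient, then one move."""
    return np.asarray(x, dtype=float) - lr * two_point_gradient(f, x)
```

The point of the derivative-free methods listed below is to avoid paying those d extra evaluations for every single step.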
I would start by looking into:
- Nelder–Mead: keep the d best function calls and use a best-fit plane through them as an approximate gradient.
- NEWUOA, which has a clever way of approximating a polynomial best fit to the previously evaluated points.
- BFGS is not what I would have guessed, as it uses the gradient, but it appears to be the default in scipy.optimize. Maybe look at the source code to figure out how it approximates the gradient (see the sketch after this list).
Great! You can look at the code in angora/fuzzer/src/search/gd.rs as an example. Don't hesitate to ask me if you have any problems.
That is going to be the topic of my BA thesis. I have a few more ideas on this, I'll post the results when I'm done :)
@greshake How is your thesis going? Did you find anything that worked? Did you try things that turned out not to pay off?