Add different selection methods
- Roulette wheel selection (a rough sketch is below)
- Others?
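For reference, roulette wheel (fitness-proportionate) selection could look something like the following. This is only a rough Python/NumPy sketch with made-up names, not anything that exists in gplearn:

```python
import numpy as np

def roulette_wheel_select(fitnesses, rng, greater_is_better=False):
    """Fitness-proportionate (roulette wheel) selection.

    Returns the index of one selected individual. For minimisation
    problems the errors are inverted so that a lower error gets a
    larger slice of the wheel.
    """
    fitnesses = np.asarray(fitnesses, dtype=float)
    if not greater_is_better:
        # Invert errors: smaller error -> larger selection weight.
        fitnesses = fitnesses.max() - fitnesses
    total = fitnesses.sum()
    if total <= 0:
        # Degenerate case: all weights zero, pick uniformly at random.
        return rng.randint(len(fitnesses))
    return rng.choice(len(fitnesses), p=fitnesses / total)

# Example: select a parent index from a small population of errors.
rng = np.random.RandomState(0)
parent_idx = roulette_wheel_select([0.9, 0.1, 0.5], rng)
```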
Epsilon Lexicase
It'd be great if you could provide a bit more detail @echo66 ...
It's already implemented in https://github.com/lacava/few/ (a quick sketch of the core idea is included below the links).
More info at:
- https://github.com/lacava/emo-lex
- https://github.com/lacava/epsilon_lexicase
- http://williamlacava.com/research/lexicase
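For anyone unfamiliar, the core idea of epsilon lexicase selection is roughly the following. This is a hedged sketch based on La Cava et al., not the code from the repositories above:

```python
import numpy as np

def epsilon_lexicase_select(errors, rng):
    """Select one parent by epsilon-lexicase selection.

    errors: array of shape (n_individuals, n_cases) with per-case errors.
    Uses a MAD-based epsilon per training case; illustrative only.
    """
    n_individuals, n_cases = errors.shape
    # Per-case epsilon: median absolute deviation of that case's errors.
    medians = np.median(errors, axis=0)
    eps = np.median(np.abs(errors - medians), axis=0)

    candidates = np.arange(n_individuals)
    for case in rng.permutation(n_cases):
        case_errors = errors[candidates, case]
        best = case_errors.min()
        # Keep only individuals within epsilon of the best on this case.
        candidates = candidates[case_errors <= best + eps[case]]
        if len(candidates) == 1:
            return candidates[0]
    # Any ties remaining after all cases are broken at random.
    return rng.choice(candidates)
```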
I have privately made my own additions to gplearn, which include ParetoGP and EPLEX. See also #33
I would be more than happy to build a PR from my work. It will need some time though since I am currently in the process of finishing up my thesis.
This would be most welcome @wulfihm :+1:
Beginning this now. Not sure when I will be done.
No worries @wulfihm take your time, no rush. I'll be interested to see what you come up with. Would abstracting out the selector into its own class be possible?
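Something along these lines is what I have in mind. Purely illustrative names, not gplearn's current internals:

```python
from abc import ABC, abstractmethod
import numpy as np

class Selector(ABC):
    """Hypothetical base class: pick one parent index from a population."""

    @abstractmethod
    def select(self, fitnesses, rng):
        ...

class TournamentSelector(Selector):
    def __init__(self, tournament_size=20, greater_is_better=False):
        self.tournament_size = tournament_size
        self.greater_is_better = greater_is_better

    def select(self, fitnesses, rng):
        # Sample a random tournament and return the index of its winner.
        contenders = rng.randint(0, len(fitnesses), self.tournament_size)
        scores = np.asarray(fitnesses)[contenders]
        winner = scores.argmax() if self.greater_is_better else scores.argmin()
        return contenders[winner]
```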
So, as is evident, I did not get around to doing this, and I don't know if I will ever have the motivation to, since writing it in the first place already took some time. Essentially, I implemented ParetoGP and Eplex in my own fork of gplearn (created for my bachelor thesis), but that fork also includes other adaptations I made, which is why a clean PR from it is currently not possible. I am also not sure whether my implementations are in a shape you would find acceptable, since I did not write the code with maintainability by other people in mind; I didn't expect anyone else to read it in the first place. See: https://github.com/wulfihm/gplearn_ba
Maybe I will get around to doing a clean PR for this, but maybe not, so in the meantime feel free to pick it up if anyone wants to. If you have any questions about my code above, I can answer them.
Eplex and ParetoGP are defined here: https://github.com/wulfihm/gplearn_ba/blob/master/gplearn/selection.py, while the Pareto front is created here: https://github.com/wulfihm/gplearn_ba/blob/master/gplearn/genetic.py
I tried doing NSGA2 but it really did not work at all.
I also implemented other things like geometric semantic crossover and mutation, simplification of gplearn's solutions (not finished), and another complexity measure, which I named 'Kommenda' after M. Kommenda et al., and I added the R2 score for regression. I also changed the math operators to be more "precise".
Another note: I implemented everything above only with regression in mind. I completely ignored the SymbolicTransformer.
If you are interested at all here is my bachelor thesis: https://www.researchgate.net/publication/335842681_Genetic_Programming_for_Automotive_Modeling_Applications
Really appreciate you sharing this @wulfihm , if someone wants to take up the torch, it'd be very cool to see these added. Otherwise, maybe I'll take a few rainy weekend days this winter to play with your code :smile:
+1 for this! @wulfihm do you have any docs on how to use those techniques in your code? Or even a Jupyter notebook with a simple example? Anything helps.
I've been trying to switch from Eureqa to gplearn lately and I'm also very interested in this. For context, I have a recent paper on converting neural networks into analytic equations to discover new physical laws: https://arxiv.org/abs/2006.11287.
We use a Pareto-front technique where we look for the sharpest drop in log-error over expression length, rather than jointly optimizing loss and length; it seems to work pretty well on a range of noisy datasets. But I'd also be interested in trying out these others.
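Roughly, the model-selection step looks like this. This is a simplified sketch of the heuristic, not the exact code from the paper:

```python
import numpy as np

def best_by_log_error_drop(lengths, errors):
    """Pick the Pareto-front member at the sharpest drop in log-error.

    lengths, errors: arrays for the Pareto-front candidates.
    Illustrative only; names and details are made up.
    """
    lengths = np.asarray(lengths, dtype=float)
    errors = np.asarray(errors, dtype=float)
    order = np.argsort(lengths)
    sorted_lengths, sorted_errors = lengths[order], errors[order]
    # Slope of log(error) with respect to length between neighbours;
    # the most negative slope marks the sharpest drop.
    slopes = np.diff(np.log(sorted_errors)) / np.diff(sorted_lengths)
    best = slopes.argmin() + 1  # +1: the drop is credited to the longer model
    return order[best]

# Example: a front with lengths [3, 5, 9, 12] and errors [1.0, 0.9, 0.05, 0.04]
# picks the length-9 expression, where the error collapses.
idx = best_by_log_error_drop([3, 5, 9, 12], [1.0, 0.9, 0.05, 0.04])
```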
@MilesCranmer What exactly do you mean with "how to use these techniques"? Programmatically, Theoretically? :D
I mean programmatically - i.e., how I can configure those methods for GPlearn's .fit() loop for a particular problem if I were to use your fork.
I added additional hyperparameters/options:
- `complexity => 'kommenda'` (for the Kommenda complexity measure)
- `selection => 'eplex'`
- `paretogp => True` or `False`
- `paretogp_lengths => (a, b)`
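Assuming these map directly onto the estimator's constructor arguments in the fork (please check the fork's code for the exact signature, this is only a rough usage sketch):

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor  # i.e. the gplearn_ba fork

# Toy data just to make the example runnable.
X_train = np.random.uniform(-1, 1, (100, 2))
y_train = X_train[:, 0] ** 2 - X_train[:, 1]

est = SymbolicRegressor(
    population_size=1000,
    generations=50,
    complexity='kommenda',      # Kommenda complexity measure
    selection='eplex',          # epsilon-lexicase parent selection
    paretogp=True,              # maintain a Pareto archive (ParetoGP)
    paretogp_lengths=(5, 250),  # size limits for archive members
)
est.fit(X_train, y_train)
```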
ParetoGP works by selecting the first parent randomly from the Pareto front (the archive). The second parent is selected via the selection mechanism (which can be anything, e.g. tournament or eplex) from the normal population. See: https://doi.org/10.1007/0-387-23254-0_17
`paretogp_lengths` limits the size of the solutions in the archive: since there is no penalty parameter anymore, the individuals could otherwise grow infinitely large. `paretogp_lengths = (5, 250)` seems large enough to me. Keep the lower limit above 3 or 4, or it may cause issues.
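In pseudocode-ish Python, the parent selection is roughly the following. Illustrative only, not the fork's actual functions:

```python
import numpy as np

def paretogp_parents(archive, population, fitnesses, rng):
    """Sketch of ParetoGP parent selection as described above.

    The first parent comes uniformly at random from the Pareto archive;
    the second is chosen from the regular population by any selection
    scheme (a plain tournament here for simplicity).
    """
    first = archive[rng.randint(len(archive))]

    # Tournament over the population for the second parent (minimisation).
    contenders = rng.randint(0, len(population), 20)
    winner = contenders[np.asarray(fitnesses)[contenders].argmin()]
    second = population[winner]
    return first, second
```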
I used the code here: https://github.com/wulfihm/ba_code/blob/master/main.py; it works via command-line arguments.
The `elitism_size` option could be interesting to you if you don't use ParetoGP. In the original gplearn your population can get worse, since it does not retain the previous generation, i.e. the next generation replaces the old one. Elitism also only takes effect if ParetoGP is disabled.
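Conceptually, the elitism step does something like this. A simplified sketch, not the fork's exact code:

```python
import numpy as np

def apply_elitism(old_pop, old_fit, new_pop, new_fit, elitism_size,
                  greater_is_better=False):
    """Copy the best of the previous generation over the worst of the new one.

    This guarantees the best-so-far fitness cannot get worse between
    generations. Illustrative only.
    """
    old_fit = np.asarray(old_fit, dtype=float)
    new_fit = np.asarray(new_fit, dtype=float)
    sign = -1 if greater_is_better else 1
    best_old = np.argsort(sign * old_fit)[:elitism_size]
    worst_new = np.argsort(sign * new_fit)[-elitism_size:]
    for src, dst in zip(best_old, worst_new):
        new_pop[dst] = old_pop[src]
        new_fit[dst] = old_fit[src]
    return new_pop, new_fit
```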
That's awesome! I'm really looking forward to trying it out this week.
Thanks for putting this online and offering assistance in configuring it.
Cheers, Miles