Docs Clarification, Universal Elitism, Exceeding Population Size
Some of the documentation seemed a little unclear to me regarding the mutation rates. For example...
weight_mutate_rate - The probability that mutation will change the weight of a connection by adding a random value.
Does the number I enter for this represent the probability that each connection will undergo mutation or the probability that a connection in the net will undergo mutation? The same thing applies with all other mutate_rate config attributes.
In addition, is there an easy way of incorporating universal elitism (meaning the x most fit genomes will survive regardless of species)?
I'm also encountering an issue where my genome evaluation function is sometimes passed more genomes than the population size set in the config. Initially, I thought the problem was that elitism was set to a number higher than the minimum species size, but the problem still arises after fixing this.
From reading the code I've gathered the following:
Mutation rates
The mutation rates, e.g. weight_mutate_rate, apply to every instance of a thing that has that property. For instance, with connection weights, every connection in the genome is individually mutated with that probability. If, for example, you were to set weight_mutate_rate to 1.0, then every connection's weight would change every generation.
You can see on line 184 here that every child is mutated: https://github.com/CodeReclaimers/neat-python/blob/15e910ce12f34497b32946e468205e08b019034d/neat/reproduction.py#L172-L186.
If you then go to the definition of the mutate method, you'll see that each node and each connection is individually mutated: https://github.com/CodeReclaimers/neat-python/blob/15e910ce12f34497b32946e468205e08b019034d/neat/genome.py#L267-L303
And if you follow the code all the way to what it means for a gene to be mutated, you'll see that each of the attributes is also individually mutated: https://github.com/CodeReclaimers/neat-python/blob/master/neat/genes.py#L48-L51.
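To make the per-connection reading concrete, here is a minimal standalone sketch. It is not the library's code: mutate_weights and count_changed are hypothetical helpers, the Gaussian perturbation with a mutate_power of 0.5 is an assumption for illustration, and it ignores details such as weight replacement and clamping.

```python
import random


def mutate_weights(weights, mutate_rate, mutate_power=0.5, rng=random):
    """Hypothetical helper: perturb each weight independently.

    Every weight gets its own random draw, so mutate_rate is a
    per-connection probability, not a per-genome one.
    """
    mutated = []
    for w in weights:
        if rng.random() < mutate_rate:
            # Assumed perturbation: add zero-mean Gaussian noise.
            mutated.append(w + rng.gauss(0.0, mutate_power))
        else:
            mutated.append(w)
    return mutated


def count_changed(before, after):
    """Count positions whose value differs between the two lists."""
    return sum(1 for b, a in zip(before, after) if b != a)


rng = random.Random(0)
weights = [0.0] * 1000  # stand-in for 1000 connection weights

all_changed = count_changed(weights, mutate_weights(weights, 1.0, rng=rng))
none_changed = count_changed(weights, mutate_weights(weights, 0.0, rng=rng))
some_changed = count_changed(weights, mutate_weights(weights, 0.1, rng=rng))

print(all_changed, none_changed, some_changed)
```

With a rate of 1.0 every weight changes, with 0.0 none do, and with 0.1 roughly a tenth of them change in a single pass. If the rate were per-genome instead, a rate of 0.1 would leave most genomes completely untouched.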
Population sizes
I'm not yet sure why you can start off with more genomes than the configured population size, but I've gathered that the reason populations are larger than configured after that is the preservation of species. Basically, when a child is being created, there is no way to know whether it will belong to the same species as its parents, form a completely new species, or "join" another species. It seems that children are produced until each species has at least config::min_species_size members.
In the case of the first generation, because there isn't really a control that limits the number of species you start with, it leads to having a minimum of len(species) * config_min_species_size members in your population. Many of these will be stagnant though so the population size should be closer to the expected value just after config::max_stagnation generations.
In the generations that follow, you'll find that you can have a few more or a few fewer genomes than your configured population size. This happens because the number of members each species is allowed to have is proportional to its relative fitness (fitter species are bigger). Because that proportion doesn't translate cleanly into whole numbers, you tend to end up slightly above or below your population size rather than exactly on it. On average, though, you should have your population size.
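The rounding effect and the minimum-size floor described above can be illustrated with a small sketch. This is not the library's actual allocation code: allocate_spawn is a hypothetical function, and the simple "proportional share, rounded, with a floor" rule is an assumption that only approximates what reproduction.py does.

```python
def allocate_spawn(adjusted_fitness, pop_size, min_species_size):
    """Hypothetical allocation: give each species a share of pop_size
    proportional to its fitness, rounded to a whole number and never
    below min_species_size."""
    total = sum(adjusted_fitness)
    sizes = []
    for fitness in adjusted_fitness:
        share = fitness / total * pop_size
        sizes.append(max(min_species_size, round(share)))
    return sizes


# Three equally fit species: 10 / 3 rounds down, so one genome is "lost".
under = allocate_spawn([1.0, 1.0, 1.0], pop_size=10, min_species_size=2)

# Same species, pop_size 11: 11 / 3 rounds up, so one genome is "gained".
over = allocate_spawn([1.0, 1.0, 1.0], pop_size=11, min_species_size=2)

# Many species and a small population: the floor dominates, giving
# len(species) * min_species_size genomes in total.
many = allocate_spawn([1.0] * 30, pop_size=20, min_species_size=2)

print(sum(under), sum(over), sum(many))
```

Under these assumptions the totals come out at 9, 12, and 60 for configured sizes of 10, 11, and 20. The first two show the small drift caused by rounding, and the last shows how a large number of species can push the total well past the configured size.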
You can read the source code in https://github.com/CodeReclaimers/neat-python/blob/master/neat/reproduction.py to get a clearer picture of the rules that govern this. That class is used to initialize a population as well as to produce new members for each generation.