
How does neat-python ensure the uniqueness of historical markers?

Open sopotc opened this issue 1 year ago • 2 comments

Looking at the connection genes, they do not keep track of any global historical marker.

Apparently the only thing they are keyed on is the pair of start and end nodes. A (start node, end node) key does not have the defining property of a global innovation marker, because the same key can denote different structures in different genomes. For example, node 4 in genome 1 can sit between nodes 2 and 3, while in genome 2 it can sit between nodes 1 and 5. So relying on node ids for speciation and crossover compromises the key property of the NEAT algorithm: the same historical marker means the same structural mutation across the population.

In other words, if connection gene A is considered the same as connection gene B because their (start, end) tuples are equal, then this is effectively a case of competing conventions, which means the implemented algorithm is anything but NEAT. The whole reason NEAT added historical markers is to avoid exactly this problem.
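To make the concern concrete, here is a minimal sketch in plain Python. It is not neat-python's code: the `split_connection` helper, the starting topology, and the assumption that each genome picks new node ids locally are all hypothetical, chosen only to mirror the node 4 example above.

```python
# Hypothetical illustration: each genome assigns new node ids on its own,
# so the same id can end up in different structural positions.

def split_connection(connections, conn, new_node):
    """Return a copy where conn=(a, b) is replaced by (a, new_node), (new_node, b)."""
    a, b = conn
    result = set(connections)
    result.discard(conn)
    result.update({(a, new_node), (new_node, b)})
    return result

# Assumed shared starting topology for both genomes.
base = {(1, 5), (2, 3), (2, 5)}

# Genome 1: node 4 is created between nodes 2 and 3,
# then an add-connection mutation links node 4 to node 5.
genome1 = split_connection(base, (2, 3), new_node=4)
genome1.add((4, 5))

# Genome 2: node 4 is created between nodes 1 and 5.
genome2 = split_connection(base, (1, 5), new_node=4)

# Keys that a (start, end) comparison would treat as the same gene.
shared = genome1 & genome2
print(sorted(shared))
```

The key (4, 5) appears in both genomes, but in genome 1 it is a connection leaving a node that sits on the 2 to 3 path, while in genome 2 it is the second half of the split 1 to 5 connection. Whether this collision can actually occur in neat-python depends on how its node indexer is shared, which this sketch does not model.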

sopotc avatar Dec 15 '24 10:12 sopotc

See section 3.2 in the original paper. The whole point of the historical marker is to switch the 'coordinates' of a gene from the (start, end) formulation to a global (innovation) coordinate. Thus two genes are the same only when they have the same innovation number, not when they have the same (start, end). Consequently, during crossover they are lined up by innovation number, and the same innovation number is also used in the speciation distance calculation.
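As a rough sketch of what lining genes up by innovation number looks like, the Python below uses a hypothetical `InnovationTracker` and `align` helper. Neither name comes from neat-python or the paper, and the tracker is a simplification: it assumes node ids are already globally consistent and it never resets its record of seen mutations.

```python
from itertools import count

class InnovationTracker:
    """Hypothetical population-wide registry of structural mutations."""

    def __init__(self):
        self._next = count(1)
        self._seen = {}  # (in_node, out_node) -> innovation number

    def get(self, in_node, out_node):
        """Return the existing number for this mutation, or assign a new one."""
        key = (in_node, out_node)
        if key not in self._seen:
            self._seen[key] = next(self._next)
        return self._seen[key]

def align(parent_a, parent_b):
    """Partition genes by innovation number.

    Each parent maps innovation number -> (in_node, out_node, weight).
    Returns (matching, only_in_a, only_in_b) as sorted lists of numbers.
    """
    matching = sorted(parent_a.keys() & parent_b.keys())
    only_a = sorted(parent_a.keys() - parent_b.keys())
    only_b = sorted(parent_b.keys() - parent_a.keys())
    return matching, only_a, only_b

tracker = InnovationTracker()
i_14 = tracker.get(1, 4)        # first time this mutation is seen
i_24 = tracker.get(2, 4)
i_34 = tracker.get(3, 4)
i_14_again = tracker.get(1, 4)  # same mutation, same number

parent_a = {i_14: (1, 4, 0.5), i_24: (2, 4, -0.3)}
parent_b = {i_14: (1, 4, 0.7), i_34: (3, 4, 0.1)}

matching, only_a, only_b = align(parent_a, parent_b)
```

Here the genes with a shared innovation number are the ones crossover would pair up, and the counts of unmatched genes are what would feed the disjoint and excess terms of the compatibility distance.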

Why introduce innovation numbers (see fig. 2 and fig. 3 in original paper) when you already have the (in, out) keys?

sopotc avatar Dec 15 '24 11:12 sopotc

There used to be a __global_innov_number in the connection gene; the most recent commit that still has it is here. It was then moved to a separate, but still global, indexer object (here). It looks like global innovation numbers fell by the wayside in this large refactor.

Edit: I guess this issue can be fixed by the end user to some extent, by providing a global node_indexer object to the genome. However, there would still be the issue that the global innovation numbers are then specified per node instead of per connection, as you raise in the (very interesting!) TensorNEAT issue linked above.

th555 avatar Dec 21 '24 18:12 th555

@sopotc Thanks for pointing this out! I'm restoring the innovation number tracking for version 1.0.

CodeReclaimers avatar Nov 09 '25 16:11 CodeReclaimers