neat-python
Checkpoint restores second to last generation
Hello,
If, for example, I set the checkpointer to save every 5 generations, the checkpoint is made at the end of generation 4, and then generation 5 starts.
If I restore checkpoint-4, generation 4 starts again, and a checkpoint-4 is made again.
Is this by design?
I ask because I am now running a fairly large population, and each generation takes about an hour to complete. Checkpoints are saved every generation, but every time I interrupt and restore, I lose a generation's worth of progress.
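The numbering described above can be reproduced with a small model of an interval check. This is a hypothetical sketch, not neat-python's source: it assumes that a freshly created checkpointer remembers the last checkpointed generation, starting at -1, and saves at the end of a generation once the gap reaches the interval.

```python
def checkpoint_labels(interval, n_generations, start_gen=0):
    """Return the generation numbers a checkpoint would be saved under.

    Hypothetical model: a freshly created checkpointer remembers the last
    checkpointed generation (initially -1) and saves at the end of any
    generation where current - last >= interval.
    """
    last = -1
    labels = []
    for gen in range(start_gen, start_gen + n_generations):
        if gen - last >= interval:
            labels.append(gen)
            last = gen
    return labels


print(checkpoint_labels(5, 15))              # [4, 9, 14]: first save ends generation 4
print(checkpoint_labels(5, 6, start_gen=4))  # [4, 9]: resuming at 4 re-saves checkpoint-4
print(checkpoint_labels(5, 6, start_gen=5))  # [5, 10]: resuming one generation later
```

Under this assumption, restoring checkpoint-4 without changing the generation number and attaching a new checkpointer writes checkpoint-4 again, which matches the behaviour reported here.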
I have made a dirty fix for this: making the checkpoint at the start of the generation instead of the end. It's not ideal, because after moving the contents of the "end_generation" function to "start_generation", I had to add all of its parameters to the "start_generation" function in reporting.py and population.py.
But otherwise it works:
" ****** Running generation 4 ******
Saving checkpoint to neat-checkpoint-4"
I am sure you will be able to implement a more elegant solution if, of course, this fix is warranted.
Thanks!
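To see what each choice of hook actually stores, here is a toy generational loop (not neat-python's code). The "population" is just an integer counting how many reproduction steps produced it, and the key assumption is that the end-of-generation hook fires after reproduction, while the start-of-generation hook fires before evaluation.

```python
def toy_run(n_generations, start_gen=0, start_pop=0, save_at="end"):
    """Toy generational loop; returns {checkpoint_label: population_saved}.

    Each generation: start hook, evaluate, reproduce, end hook. The
    population is modelled as the number of reproduction steps so far.
    """
    checkpoints = {}
    gen, pop = start_gen, start_pop
    for _ in range(n_generations):
        if save_at == "start":
            checkpoints[gen] = pop  # snapshot of the population about to be evaluated
        # ... the expensive evaluation of `pop` would happen here ...
        pop += 1                    # reproduction: offspring replace the population
        if save_at == "end":
            checkpoints[gen] = pop  # snapshot taken after reproduction, labelled `gen`
        gen += 1
    return checkpoints


print(toy_run(3, save_at="end"))    # {0: 1, 1: 2, 2: 3}
print(toy_run(3, save_at="start"))  # {0: 0, 1: 1, 2: 2}
```

In this toy, an end-of-generation snapshot labelled N already holds the offspring of generation N, while a start-of-generation snapshot labelled N holds exactly what generation N evaluates. Whether neat-python's end hook really runs after reproduction is the assumption to check against population.py before relying on either reading.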
@vladas-v, thanks for this "feature"! I am having the same issue, and your change really helped.
How exactly did you make it save every five generations? When I restore from another generation, it won't save any checkpoints.
I found the same problem when saving after each generation and running for one generation at a time: it always starts at generation 0 and saves generation 0. Why is this?
A simple fix for me was to add "generation += 1" before the return statement in the restore_checkpoint function in checkpoint.py. It just wasn't incrementing the generation count when loading a checkpoint.
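A self-contained sketch of that fix, using hypothetical save_checkpoint and restore_checkpoint helpers built on gzip and pickle rather than neat-python's own Checkpointer. It assumes the snapshot is taken at the end of a generation; the advance flag is an illustrative addition. Running three one-generation sessions in a row shows why the count stays at 0 without the increment.

```python
import gzip
import os
import pickle
import random
import tempfile


def save_checkpoint(path, generation, population):
    """Hypothetical helper: store (generation, population, RNG state) as a gzipped pickle."""
    with gzip.open(path, "wb") as f:
        pickle.dump((generation, population, random.getstate()), f)


def restore_checkpoint(path, advance=True):
    """Hypothetical helper: load a checkpoint and return (generation, population).

    The snapshot is assumed to be written at the END of `generation`, so with
    advance=True the run resumes at the next generation number instead of
    repeating the saved one (the "generation += 1" fix).
    """
    with gzip.open(path, "rb") as f:
        generation, population, rndstate = pickle.load(f)
    random.setstate(rndstate)  # keep the random stream reproducible across restarts
    if advance:
        generation += 1
    return generation, population


# Three one-generation sessions, each resuming from the previous checkpoint.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "toy-checkpoint")
    gen, pop = 0, ["genome-a", "genome-b"]
    for _ in range(3):
        save_checkpoint(path, gen, pop)      # end of this session's single generation
        gen, pop = restore_checkpoint(path)  # where the next session starts
    print(gen)  # 3; with advance=False it would stay at 0
```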
Are you adding generation += 1 at the end of def restore_checkpoint(filename)? For me this doesn't change anything at all.
Yes, mine looks like this:
generation += 1
return Population(config, (population, species_set, generation))
Weird, for me this doesn't change a thing. If the checkpoint was saved as generation 3, it starts again from generation 3, even with generation += 1. So for you, if you set winner = p.run(RunTraining, 1) (train for only one generation), the restored checkpoint starts the next generation? For me it just stays in the same generation. Thanks anyway for the response, man.
I wonder if the file you edited isn't the one actually being run. If you put a 1/0 before the return statement, do you get a divide-by-zero error when restoring a checkpoint? Are you using Anaconda? Inside the Anaconda/Miniconda directory, the file should be Lib\site-packages\neat\checkpoint.py
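A less destructive alternative to the 1/0 trick is to ask Python which file it would load. A minimal sketch using the standard library's importlib.util.find_spec; json stands in for neat so the example runs without neat-python installed (import neat; print(neat.__file__) gives similar information for the package itself).

```python
import importlib.util


def module_file(name):
    """Return the file Python would load for module `name`, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


print(module_file("json"))  # path to the json package's __init__.py
# With neat-python installed, this shows which checkpoint.py is actually imported:
# print(module_file("neat.checkpoint"))
```

If the printed path is not the file you edited, the edit has no effect on the running code.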
You were right, man, thanks. The file I needed to change was the one in the installed neat library. With generation += 1 there, it does continue from the correct generation. I'm not sure yet whether only the number changes or whether it really continues where it left off; I'll give an update once I know for sure. Thanks, man.