neat-python

Checkpoint restores second to last generation

Open vladas-v opened this issue 7 years ago • 9 comments

Hello,

If, for example, I set the checkpointer to save every 5 generations, the checkpoint is made at the end of generation 4, and then generation 5 starts.

If I restore checkpoint-4, generation 4 starts again, and a checkpoint-4 is made again.

Is this by design?

I ask because I am now running a fairly big population, and each generation takes about an hour to complete. Checkpoints are saved every generation, but every time I interrupt and restore, I lose a generation's worth of progress.

I have made a dirty fix for this: make the checkpoint at the start of the generation instead of at the end. It's not ideal, because after moving the contents of the "end_generation" function into "start_generation" I had to add all of its parameters to the "start_generation" function in reporting.py and population.py.

But otherwise it works:

" ****** Running generation 4 ******

Saving checkpoint to neat-checkpoint-4"

I am sure you will be able to implement a more elegant solution if, of course, this fix is warranted.

Thanks!

vladas-v, Jun 08 '18 06:06

@vladas-v thanks for this "feature". I am having the same issue, and your change really helped!

richban, Apr 13 '19 17:04

How exactly did you make it save every five generations? When I restore from another generation, it won't save any checkpoints.

MooButter, Jun 23 '19 04:06

I found the same problem when saving after each generation and running for one generation: it always starts at generation 0 and saves generation 0. Why is this?

iglonator, Sep 26 '20 14:09

Simple fix for me was to add "generation += 1" before the return statement in the restore_checkpoint function in checkpoint.py. It just wasn't incrementing the generation count when loading a checkpoint.

nerevar009, Mar 12 '21 23:03
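
The off-by-one discussed in this thread can be pictured with a toy model. This is only a sketch under assumptions, not neat-python's actual code: `run` and `restore` are hypothetical stand-ins, a checkpoint is reduced to the generation number it was saved under, and the `bump` flag mimics adding "generation += 1" before the return statement in restore_checkpoint. It says nothing about which population the real checkpoint file contains.

```python
def run(start_generation, n, saved_labels):
    """Run n toy generations, saving a checkpoint label at the end of each."""
    generation = start_generation
    for _ in range(n):
        # Evaluation and reproduction would happen here in a real run.
        # The checkpoint is labelled with the generation that just ended.
        saved_labels.append(generation)
        generation += 1
    return generation


def restore(label, bump=False):
    """Return the generation number a restored run starts from."""
    return label + 1 if bump else label


labels = []
run(0, 5, labels)  # generations 0..4; the last checkpoint is labelled 4

repeated = []
run(restore(4), 1, repeated)  # without the fix, generation 4 runs and is saved again

advanced = []
run(restore(4, bump=True), 1, advanced)  # with the fix, the run resumes at generation 5

print(labels, repeated, advanced)
```

Running the sketch prints [0, 1, 2, 3, 4] [4] [5]: restoring label 4 unchanged repeats generation 4, while bumping the number on restore moves on to generation 5.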

> Simple fix for me was to add "generation += 1" before the return statement in the restore_checkpoint function in checkpoint.py. It just wasn't incrementing the generation count when loading a checkpoint.

Are you adding generation += 1 at the end of def restore_checkpoint(filename)? For me this doesn't change anything at all.

iglonator, Apr 20 '21 15:04

Yes, mine is like:

    generation += 1
    return Population(config, (population, species_set, generation))

nerevar009, Apr 20 '21 16:04

Weird, for me this doesn't change a thing. If it is saved as generation 3, it starts from checkpoint generation 3 all over again, even with generation += 1. So for you, if you set winner = p.run(RunTraining, 1) (only train for one cycle), the checkpoint can start the next generation? For me this only makes it stay in the same generation. Thanks anyway for the response, man.

iglonator, Apr 29 '21 09:04

I wonder if the file you edited isn't what's being run. If you put a 1/0 before the return statement, does it give a divide by zero error when restoring a checkpoint? Are you using Anaconda? Inside the Anaconda/Miniconda directory, the file should be Lib\site-packages\neat\checkpoint.py

nerevar009, Apr 30 '21 10:04
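
Another way to check which copy of checkpoint.py Python actually imports is to ask the import system for the module's file path before editing anything. A minimal sketch using only the standard library: `module_path` is a hypothetical helper name, and it returns None when the module cannot be found by the interpreter that runs it.

```python
import importlib.util


def module_path(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        # For a dotted name, find_spec imports the parent package first.
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when the parent package of a dotted name is missing.
        return None
    return spec.origin if spec is not None else None


# The path printed here is the file the running interpreter uses,
# so this is the copy that needs to be edited.
print(module_path("neat.checkpoint"))
```

Run it with the same interpreter or environment that runs the training script; a different environment may resolve the module to a different file.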

You were right, man, thanks. The file I needed to change was the one in the neat library. After I added generation += 1 there, it does continue from the correct generation. I'm not sure whether it's just a change in the number or whether it really continues where it finished. I'll give an update once I know for sure. Thanks, man.

iglonator, May 16 '21 12:05