char-rnn
Should the init_from parameter start train.lua from 1 if multiple checkpoints exist?
When the process restarted, the iteration count started again from 1. Is init_from not meant as a "pause/resume" kind of parameter for cases where I need to shut down my box between training runs? Or is the count just incorrect? Here's what I did:
- ran train.lua for a couple hours to generate 10 checkpoints
- stopped process
- ran train.lua with the init_from parameter pointing to the latest checkpoint file
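For reference, the restart invocation looked roughly like this (the data directory and checkpoint filename here are illustrative; yours will differ):

```
th train.lua -data_dir data/tinyshakespeare -init_from cv/lm_lstm_epoch10.00_1.2345.t7
```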
I did see #33, but it doesn't specifically mention the displayed iteration count.
Hi @davidlfox, what you're observing is the current intended behavior, hence the name init_from rather than resume_from. The issue is that resuming precisely would be a bit tricky (e.g. the state of the optimizer would have to be saved in each checkpoint too), and one might not necessarily want that anyway.
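For concreteness, here is a minimal sketch of what an exact-resume checkpoint would need to capture beyond the model weights. It assumes train.lua's local variables, and the field names (especially loader_state) are hypothetical, not the current checkpoint format:

```lua
-- sketch only: extra state an exact resume would need to serialize.
-- assumes train.lua's locals; field names are hypothetical.
local checkpoint = {}
checkpoint.protos = protos                 -- the model itself
checkpoint.i = i                           -- global iteration counter, so the displayed count resumes correctly
checkpoint.epoch = epoch                   -- fractional epoch at save time
checkpoint.optim_state = optim_state       -- rmsprop running averages / learning rate
checkpoint.loader_state = loader.batch_ix  -- position within the epoch's batch stream
torch.save(savefile, checkpoint)
```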
Hmm, not sure about this.
Do you have a strong use case for exactly resuming?
EDIT: I agree that this should probably exist. Thinking about the API.
I don't know if it's a strong one, but my use case is just as I described: powering off a box in the middle of a long training run.
+1 for having a resume_from feature (I have the same use case: powering off a box in the middle of training).
A flag to enable saving the full training state might be the best route.
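Roughly, a sketch of what that flag could look like in train.lua (the flag name and the wiring are hypothetical, just to illustrate the idea):

```lua
-- hypothetical option: opt in to heavier checkpoints that allow an exact resume
cmd:option('-save_optim_state', 0, 'if 1, also store optimizer state in checkpoints')

-- at checkpoint time
if opt.save_optim_state == 1 then
    checkpoint.optim_state = optim_state
end

-- at startup, when -init_from is given
if string.len(opt.init_from) > 0 then
    local checkpoint = torch.load(opt.init_from)
    if checkpoint.optim_state ~= nil then
        optim_state = checkpoint.optim_state -- pick up where the optimizer left off
    end
end
```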