text2sql-lgesql
The program crashes when I use the argument "--load_optimizer"
I wanted to continue training the model with the saved optimizer, but it crashed. The traceback is shown as follows:
Traceback (most recent call last):
File "lgesql/text2sql.py", line 105, in
Have you encountered this problem, and how can I fix it?
More information:
When I comment out the line "optimizer.load_state_dict(check_point['optim'])", the program does not crash, but the training loss is much larger than the loss in the last epoch of the saved model.
Thanks a lot for pointing out this bug.
We also found this problem when loading from checkpoints. Honestly, we never used this interface for training from checkpoints in our experiments and overlooked this bug. The problem is caused by mismatched key-value pairs in `self.state` of the optimizer. The cause is that the `set()` operations over the parameters in function `set_optimizer` produce different parameter orders in different runs, so the `self.state` mapping in `load_state_dict` for the optimizer fails. (See `load_state_dict` in the PyTorch Optimizer for more details.)
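To illustrate the mismatch, here is a minimal pure-Python sketch. It assumes, as the PyTorch optimizer documentation describes, that saved optimizer state is keyed by each parameter's position in the parameter groups rather than by its name. The helpers `save_state` and `load_state` and the parameter names are hypothetical stand-ins for illustration only, not PyTorch's implementation or this repository's code.

```python
def save_state(param_order, state_by_name):
    """Mimic saving: key each parameter's state by its position in the list."""
    return {idx: state_by_name[name] for idx, name in enumerate(param_order)}


def load_state(param_order, saved):
    """Mimic loading: give state back to whichever parameter sits at that position."""
    return {name: saved[idx] for idx, name in enumerate(param_order)}


# Hypothetical per-parameter optimizer state (e.g. the shape of a moment buffer).
state_by_name = {
    "encoder.weight": {"shape": (4, 8)},
    "decoder.bias": {"shape": (8,)},
}

# Run 1 saves the checkpoint with one parameter order.
order_at_save = ["encoder.weight", "decoder.bias"]
checkpoint = save_state(order_at_save, state_by_name)

# Run 2 iterates the parameters in a different order, as can happen after set().
order_at_load = ["decoder.bias", "encoder.weight"]
restored = load_state(order_at_load, checkpoint)

# Parameters whose restored state differs from the state they had when saved.
mismatched = [name for name in order_at_load if restored[name] != state_by_name[name]]
print(mismatched)
```

With real tensors, a swap like this can pair a state buffer with a parameter of a different shape, which is one plausible way for a resumed run to fail.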
We have fixed this bug by removing all `set()` operations in function `set_optimizer` in `utils/optimization.py`. Everything seems OK now if you train from scratch and load from checkpoints.
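For readers who want a picture of what an order-preserving alternative to `set()` can look like, here is a rough sketch. It is not the actual code in `utils/optimization.py`: the function `split_params`, the `no_decay_keywords` default, and the parameter names are all assumptions made for illustration. The idea is simply to build parameter groups with lists, so the order always follows the order in which the parameters are enumerated.

```python
def split_params(named_params, no_decay_keywords=("bias", "LayerNorm.weight")):
    """Split (name, param) pairs into decay / no-decay lists, preserving input order."""
    decay, no_decay = [], []
    for name, param in named_params:
        if any(key in name for key in no_decay_keywords):
            no_decay.append(param)
        else:
            decay.append(param)
    return decay, no_decay


# Stand-in for model.named_parameters(); strings replace real tensors here.
named_params = [
    ("encoder.weight", "W_enc"),
    ("encoder.bias", "b_enc"),
    ("encoder.LayerNorm.weight", "g_ln"),
    ("decoder.weight", "W_dec"),
]

decay, no_decay = split_params(named_params)
print(decay)
print(no_decay)
```

Because lists keep insertion order, two runs that enumerate the same model produce the same grouping, so position-keyed optimizer state lines up again on reload.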
Thanks again for pointing out this problem.