composer
Autoresume and duration mismatch on reload
Description
While experimenting with the autoresume feature, I ran into issues with duration and the scheduler. My mistake was passing the duration argument to the .fit() method instead of the max_duration argument to the Trainer constructor. Because .fit() can be called several times in sequence, each call with duration increases max_duration, as indicated in the code. On resumption, max_duration is therefore offset by the progress restored from the checkpoint, and the scheduler raises an error because t_max is now smaller than max_duration.
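To make the offset concrete, here is a minimal toy model of the behavior as I understand it. ToyTrainer, resumed_epochs, and epochs_done are hypothetical names for illustration only; this is not Composer's implementation, and durations are plain epoch counts rather than Composer time strings.

```python
# Toy model only: ToyTrainer is hypothetical and is not Composer's Trainer.
class ToyTrainer:
    def __init__(self, max_duration=None, resumed_epochs=0):
        # resumed_epochs stands in for progress restored from a checkpoint.
        self.epochs_done = resumed_epochs
        self.max_duration = max_duration

    def fit(self, duration=None):
        if duration is not None:
            # A relative duration is added on top of the progress so far.
            self.max_duration = self.epochs_done + duration
        if self.max_duration is None:
            raise ValueError("max_duration or duration must be given")
        # Pretend to train until max_duration is reached.
        self.epochs_done = max(self.epochs_done, self.max_duration)
        return self.max_duration


# Fresh run: training is planned for 5 epochs in total.
fresh_total = ToyTrainer().fit(duration=5)

# Resumed after 3 epochs: the same script now plans for 3 + 5 epochs.
resumed_total = ToyTrainer(resumed_epochs=3).fit(duration=5)

# Absolute max_duration in the constructor: the total is unchanged on resume.
fixed_total = ToyTrainer(max_duration=5, resumed_epochs=3).fit()
```

In this sketch the fresh run targets 5 epochs, the resumed run targets 8, and the constructor-based run stays at 5, which mirrors the mismatch with a scheduler that was set up for the original total.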
Passing max_duration to the Trainer constructor avoids the problem, so I will use that approach. Could the scenario described above be detected and a warning or error raised? Essentially, if autoresume=True, then max_duration should be specified in __init__, and .fit() should be called only once in the script, without a duration.
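As a rough sketch of the kind of check I have in mind (check_autoresume_usage and its parameters are hypothetical and not part of Composer; where such a check would live in the Trainer is left open):

```python
import warnings


def check_autoresume_usage(autoresume, init_max_duration, fit_duration):
    """Hypothetical guard: warn about durations that are unsafe with autoresume.

    Returns True if a warning was issued, False otherwise.
    """
    if not autoresume:
        return False
    # A relative duration passed to fit(), or no absolute max_duration in the
    # constructor, means the total can shift when the run is resumed.
    if fit_duration is not None or init_max_duration is None:
        warnings.warn(
            "autoresume=True: pass max_duration to the constructor and call "
            "fit() without a duration, otherwise the total training duration "
            "may change on resumption.",
            UserWarning,
        )
        return True
    return False
```

Whether this should be a warning or a hard error is a design choice; a warning keeps existing scripts running, while an error would surface the problem before a checkpoint is ever reloaded.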