Autoresume and duration mismatch on reload

Open antoinebrl opened this issue 1 year ago • 12 comments

Description

While experimenting with the autoresume feature, I ran into issues with the training duration and the scheduler. My mistake was passing the duration argument to the .fit() method instead of passing max_duration to the Trainer constructor. Because .fit() can be called multiple times in sequence, each call increases max_duration, as indicated in the code. Upon resumption, this offset causes an error in the scheduler because t_max becomes smaller than max_duration.
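To make the offset concrete, here is a minimal toy model of the behaviour described above. It does not use Composer: `ToyTrainer`, its `elapsed` field, and the integer epoch counts are hypothetical stand-ins, and the rule "a per-call duration is added to the progress so far" is my reading of the behaviour rather than a copy of the library code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTrainer:
    """Hypothetical stand-in for a trainer; not Composer's real class."""

    max_duration: Optional[int] = None  # target epoch count set in the constructor
    elapsed: int = 0  # epochs already completed (restored from a checkpoint on resume)

    def fit(self, duration: Optional[int] = None) -> int:
        # Assumed behaviour: a per-call duration extends the target
        # relative to the progress made so far.
        if duration is not None:
            self.max_duration = self.elapsed + duration
        if self.max_duration is None:
            raise ValueError("no duration was given in the constructor or in fit()")
        # Pretend training runs to the target, then report that target.
        self.elapsed = max(self.elapsed, self.max_duration)
        return self.max_duration


# Interrupted at epoch 6 and resumed by a script that calls fit(duration=10):
# the target moves to 16 even though the schedule was planned for 10 epochs.
resumed_target = ToyTrainer(elapsed=6).fit(duration=10)

# Same resume point, but with the target fixed in the constructor: it stays at 10.
fixed_target = ToyTrainer(max_duration=10, elapsed=6).fit()

print(resumed_target, fixed_target)
```

In this sketch the resumed run ends up with a target 6 epochs past the value the schedule was built for, which is the kind of mismatch between t_max and max_duration reported here.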

Using max_duration in the Trainer constructor avoids this problem, so I will adopt this approach. Should the scenario described above be detected, and should a warning or error be raised? Essentially, if autoresume=True, then max_duration should be specified in __init__, and the .fit() method should be called only once in the script.
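As a rough illustration of what such a check could look like, here is a standalone sketch. The function name `check_autoresume_usage` and all of its parameters are hypothetical; this is not an existing Composer API, and the choice of which cases raise and which only warn is just one possible policy.

```python
import warnings
from typing import Optional


def check_autoresume_usage(
    autoresume: bool,
    init_max_duration: Optional[str],
    fit_duration: Optional[str],
    fit_calls: int,
) -> None:
    """Hypothetical guard for the rule proposed above; not part of Composer."""
    if not autoresume:
        return  # nothing to enforce when autoresume is disabled
    if init_max_duration is None:
        # Without a fixed target, a resumed run cannot recover the intended length.
        raise ValueError(
            "autoresume=True requires max_duration to be set in the Trainer constructor"
        )
    if fit_duration is not None:
        warnings.warn(
            "duration passed to fit() with autoresume=True; "
            "the training target may shift after a resume"
        )
    if fit_calls > 1:
        warnings.warn(
            "fit() called more than once with autoresume=True; "
            "resumed runs may not reproduce the intended schedule"
        )


# The recommended usage passes silently.
check_autoresume_usage(True, "10ep", None, 1)
```

Raising only for the missing max_duration and warning for the other two cases keeps existing multi-call scripts running while still flagging the risky pattern.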

antoinebrl, Jun 04 '24 14:06