Why does training not stop after max_duration steps?

Open davidbrandfonbrener opened this issue 11 months ago • 1 comments

❓ The question

I'm not sure if this is intended behavior or a bug, but currently, if max_duration is set to a number of steps smaller than the number needed to get through the entire dataset, training does not stop after max_duration steps; it keeps going until the data runs out. Is this intended?

I see that this can be resolved by also setting stop_at in addition to max_duration, but this seems to be more confusing than necessary.

davidbrandfonbrener, Feb 29 '24 22:02

Hey @davidbrandfonbrener, thanks for bringing that up. It's definitely confusing, and I would consider it a bug. At the moment max_duration is only used to set the learning rate schedule. We'll fix this at some point, or at least clarify it in the documentation, but you're right that stop_at is the correct field to use if you want a hard stop in training.

epwalsh, Feb 29 '24 23:02
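
For anyone landing here later, the behavior described above can be illustrated with a toy training loop. This is only a sketch and not OLMo's actual trainer code: the names cosine_lr, train, num_batches, and peak_lr are made up for illustration, and the cosine schedule is an assumed stand-in for whatever scheduler is configured. It shows max_duration shaping the learning rate schedule only, while stop_at is what ends the loop.

```python
import math
from typing import Optional


def cosine_lr(step: int, max_duration: int, peak_lr: float = 1e-3) -> float:
    """Hypothetical cosine decay that reaches zero at max_duration and stays there."""
    progress = min(step, max_duration) / max_duration
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


def train(num_batches: int, max_duration: int, stop_at: Optional[int] = None) -> int:
    """Toy loop (not OLMo's trainer). Returns the number of steps actually taken."""
    steps_taken = 0
    for step in range(1, num_batches + 1):
        # max_duration only influences the learning rate here.
        lr = cosine_lr(step, max_duration)
        # ... forward pass, backward pass, and optimizer step would use lr ...
        steps_taken = step
        # Only stop_at provides a hard stop.
        if stop_at is not None and step >= stop_at:
            break
    return steps_taken


if __name__ == "__main__":
    # Without stop_at, the loop runs through every batch despite max_duration=10.
    print(train(num_batches=100, max_duration=10))
    # With stop_at, the loop ends at the requested step.
    print(train(num_batches=100, max_duration=10, stop_at=10))
```

In this sketch, setting both fields to the same value gives a schedule that finishes decaying exactly when training stops, which matches the workaround mentioned in the question.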

I apologize for our delay in responding. To help surface current, unresolved issues, we are closing tickets opened prior to February 29. Please reopen your ticket if you continue to experience this issue. Please note that there is another recent bug report about the way max_duration and the stopping condition work (#554). Thank you!

dumitrac, Apr 30 '24 18:04