catalyst
Add support for step based validation and checkpointing
🚀 Feature Request
The library is currently built around epochs for much of its core functionality, including the checkpoint callback and the validation metrics. It would be very useful to let people validate and save checkpoints every N steps instead of every epoch.
Motivation
For large datasets, saving only once per epoch is pretty useless. Even for smaller datasets, running validation metrics more frequently can be very helpful for seeing how the model is performing.
Proposal
For all core functionality that currently only uses epochs, add a mode parameter which can be either epoch or step. If step is selected, another parameter, num_steps (or something similarly named), controls how many steps pass between runs of the given functionality (e.g. steps between validation runs). Though I think this should be done for all epoch-tied features, the two most pressing are validation runs and checkpoints.
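To make the proposal concrete, here is a minimal, framework-agnostic sketch of the mode / num_steps behaviour. It does not use Catalyst's API: `PeriodicTrigger`, `on_step_end`, `on_epoch_end`, and `simulate` are hypothetical names made up for illustration, and a real implementation would hook into the existing callbacks instead.

```python
from dataclasses import dataclass


@dataclass
class PeriodicTrigger:
    """Hypothetical helper that decides when validation or checkpointing fires."""

    mode: str = "epoch"  # either "epoch" or "step"
    num_steps: int = 1   # only used when mode == "step"

    def __post_init__(self):
        if self.mode not in ("epoch", "step"):
            raise ValueError(f"mode must be 'epoch' or 'step', got {self.mode!r}")
        if self.num_steps < 1:
            raise ValueError("num_steps must be a positive integer")

    def on_step_end(self, global_step: int) -> bool:
        # global_step is the 1-based count of training steps across all epochs.
        return self.mode == "step" and global_step % self.num_steps == 0

    def on_epoch_end(self) -> bool:
        return self.mode == "epoch"


def simulate(trigger: PeriodicTrigger, num_epochs: int, steps_per_epoch: int):
    """Walk a dummy training loop and record when the trigger fires."""
    fired = []
    global_step = 0
    for epoch in range(1, num_epochs + 1):
        for _ in range(steps_per_epoch):
            global_step += 1
            if trigger.on_step_end(global_step):
                fired.append(("step", global_step))
        if trigger.on_epoch_end():
            fired.append(("epoch", epoch))
    return fired
```

With `PeriodicTrigger(mode="step", num_steps=4)` and two epochs of five steps each, the action would fire at global steps 4 and 8, independent of epoch boundaries. The default `mode="epoch"` keeps today's once-per-epoch behaviour.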
Alternatives
It is likely possible to add this functionality with custom callbacks, but that seems like a lot of work, especially given how common this request is likely to be.
Additional context
Checklist
- [x] feature proposal description
- [x] motivation
- [x] extra proposal context / proposal alternatives review
FAQ
Please review the FAQ before submitting an issue:
- [x] I have read the documentation and FAQ
- [x] I have reviewed the minimal examples section
- [x] I have checked the changelog for main framework updates
- [x] I have read the contribution guide
- [x] I have joined Catalyst slack (#__questions channel) for issue discussion
Hi! Thank you for your contribution! Please re-check all issue template checklists - unfilled issues will be closed automatically. And do not forget to join our slack for collaboration.
Hi,
Is it possible to use a Sampler with the train Dataset (DataLoader(dataset=dataset, sampler=sampler)) to limit each epoch to the pre-defined number of batches you want?
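A rough sketch of what such a sampler could look like, written in plain Python so it runs without PyTorch. `FixedLengthSampler` is a hypothetical name, and drawing indices with replacement is an assumption made here so that the number of batches does not depend on the dataset size. PyTorch's `DataLoader` accepts any iterable of indices as `sampler`, so an object like this could be passed in directly.

```python
import random


class FixedLengthSampler:
    """Hypothetical sampler yielding exactly num_batches * batch_size indices per pass."""

    def __init__(self, dataset_size, num_batches, batch_size, seed=None):
        if dataset_size < 1:
            raise ValueError("dataset_size must be positive")
        self.dataset_size = dataset_size
        self.num_samples = num_batches * batch_size
        self._rng = random.Random(seed)

    def __iter__(self):
        # Sample with replacement, so one pass can be shorter or longer
        # than the underlying dataset.
        for _ in range(self.num_samples):
            yield self._rng.randrange(self.dataset_size)

    def __len__(self):
        return self.num_samples


# Intended use (not executed here, requires PyTorch):
#   sampler = FixedLengthSampler(len(dataset), num_batches=100, batch_size=32)
#   loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```

Each "epoch" then lasts a fixed number of batches, so the existing per-epoch validation and checkpointing would effectively run every `num_batches` steps. PyTorch's built-in `RandomSampler` with `replacement=True` and `num_samples` set should give a similar effect.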
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.