🚀[FEA]: allow control over checkpoint filename format in `save_checkpoint`

Open CharlelieLrt opened this issue 2 months ago • 0 comments

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Critical (currently preventing usage)

Please provide a clear description of problem you would like to solve.

As reported by @nbren12, save_checkpoint(model=model, epoch=epoch, ...) saves checkpoint files using the format <model_name>.<epoch>.mdlus, where <model_name> is inferred from the Module metadata. For example, the resulting files are named <model_name>.1.mdlus for epoch=1, <model_name>.10.mdlus for epoch=10, <model_name>.100.mdlus for epoch=100, etc.

Problem

The function save_checkpoint offers no control over the format of the checkpoint file name. For instance, some users might require a zero-padded epoch number: <model_name>.000001.mdlus, etc.

Solution

Allow passing an optional argument format to save_checkpoint. For example, save_checkpoint(model=model, epoch=epoch, format="{name}.{epoch:06d}") would produce a zero-padded epoch number, and save_checkpoint(model=model, epoch=epoch, format="{name}_{epoch:06d}") would change the separator in the file name.
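
To illustrate the intended behavior, here is a minimal sketch of how such a format string could be resolved into a file name. It is not the physicsnemo implementation: the helper `checkpoint_filename`, its `fmt` and `ext` parameters, and the model name `MyModel` are all hypothetical, and it assumes the template is expanded with Python's `str.format` using the fields `name` and `epoch`.

```python
def checkpoint_filename(
    name: str,
    epoch: int,
    fmt: str = "{name}.{epoch}",  # hypothetical default mirroring the current naming
    ext: str = ".mdlus",
) -> str:
    """Hypothetical helper: expand a user-supplied template into a file name.

    The template may use the fields {name} and {epoch}, with any standard
    format spec (e.g. {epoch:06d} for a zero-padded epoch number).
    """
    return fmt.format(name=name, epoch=epoch) + ext


# Current behavior (no padding)
print(checkpoint_filename("MyModel", 1))                             # MyModel.1.mdlus
# Zero-padded epoch number
print(checkpoint_filename("MyModel", 10, fmt="{name}.{epoch:06d}"))  # MyModel.000010.mdlus
# Different separator
print(checkpoint_filename("MyModel", 10, fmt="{name}_{epoch:06d}"))  # MyModel_000010.mdlus
```

The sketch names the parameter `fmt` only to avoid shadowing Python's built-in `format` function; the argument name proposed in this issue is `format`.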

Describe any alternatives you have considered

No response

CharlelieLrt, Oct 23 '25 22:10