
[FEATURE] Checkpointing

Open • le1nax opened this issue 2 months ago • 3 comments

Hello,

I was surprised to see that the default training pipeline does not save state_dicts at the lowest validation loss. I am also curious why there is no validation split. There is "test data" (= validation data?) that you can pass in, but I don't see where it is used for checkpointing.

Kind regards,
Daniel

le1nax • Oct 20 '25 13:10
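For readers who want a held-out validation set, here is a minimal sketch of a reproducible random split. `split_train_val` is a hypothetical helper, not part of cellpose. Splitting image/mask pairs together keeps each image aligned with its labels; the held-out pairs could then be passed in as the "test data" mentioned in the question.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_val(
    items: Sequence[T], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly hold out a fraction of `items` as a validation set.

    Hypothetical helper (not part of cellpose): indices are shuffled with a
    fixed seed so the split is reproducible across runs.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    # Keep at least one validation item when the dataset is non-empty.
    n_val = max(1, round(len(items) * val_fraction)) if items else 0
    val_idx = set(indices[:n_val])
    train = [x for i, x in enumerate(items) if i not in val_idx]
    val = [x for i, x in enumerate(items) if i in val_idx]
    return train, val


# Toy usage: file names are made up; pair images with masks before splitting.
images = [f"img_{i}.tif" for i in range(10)]
masks = [f"img_{i}_masks.tif" for i in range(10)]
train_pairs, val_pairs = split_train_val(list(zip(images, masks)), val_fraction=0.2)
print(len(train_pairs), len(val_pairs))  # 8 2
```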

Currently, the `train.train_seg()` function can save a model every `save_every` epochs, or only at the end of training. Is there a reference that says the model with the lowest validation loss is saved?

mrariden • Oct 20 '25 14:10
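To make the distinction concrete, the sketch below lists the epochs at which a purely periodic schedule would write a checkpoint. `periodic_save_epochs` is hypothetical, and its rule (positive multiples of `save_every`, plus the final epoch) is an assumption; cellpose's exact condition may differ.

```python
from typing import List


def periodic_save_epochs(n_epochs: int, save_every: int) -> List[int]:
    """Epochs (0-indexed) at which a periodic checkpoint would be written.

    Generic sketch, not cellpose's code: save whenever the epoch index is a
    positive multiple of `save_every`, and always save after the last epoch.
    """
    if n_epochs <= 0 or save_every <= 0:
        raise ValueError("n_epochs and save_every must be positive")
    epochs = [e for e in range(n_epochs) if e > 0 and e % save_every == 0]
    if not epochs or epochs[-1] != n_epochs - 1:
        epochs.append(n_epochs - 1)  # the final model is always saved
    return epochs


print(periodic_save_epochs(n_epochs=100, save_every=25))  # [25, 50, 75, 99]
```

None of these saves looks at the validation loss, so the lowest-loss model is only kept if it happens to fall on one of these epochs.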

I wasn't saying that the paper claims validation-based checkpointing while the code lacks it. I simply expected this feature, since checkpointing models on validation loss is common practice in deep learning. Perhaps there is a reason cellpose doesn't do it; maybe the assumption is that the network can only improve with more training. I doubt that: after all, we are optimizing with AdamW, and nothing guarantees that the validation loss keeps decreasing.

le1nax • Oct 20 '25 14:10
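What the question describes, keeping the weights from the epoch with the lowest validation loss, can be sketched as a small tracker. `BestCheckpoint` is a hypothetical class, not part of cellpose. The `state` dict stands in for something like a PyTorch `model.state_dict()`, and it is deep-copied so that later in-place weight updates cannot overwrite the snapshot.

```python
import copy
import math
from typing import Any, Dict, Optional


class BestCheckpoint:
    """Keep a copy of the model state with the lowest validation loss seen.

    Hypothetical sketch: call `update` after every validation step.
    """

    def __init__(self) -> None:
        self.best_loss: float = math.inf
        self.best_epoch: Optional[int] = None
        self.best_state: Optional[Dict[str, Any]] = None

    def update(self, epoch: int, val_loss: float, state: Dict[str, Any]) -> bool:
        """Snapshot `state` if `val_loss` beats the best so far; return True if so."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            # Deep copy: the live weights keep changing after this call.
            self.best_state = copy.deepcopy(state)
            return True
        return False


# Toy usage with made-up losses; the "weights" are mutated in place each epoch.
tracker = BestCheckpoint()
weights = {"w": [0.0]}
for epoch, val_loss in enumerate([0.9, 0.5, 0.6, 0.4, 0.45]):
    weights["w"][0] = float(epoch)  # pretend training changed the weights
    tracker.update(epoch, val_loss, weights)
print(tracker.best_epoch, tracker.best_loss, tracker.best_state)  # 3 0.4 {'w': [3.0]}
```

In a real PyTorch loop you would more likely write the snapshot to disk with `torch.save` rather than keep it in memory; the comparison logic is the same.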

Checkpointing based on the validation loss isn't built into cellpose, but you can save the model at every validation step. We leave it to the user to decide whether to use one of these checkpoints or the final model.

mrariden • Oct 20 '25 15:10
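Following the suggested workflow (save at every validation step, then choose), the selection can be done after training. This sketch assumes you have recorded the validation loss for each saved epoch; how you obtain those values depends on your setup and is not shown. `pick_best_checkpoint`, the file names, and the loss values are all made up for illustration.

```python
from typing import Dict, Tuple


def pick_best_checkpoint(
    val_losses: Dict[int, float], saved: Dict[int, str]
) -> Tuple[int, str]:
    """Choose the saved checkpoint whose epoch had the lowest validation loss.

    Hypothetical helper: `val_losses` maps epoch -> recorded validation loss,
    `saved` maps epoch -> checkpoint path. Only epochs present in both are
    candidates.
    """
    candidates = [e for e in saved if e in val_losses]
    if not candidates:
        raise ValueError("no saved checkpoint has a recorded validation loss")
    best = min(candidates, key=lambda e: val_losses[e])
    return best, saved[best]


# Made-up losses and paths for illustration only.
losses = {10: 0.82, 20: 0.61, 30: 0.58, 40: 0.63}
files = {e: f"models/my_model_epoch_{e:04d}" for e in (10, 20, 30, 40)}
print(pick_best_checkpoint(losses, files))  # (30, 'models/my_model_epoch_0030')
```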