Are checkpoints directly available with the SetFitTrainer?

Open ajmcgrail opened this issue 3 years ago • 4 comments

Hi, just looking to see if checkpoints are implemented with the SetFitTrainer. I couldn't find them, unlike with the normal models in Hugging Face, which use output_dir for saving checkpoints when training a model.

ajmcgrail avatar Oct 17 '22 18:10 ajmcgrail

Hey @challos - good question!

We don't currently support checkpointing, mainly because we didn't find a need for it in our experiments (we found 1 epoch was enough for the contrastive learning step).

Is there a use case you have where this feature would be useful?

lewtun avatar Oct 17 '22 19:10 lewtun

My use case is admittedly not everybody's, but I was running the model over the course of several days on a non-trivially sized dataset (~90k rows with 18k labels at 20 iterations for the trainer), only for it to eventually finish after ~57 hours with the word 'Killed' at the end of the logs and no model saved. The issue itself could've been caused by running it on my desktop while I occasionally used it, but checkpointing would have avoided the loss entirely, or at least made it easier to debug.

Or maybe it's just too many labels for the model to reasonably use; I have no clue. Either way, I want to have checkpointing in place before I dedicate my desktop to working for another few days. I'm getting ~3-4 iterations a second with a 3070 at a batch size of 4, and with the nonzero chance that something dumb like a Windows Update interrupts the training process overnight, I'd want to have at least some way to mitigate the time lost.

That aside, from my smaller testing I've been really impressed with SetFit in my specific use case, so thanks for responding!

ajmcgrail avatar Oct 17 '22 20:10 ajmcgrail

Thanks for sharing the context @challos !

I think it would be relatively easy to save checkpoints for the contrastive learning step, since we could just pass these args to the fit() method of the SentenceTransformer: https://sbert.net/docs/package_reference/SentenceTransformer.html?highlight=checkpoint#sentence_transformers.SentenceTransformer.fit

The only downside I see is the proliferation of args in SetFitTrainer, but maybe it's OK to just use the defaults and reduce everything to a single output directory (as in transformers.Trainer)
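
As a rough sketch of the "single output directory" idea (not something SetFit implements as of this thread), a small helper could expand one directory into the checkpoint arguments that fit() accepts. The helper name `checkpoint_kwargs` and the `checkpoints` subfolder are hypothetical; the three keyword names and their defaults are taken from the linked SentenceTransformer.fit() docs as I understand them, so double-check them against your installed version.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def checkpoint_kwargs(
    output_dir: Optional[str],
    save_steps: int = 500,
    save_total_limit: int = 0,
) -> Dict[str, Any]:
    """Hypothetical helper: map one output directory onto the checkpoint
    keyword arguments of SentenceTransformer.fit().

    Returns an empty dict when no directory is given, so checkpointing
    stays off unless the caller explicitly asks for it.
    """
    if output_dir is None:
        return {}
    return {
        # Assumed layout: checkpoints live in a subfolder of output_dir.
        "checkpoint_path": str(Path(output_dir) / "checkpoints"),
        # Write a checkpoint every `save_steps` training steps.
        "checkpoint_save_steps": save_steps,
        # Cap on how many checkpoints to keep on disk.
        "checkpoint_save_total_limit": save_total_limit,
    }


# Assumed usage inside the contrastive learning step, where `model_body`
# is the SentenceTransformer and `train_dataloader` / `train_loss` are
# whatever the trainer already builds:
#
#   model_body.fit(
#       train_objectives=[(train_dataloader, train_loss)],
#       epochs=1,
#       **checkpoint_kwargs("setfit-output"),
#   )
```

With something like this, SetFitTrainer would only need one extra argument (say, an output directory) and could leave the save interval and checkpoint limit at their defaults.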

lewtun avatar Oct 18 '22 09:10 lewtun

+1 from me for that. I agree it proliferates the number of args, but it would also mean parity with the standard Hugging Face trainers. I'm relatively new to ML/DL so I'm not sure I can be of much help beyond this suggestion, though.

ajmcgrail avatar Oct 18 '22 17:10 ajmcgrail