
Are there minimum hardware requirements regarding pre-training?

Open lefterisloukas opened this issue 10 months ago • 2 comments

Great job on this awesome work! I want to ask whether there are any specific minimum hardware requirements for pre-training; I couldn't find this in the paper. Thanks!

I want to pre-train ModernBERT from scratch but I am resource-constrained.

lefterisloukas avatar Jan 16 '25 11:01 lefterisloukas

Hello, there is not really a specific minimum hardware requirement per se. Since this is not contrastive learning, gradient accumulation can match larger batch sizes, as long as you can fit at least one full sequence at your target sequence length. However, pre-training is quite compute- and data-intensive if you want good results, so if you are resource-constrained, I would suggest continuing pre-training from one of our checkpoints, provided your domain is not too far off.
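For reference, here is a minimal sketch of what continued MLM pre-training from a released checkpoint could look like with the Hugging Face `Trainer`, using gradient accumulation to emulate a larger effective batch size. This is not the official training recipe (the released models were trained with a different stack); the dataset path, sequence length, masking rate, and hyperparameters below are illustrative placeholders you would tune for your own setup.

```python
# Sketch: continued MLM pre-training from a ModernBERT checkpoint,
# using gradient accumulation to emulate a larger batch size.
# Dataset path and hyperparameters are placeholders, not the official recipe.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Placeholder corpus; replace with your own domain text files.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]

def tokenize(batch):
    # Truncate to the longest sequence length your GPU can actually fit.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Masked-language-modeling collator; masking rate here is illustrative.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.3)

args = TrainingArguments(
    output_dir="modernbert-continued",
    per_device_train_batch_size=4,    # whatever fits in GPU memory
    gradient_accumulation_steps=32,   # effective batch size = 4 * 32 = 128
    learning_rate=1e-4,               # illustrative; tune for your data
    num_train_epochs=1,
    bf16=True,                        # assumes a GPU with bf16 support
    logging_steps=100,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

The key point is `gradient_accumulation_steps`: gradients are summed over several small forward/backward passes before each optimizer step, so a single modest GPU can approximate a much larger batch at the cost of wall-clock time.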

NohTow avatar Jan 17 '25 09:01 NohTow

Thanks for the quick response. As a rule of thumb, do you have any (cheap) cloud GPU to recommend?

lefterisloukas avatar Jan 17 '25 09:01 lefterisloukas