ModernBERT
Are there minimum hardware requirements regarding pre-training?
Great job on this awesome work! I want to ask whether there are specific minimum hardware requirements for pre-training; I couldn't find this in the paper. Thanks!
I want to pre-train ModernBERT from scratch but I am resource-constrained.
Hello, there isn't really a specific minimum hardware requirement per se, since this is not contrastive learning and gradient accumulation should let you match larger batch sizes (as long as you can fit at least one full sequence at your target sequence length). However, pre-training is quite compute- and data-intensive if you want good results, so if you are resource-constrained, I would suggest continuing pre-training from one of our checkpoints, provided your domain is not too far from ours.
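To illustrate the gradient accumulation point: for a mean-reduced loss, summing micro-batch gradients weighted by each micro-batch's share of the full batch reproduces the full-batch gradient exactly, so small hardware only limits the micro-batch size, not the effective batch size. A minimal NumPy sketch (purely illustrative, not from the ModernBERT codebase):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))   # full batch of 8 examples, 4 features
y = rng.normal(size=8)
w = rng.normal(size=4)        # linear model parameters

def grad(Xb, yb, w):
    # Gradient of the mean squared error (1/n) * sum((x.w - y)^2) w.r.t. w.
    n = len(yb)
    return (2.0 / n) * Xb.T @ (Xb @ w - yb)

# Gradient computed on the full batch at once.
full = grad(X, y, w)

# Same gradient accumulated over micro-batches of 2, each weighted by
# its fraction of the full batch (2/8).
acc = np.zeros_like(w)
for i in range(0, 8, 2):
    acc += grad(X[i:i + 2], y[i:i + 2], w) * (2 / 8)

print(np.allclose(full, acc))  # the accumulated gradient matches
```

The same principle applies in a real training loop: call `backward()` on each micro-batch's loss (scaled by the micro-batch fraction) and step the optimizer only after all micro-batches have been processed.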
Thanks for the quick response. As a rule of thumb, are there any (cheap) cloud GPUs you would recommend?