Recommended GPU size when training BERT-base
What is the minimum GPU spec for training the base model?
Obviously I realise it depends on the hyperparameters, but I have a 4GB GPU that I'm trying to train BERT-base on with the run_classifier example, and I'm hitting out-of-memory errors. Even if I reduce to seq_len = 200 and batch_size = 4 I still run out of memory, and there's not much point going below that as training would most likely collapse.
Evidently 4GB will not suffice and I'll need to upgrade. What are people using successfully and with what seq_len and batch_size?
Hey, maybe this will help. With fp16 support I got past the OOM errors, even with batch_size=32 (GTX 1080, 8GB). https://github.com/thorjohnsen/bert/tree/gpu_optimizations
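For reference, here's a minimal sketch of what fp16 (mixed-precision) training looks like in TF 2.x Keras. The linked fork is built on the TF1-era BERT code and enables fp16 differently, so treat this only as an illustration of the idea; the toy model below is a placeholder, not the run_classifier model.

```python
# Minimal mixed-precision sketch (assumes TF 2.4+); the toy model is a placeholder,
# not the BERT classifier from run_classifier.
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Compute in float16 while keeping float32 master weights for stability.
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(768,)),
    # Keep the final logits in float32 so the loss is numerically stable.
    tf.keras.layers.Dense(2, dtype="float32"),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```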
Thanks @AndreasFdev, I concluded there was no way I'd be able to do training with a 4GB GPU, so I managed to lay my hands on a second-hand Titan X with 12GB - working fine now.
@BigBadBurrow What batch size & float precision did you end up using on the Titan X (12GB)?
@AndreasFdev How did you implement the fp16 support? Did you use Apex?
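(For anyone else wondering, the usual NVIDIA Apex pattern looks roughly like the sketch below; the model, optimizer, and data are placeholders, not the setup from this thread, so it's only an illustration of the amp API.)

```python
# Hedged sketch of the typical NVIDIA Apex amp pattern for fp16 training (PyTorch).
# The model, optimizer, and data are placeholders.
import torch
from apex import amp

model = torch.nn.Linear(768, 2).cuda()                      # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

# "O1" patches most ops to fp16 while keeping fp32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

inputs = torch.randn(8, 768).cuda()
labels = torch.randint(0, 2, (8,), device="cuda")
loss = torch.nn.functional.cross_entropy(model(inputs), labels)

# Loss scaling avoids fp16 gradient underflow.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```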
I have a 15GB GPU, my batch size is 2, and training always collapses.
I've tried different GPUs and I always end up with an out-of-memory exception, as all of the GPU memory gets taken up (see the memory-growth note after the environment details below). This happens on the following cards:
- 1050 Ti (4 GB)
- 2060 Super (8 GB)
For the above, my environment was:
- Operating System: Ubuntu 18.04
- CUDA: 10
- cuDNN: 7.6
- Python: 3.6
- TensorFlow: 2.3
However, I'm able to run my test on a 3060 and a 1080 Ti; the only things that changed (keeping the rest the same as above) are:
- CUDA: 11.2
- cuDNN: 8
- TensorFlow: 2.6
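A side note on the "all of the GPU memory is taken up" symptom: by default TensorFlow reserves essentially the whole GPU up front, so the card always looks full regardless of the actual model size. The sketch below (assuming TF 2.x) switches to on-demand allocation; it makes the memory usage easier to read, though it won't by itself fix a genuine OOM caused by too large a batch size or sequence length.

```python
# Minimal sketch (TF 2.x assumed): enable on-demand GPU memory allocation
# instead of TensorFlow's default of grabbing all memory at startup.
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```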
I also tried switching to ALBERT or DistilBERT, but I still couldn't get the model to compile or get through the training epochs in a realistic timeframe.
@muhammad-noman-d I recommend trying the Transformers library from Hugging Face (https://huggingface.co/docs/transformers/index); their BERT implementation has sensible defaults (including FP16 support) and can be combined with DeepSpeed (https://github.com/microsoft/DeepSpeed) to significantly reduce training time and hardware requirements.
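As a rough illustration of that suggestion, a fine-tuning setup with the Trainer API might look like the sketch below. The tiny in-memory dataset and the hyperparameters are placeholders, and DeepSpeed can be layered on via the `deepspeed` argument of `TrainingArguments` with a separate JSON config.

```python
# Hedged sketch: fine-tuning bert-base with Hugging Face Transformers in fp16.
# The tiny in-memory dataset and the hyperparameters are placeholders.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Placeholder data; in practice load your own classification dataset.
raw = Dataset.from_dict({"text": ["a positive example", "a negative example"],
                         "label": [1, 0]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128,
                     padding="max_length")

train_ds = raw.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="bert-cls",
    per_device_train_batch_size=16,   # adjust to fit your GPU
    num_train_epochs=3,
    fp16=True,                        # mixed precision; needs a CUDA GPU
    # deepspeed="ds_config.json",     # optional: point to a DeepSpeed config
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```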