bert-toxic-comments-multilabel

CUDA out of memory. What can I do to improve model performance?

mrxiaohe opened this issue on Mar 17 '19 · 3 comments

I have a Tesla GPU with only 16 GB of memory -- much less than what you used for the experiment described in the Medium article. As a result, I had to reduce the max sequence length from 512 to 128 and the batch size from 32 to 16. After 4 epochs, the validation accuracies for the various toxic comment categories were around 0.6 to 0.65. Would increasing the number of epochs help improve performance?

In addition, is there a way to continue training a model -- say, if the validation results after 4 epochs are not good, can I continue training rather than restart with a larger number of epochs? Is it sufficient to just rerun `fit()`?

Thanks!
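For context, here is roughly what I have in mind -- a minimal sketch with plain PyTorch (the model, optimizer, and checkpoint path are placeholders; I'm not sure how this maps onto this repo's `fit()` wrapper):

```python
import torch
from torch import nn, optim

# Hypothetical stand-ins for the actual BERT classifier and optimizer.
model = nn.Linear(10, 6)                      # placeholder for the multilabel BERT model
optimizer = optim.Adam(model.parameters(), lr=3e-5)

# --- after the first run (e.g. 4 epochs): save model + optimizer state ---
torch.save(
    {"model_state": model.state_dict(),
     "optimizer_state": optimizer.state_dict(),
     "epochs_done": 4},
    "bert_toxic_epoch4.pt",                   # hypothetical checkpoint path
)

# --- later: restore both states and keep training instead of restarting ---
ckpt = torch.load("bert_toxic_epoch4.pt")
model.load_state_dict(ckpt["model_state"])
optimizer.load_state_dict(ckpt["optimizer_state"])
start_epoch = ckpt["epochs_done"]

for epoch in range(start_epoch, start_epoch + 2):   # e.g. two extra epochs
    pass  # run the usual training step for one epoch here
```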

mrxiaohe commented on Mar 17 '19

Are you using BERT-large or BERT-base? With BERT-base, you should get very good results with a max seq len of 256 and a batch size of 16 (I did, anyway...).

Google's recommended seq-length/batch-size combos are listed at https://github.com/google-research/bert#out-of-memory-issues.
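If those combos still don't fit in 16 GB, gradient accumulation is a common workaround (not specific to this repo): keep the per-step batch small, but only update the weights once per effective batch. A rough PyTorch sketch, with a toy model standing in for BERT:

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 6)                      # placeholder for the BERT classifier
optimizer = optim.Adam(model.parameters(), lr=3e-5)
loss_fn = nn.BCEWithLogitsLoss()              # multilabel loss
accumulation_steps = 4                        # 4 micro-batches of 8 ~ effective batch of 32

optimizer.zero_grad()
for step in range(100):                       # placeholder data loop
    x = torch.randn(8, 10)                    # micro-batch that fits in GPU memory
    y = torch.randint(0, 2, (8, 6)).float()   # fake multilabel targets
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()                           # gradients accumulate across micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                      # one update per effective batch
        optimizer.zero_grad()
```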

ghost commented on Mar 17 '19

I am using BERT-large uncased. Did you get your results after only 4 epochs?

mrxiaohe commented on Mar 17 '19

@tombriles I changed the model from large to base (uncased), and now a max seq len of 256 no longer causes the out-of-memory error (it did with the large model). I will report back on the performance once training is done!

mrxiaohe commented on Mar 17 '19