
Bigger mini_batch_size, any impact on the trainer performance?

Open · dchaplinsky opened this issue on Sep 16, 2022

Hello! I'm training yet another flair embeddings model on my 2060. The dataset is around 20GB, split into files of 1M lines each (1 line = 1 sentence) for the train directory. All sentences are filtered by length (i.e., there are no long sentences in the dataset).

I've successfully started the training with the following params:

    is_forward_lm=True,
    hidden_size=1024,
    sequence_length=250,
    mini_batch_size=100,
    max_epochs=20,

For obvious reasons, it takes quite a lot of time :) In nvtop I'm seeing the following numbers: GPU: 98%, GPU MEM: 39%
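If activation memory scales roughly linearly with mini_batch_size (a simplifying assumption that ignores the fixed memory cost of model weights and the optimizer state), 39% memory use at batch size 100 suggests there is room to grow. A rough sketch of that back-of-envelope estimate (the `safety` margin is an assumption, not a flair parameter):

```python
def max_batch_size(current_batch: int, mem_fraction: float,
                   safety: float = 0.9) -> int:
    """Rough upper bound on mini_batch_size, assuming GPU memory use
    scales linearly with batch size, with a safety margin against OOM."""
    return int(current_batch / mem_fraction * safety)

# 100 at 39% memory -> around 230 with a 10% margin
print(max_batch_size(100, 0.39))
```

Since the model weights consume a fixed share of that 39%, the real ceiling is higher than this linear estimate suggests, which is consistent with the batch size of 256 that ended up working later in this thread.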

And it takes around 200ms for the mini batch: 2022-09-16 02:08:23,685 | split 1/ 69 | 1900/ 4994 batches | ms/batch 200.08 | loss 1.5363 | ppl 4.6474
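From the numbers in that log line, the total training time can be estimated directly (assuming the 200 ms/batch rate and 4,994 batches/split hold across all 69 splits and 20 epochs):

```python
# Back-of-envelope total training time from the logged throughput.
ms_per_batch = 200.08
batches_per_split = 4994
splits = 69
epochs = 20

total_hours = ms_per_batch * batches_per_split * splits * epochs / 1000 / 3600
print(f"~{total_hours:.0f} h, or about {total_hours / 24:.1f} days")
```

That works out to roughly 383 hours, i.e. about 16 days per model at this batch size, which explains the interest in squeezing out more throughput.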

Does it make any sense to stop the training and resume it from the checkpoint but with the mini_batch_size=200 or even 250? Will that improve the speed and memory utilization? Will that have any negative impact on the model quality?

dchaplinsky avatar Sep 16 '22 10:09 dchaplinsky

@alanakbik could you please elaborate before I launch a week of training?

dchaplinsky avatar Sep 17 '22 05:09 dchaplinsky

@dchaplinsky only seeing this now, sorry. Did you find an answer to your question?

alanakbik avatar Nov 01 '22 09:11 alanakbik

Honestly, I do not remember :) I added gradient accumulation and found a balance between mini_batch_size and the size of the individual files (2M sentences each for test and valid, and 27 train files of 3M sentences each) to use as much GPU RAM as possible without risking OOM.

mini_batch_size was 256 for the forward embeddings trained on a 2060 (6GB) and 200 for the backward embeddings on a 1070 (8GB) — I don't remember the exact reason; I probably hit an OOM and lowered it to stay on the safe side.
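The gradient-accumulation balance mentioned above can be sketched in plain Python: several micro-batch gradients are averaged before a single optimizer update, so the effective batch is mini_batch_size × accum_steps without the memory cost of a larger mini-batch. The scalar gradients and `accum_steps` here are toy assumptions standing in for real per-parameter tensors:

```python
def accumulate(grads, accum_steps):
    """Average each run of `accum_steps` micro-batch gradients into one
    update, mimicking gradient accumulation with scalar stand-ins."""
    updates = []
    buffer, count = 0.0, 0
    for g in grads:
        buffer += g          # accumulate instead of stepping immediately
        count += 1
        if count == accum_steps:
            updates.append(buffer / accum_steps)  # one optimizer step
            buffer, count = 0.0, 0
    return updates

# Four micro-batches, stepping every 2 -> two averaged updates
print(accumulate([1.0, 3.0, 2.0, 4.0], accum_steps=2))  # [2.0, 3.0]
```

With this scheme, a mini_batch_size of 256 and, say, 4 accumulation steps behaves like a batch of 1024 gradient-wise, while GPU memory is only sized for 256.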

And then I just trained them for 1.5 months.

dchaplinsky avatar Nov 01 '22 09:11 dchaplinsky