Bigger mini_batch_size: any impact on trainer performance?
Hello! I'm training yet another flair embeddings model on my 2060. The dataset is around 20GB, split into files of 1M lines each (1 line = 1 sentence) in the train directory. All the sentences are filtered by sentence length (i.e. there are no long sentences in the dataset).
I've successfully started the training with the following params:
is_forward_lm=True,
hidden_size=1024,
sequence_length=250,
mini_batch_size=100,
max_epochs=20,
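To give the full picture, this is essentially the standard flair LM training recipe; a sketch of my script, with the paths as placeholders for the real corpus and output directories:

```python
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# forward, character-level LM over flair's default character dictionary
is_forward_lm = True
dictionary = Dictionary.load('chars')

# corpus directory: train/ split into ~1M-line files, plus valid.txt and test.txt
corpus = TextCorpus('path/to/corpus', dictionary, is_forward_lm, character_level=True)

# single LSTM layer, as in the flair tutorial
language_model = LanguageModel(dictionary, is_forward_lm, hidden_size=1024, nlayers=1)

trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('path/to/model',
              sequence_length=250,
              mini_batch_size=100,
              max_epochs=20)
```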
For obvious reasons, it takes quite a lot of time :)
On nvtop I'm seeing the following numbers:
GPU: 98%
GPU MEM: 39%
And it takes around 200 ms per mini-batch: 2022-09-16 02:08:23,685 | split 1/ 69 | 1900/ 4994 batches | ms/batch 200.08 | loss 1.5363 | ppl 4.6474
Does it make sense to stop the training and resume it from the checkpoint, but with mini_batch_size=200 or even 250? Would that improve speed and memory utilization? Would it have any negative impact on model quality?
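What I have in mind is roughly the following: reload the model trained so far and continue with a larger batch. Just a sketch; it reloads only the weights (not the optimizer state), the filename is what my run produced, and the exact resume API may differ between flair versions:

```python
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

dictionary = Dictionary.load('chars')
corpus = TextCorpus('path/to/corpus', dictionary, True, character_level=True)

# load the weights saved so far and keep training with a doubled batch size
language_model = LanguageModel.load_language_model('path/to/model/best-lm.pt')

trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('path/to/model',
              sequence_length=250,
              mini_batch_size=200,  # up from 100
              max_epochs=20)
```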
@alanakbik could you please elaborate before I launch a week of training?
@dchaplinsky only seeing this now, sorry. Did you find an answer to your question?
Honestly, I do not remember :) I added gradient accumulation and found a balance between mini_batch_size and the size of the individual files (2M sentences for test and valid, and 27 train files of 3M sentences each) to use as much GPU RAM as possible without the risk of OOM.
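By grad accum I mean the usual PyTorch trick of accumulating gradients over several small mini-batches before each optimizer step, so the effective batch size grows without extra GPU memory. A generic sketch of the idea (not flair's built-in API; the toy model, optimizer and batches below just stand in for the real LM training loop):

```python
import torch
from torch import nn

# toy stand-ins for the real language model, optimizer and data loader
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]

accumulation_steps = 4  # effective batch size = 8 * 4 = 32

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()  # accumulate scaled gradients
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()        # one weight update per accumulation window
        optimizer.zero_grad()
```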
mini_batch_size was 256 for the forward embeddings trained on a 2060 (6GB) and 200 for the backward embeddings on a 1070 (8GB) (I don't remember the exact reason; probably I hit an OOM and lowered it to stay on the safe side).
And then I just trained them for 1.5 months.