self-supervised-speech-recognition
[Question] How do I calculate max_tokens max value?
Given that I'm training on 5 GeForce GTX 1080 Ti GPUs with 10.917 GB of memory each, how can I calculate max_tokens so that no out-of-memory error occurs?
batch_duration (s) = max_tokens / 16000
For example, if max_tokens is set to 160000, the total audio duration of a batch is limited to 10 seconds.
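A minimal sketch of that conversion, assuming max_tokens counts raw audio samples at the 16 kHz sample rate the formula above implies:

```python
SAMPLE_RATE = 16_000  # 16 kHz, per the formula above

def batch_duration_seconds(max_tokens: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Total seconds of audio a batch may contain for a given max_tokens."""
    return max_tokens / sample_rate

def max_tokens_for_duration(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """max_tokens needed to fit `seconds` of audio in one batch."""
    return int(seconds * sample_rate)

print(batch_duration_seconds(160_000))   # 10.0 seconds
print(max_tokens_for_duration(75.0))     # 1200000, the value that OOMs below
```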
So my question is: how many seconds of audio can I have inside a batch? If I set max_tokens to a large value (1,200,000) I get an error:
2021-03-03 11:40:50 | WARNING | fairseq.trainer | OOM: Ran out of memory with exception: CUDA out of memory. Tried to allocate 210.00 MiB (GPU 4; 10.92 GiB total capacity; 9.16 GiB already allocated; 147.50 MiB free; 10.18 GiB reserved in total by PyTorch)
Exception raised from malloc at ../c10/cuda/CUDACachingAllocator.cpp:272 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f9c2042fbdd in /space/homes/user/workspace/lib/python3.6/site-packages/torch/lib/libc10.so)
Does the number of virtual GPUs count?
"How many seconds can I have inside a batch?" --> I can't give you an exact number, but it should be as high as possible given your GPU memory. You should try different settings and see how it goes. As for the number of GPUs: in fairseq, max_tokens is a per-GPU budget, so each GPU builds its own batch of up to max_tokens samples.
Try lowering max_tokens if you encounter a memory error, but the value must not be lower than the number of tokens in your longest audio clip.
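One way to find that floor is to take the largest frame count in your training manifest. This is only a sketch, and it assumes a fairseq wav2vec-style .tsv manifest (first line is the root directory, then one "<relative_path>\t<num_frames>" entry per line); the "train.tsv" path is a placeholder:

```python
def min_max_tokens(manifest_path: str) -> int:
    """Return the frame count of the longest clip, i.e. the floor for max_tokens."""
    with open(manifest_path) as f:
        next(f)  # skip the root-directory header line
        return max(int(line.rstrip("\n").split("\t")[1]) for line in f)

longest = min_max_tokens("train.tsv")  # e.g. 480000 frames = 30 s at 16 kHz
print(f"max_tokens must be at least {longest}")
```

Any max_tokens at or above this value will at least fit the longest clip; from there, raise it until you start hitting OOM errors and back off.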