Index training failing with CUDA out of memory error.

Open oneraghavan opened this issue 3 years ago • 3 comments

I am trying to mine bitext where one language has 63292172 lines.

Mining fails with the following error:

CUDA out of memory. Tried to allocate 22.62 GiB (GPU 0; 39.59 GiB total capacity; 29.82 GiB already allocated; 4.79 GiB free; 31.19 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
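As a rough reading of the numbers in this message, the sketch below parses the reported figures and compares the requested allocation with the memory that could be available in the best case. The helper name `parse_oom` and the "best case" estimate (free memory plus reserved-but-unallocated memory) are assumptions made for illustration, not part of PyTorch or stopes.

```python
import re

# Error text copied from the message above (trailing hint omitted).
msg = (
    "CUDA out of memory. Tried to allocate 22.62 GiB (GPU 0; 39.59 GiB total capacity; "
    "29.82 GiB already allocated; 4.79 GiB free; 31.19 GiB reserved in total by PyTorch)"
)


def parse_oom(message):
    """Hypothetical helper: extract the GiB figures from a PyTorch OOM message."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {key: float(re.search(pat, message).group(1)) for key, pat in patterns.items()}


stats = parse_oom(msg)

# Memory PyTorch has reserved but is not currently using (possible fragmentation slack).
cached_slack = stats["reserved"] - stats["allocated"]

# Simplified best case: everything free plus all of the slack is usable at once.
best_case = stats["free"] + cached_slack

fits = stats["requested"] <= best_case
print(f"slack={cached_slack:.2f} GiB, best case={best_case:.2f} GiB, fits={fits}")
```

Under this simplified reading, the 22.62 GiB request is far larger than what is left on the GPU, so tuning `max_split_size_mb` alone seems unlikely to be enough here.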

Any suggestions on how to make this work?

oneraghavan avatar Jul 20 '22 05:07 oneraghavan

Which step is this? Do you have the logs?

When mining languages with a lot of data, we split the data into multiple indexes. This is not yet automated in the pipeline (it is on our TODO list), so you may need to do it manually, depending on how much memory you have available.
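A minimal sketch of how such a split could be planned by hand is shown below. It is not the stopes implementation: the function `plan_shards`, the embedding dimension of 1024, fp16 storage, and the 16 GiB per-shard budget are all assumptions chosen for illustration. The memory actually used during index training also depends on the index type and training sample size, which this thread does not specify.

```python
import math


def plan_shards(num_lines, dim, bytes_per_value, budget_gib):
    """Hypothetical helper: split [0, num_lines) into contiguous, balanced
    line ranges whose raw embeddings each fit within budget_gib."""
    bytes_per_line = dim * bytes_per_value
    max_lines_per_shard = (budget_gib * 1024**3) // bytes_per_line
    if max_lines_per_shard < 1:
        raise ValueError("budget too small for a single embedding")
    num_shards = math.ceil(num_lines / max_lines_per_shard)
    # Spread the remainder over the first shards so sizes differ by at most 1.
    base, extra = divmod(num_lines, num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)
        shards.append((start, start + size))
        start += size
    return shards


NUM_LINES = 63292172   # line count from the original report
DIM = 1024             # assumed embedding dimension
BYTES_PER_VALUE = 2    # assumed fp16 storage
BUDGET_GIB = 16        # assumed per-shard memory budget

total_gib = NUM_LINES * DIM * BYTES_PER_VALUE / 1024**3
shards = plan_shards(NUM_LINES, DIM, BYTES_PER_VALUE, BUDGET_GIB)
print(f"raw embeddings: {total_gib:.1f} GiB, split into {len(shards)} shards")
for start, end in shards:
    print(f"lines [{start}, {end}) -> {end - start} lines")
```

Each range can then be embedded and indexed separately, with the budget adjusted to whatever leaves enough headroom on the GPU in use.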

Mortimerp9 avatar Jul 28 '22 10:07 Mortimerp9

@oneraghavan did you manage to get it to work?

Mortimerp9 avatar Aug 22 '22 08:08 Mortimerp9

@Mortimerp9 Nope. When is the multiple-index pipeline expected?

oneraghavan avatar Aug 23 '22 01:08 oneraghavan

@oneraghavan, sorry for the delay in answering. We recently released a new version with some memory optimisations and code to automatically split large datasets into multiple indexes. You can find some details at: https://facebookresearch.github.io/stopes/docs/pipelines/global_mining#splitting-and-merging-languages

Mortimerp9 avatar Dec 29 '22 17:12 Mortimerp9