NMT inference OOM
Is your feature request related to a problem? Please describe.
- Inference does not support automatic batching by sequence length, so very long sentences cause OOM errors and make it difficult to increase the beam size.
- I also found that CUDA memory usage grows gradually as inference proceeds.
Describe the solution you'd like
- Support automatic batching according to sequence length.
- I have tried to work around the growing memory usage by calling `torch.cuda.empty_cache()` between batches, as sketched below.
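For reference, a minimal sketch of what I do now (the `model.translate()` call and the batch size are illustrative and stand in for whatever inference API you actually use):

```python
import torch

def translate_all(model, sentences, batch_size=32):
    """Translate `sentences` in fixed-size batches, freeing cached memory in between."""
    results = []
    with torch.no_grad():
        for i in range(0, len(sentences), batch_size):
            batch = sentences[i:i + batch_size]
            results.extend(model.translate(batch))
            # Release unused cached allocator blocks so memory held after a
            # long batch is not carried over to the next one.
            torch.cuda.empty_cache()
    return results
```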
Do you run out of memory even with batch size 1? If not, the easiest fix is to just reduce the batch size. Another thing you can do is to order your test set so that the longest sequences come first; that way, if you do run out of memory, it happens right at the start rather than mid-way through translating your test set.
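Something along these lines (illustrative only; whitespace token counts are a rough proxy for subword length) sorts the longest sentences first and lets you restore the original order afterwards:

```python
def sort_longest_first(sentences):
    """Return sentences sorted longest-first plus the permutation used."""
    # Whitespace token count is a rough proxy for the tokenized length.
    order = sorted(range(len(sentences)),
                   key=lambda i: len(sentences[i].split()),
                   reverse=True)
    return [sentences[i] for i in order], order

def restore_order(translations, order):
    """Put translations back into the original sentence order."""
    out = [None] * len(translations)
    for pos, idx in enumerate(order):
        out[idx] = translations[pos]
    return out
```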
We don't have plans yet for specifying inference batch sizes based on tokens, but if you're able to implement this, we would welcome a pull request!
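For anyone interested in picking this up, here is a rough sketch of what token-count-based batching could look like (whitespace token counts stand in for subword lengths, and the `max_tokens` budget is arbitrary); this is not an existing NeMo API:

```python
def batch_by_tokens(sentences, max_tokens=4000):
    """Group sentence indices so each padded batch stays under a token budget."""
    # Sort by length so sentences of similar length share a batch (less padding).
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i].split()))
    batches, current, current_max = [], [], 0
    for idx in order:
        length = len(sentences[idx].split())
        new_max = max(current_max, length)
        # A padded batch costs roughly (batch size x longest sentence) tokens.
        if current and new_max * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, current_max = [], 0
            new_max = length
        current.append(idx)
        current_max = new_max
    if current:
        batches.append(current)
    return batches  # each batch is a list of indices into `sentences`
```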