Fix CUDA runtime error sampleMultinomialOnce #2286
Before submitting
- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [x] Did you read the contributor guideline?
- [x] Did you make sure to update the docs?
- [x] Did you write any new necessary tests?
What does this PR do?
Fixes #2286 by applying the fix suggested by @ryonakamura.
Causes of the issue
PyTorch throws a CUDA assert error from sampleMultinomialOnce when we try to sample from a distribution that contains all zeros (i.e. the kernel's `sum > accZero` assertion fails; see the torch implementation here).
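The failing check can be sketched in plain Python (a simplified analogue of the kernel's validation, not the actual CUDA code; the function name and structure here are illustrative):

```python
import random

def sample_multinomial_once(probs):
    """Simplified Python analogue of PyTorch's sampleMultinomialOnce.

    The CUDA kernel asserts that the distribution sums to something
    strictly positive (sum > accZero) before doing inverse-CDF
    sampling; an all-zero probability vector trips that assert,
    which is the error reported in #2286.
    """
    acc_zero = 0.0
    total = sum(probs)
    assert total > acc_zero, "invalid multinomial distribution (sum <= 0)"
    # Inverse-CDF sampling over the (unnormalized) distribution.
    r = random.uniform(0.0, total)
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point edge cases
```

Calling `sample_multinomial_once([0.0, 0.0, 0.0])` raises the assertion, mirroring the device-side assert on GPU.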
When we try to sample the next token after the max length, fairseq sets the log probability of every token other than self.eos to -inf (see the fairseq implementation here).
The log probability for self.eos is unlikely to be -inf from the model, so the multinomial sampling should always sample EOS in this case. In practice, however, when we take the exponent of the log probabilities, the probability for EOS can still be 0 in some cases, likely due to floating-point underflow. The vector then contains all zeros, and the CUDA assert is thrown.
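The underflow is easy to reproduce: exp(-inf) is exactly 0, and even a finite but very negative log probability underflows to 0 (the exact threshold depends on the dtype; the values below are for Python's float64, and float32 underflows much sooner):

```python
import math

# exp(-inf) is exactly 0 under IEEE 754 semantics.
print(math.exp(float("-inf")))   # 0.0

# A finite but very negative log probability also underflows to 0:
# the smallest positive double is ~5e-324, so exp underflows for
# arguments below roughly -745.
print(math.exp(-1000.0))         # 0.0

# If the EOS log probability is that negative, the whole probability
# vector becomes all zeros after exponentiation (last entry = EOS):
logprobs = [float("-inf"), float("-inf"), -1000.0]
probs = [math.exp(x) for x in logprobs]
print(probs, sum(probs))         # [0.0, 0.0, 0.0] 0.0
```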
The fix suggested by @ryonakamura manually sets the log probability for EOS to 1, hence exp(1) = e > 0. This no longer triggers the CUDA error, and EOS is sampled as expected.
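The effect of the fix can be sketched as follows (illustrative only; in fairseq the change is applied to the `lprobs` tensor, whereas here a plain list and a hypothetical `eos_idx` stand in for it):

```python
import math
import random

eos_idx = 2  # hypothetical EOS position, for illustration

# Log probabilities after fairseq masks everything but EOS with -inf;
# the EOS entry itself has underflowed, as described in the issue.
lprobs = [float("-inf"), float("-inf"), -1000.0]

# The fix: force the EOS log probability to 1, so exp(1) = e > 0.
lprobs[eos_idx] = 1.0

probs = [math.exp(x) for x in lprobs]
assert sum(probs) > 0  # the CUDA assert condition now holds

# Inverse-CDF sampling now deterministically picks EOS, since EOS
# carries all of the probability mass.
r = random.uniform(0.0, sum(probs))
cumulative, sampled = 0.0, len(probs) - 1
for i, p in enumerate(probs):
    cumulative += p
    if r < cumulative:
        sampled = i
        break
print(sampled)  # 2 (EOS)
```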
PR review
Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃
I was able to pull this branch and confirm it prevents the CUDA error without changing model predictions.