faster-rnnlm

Entropy is nan when a vocabulary larger than 2 million words is used.

Open mpatsis opened this issue 8 years ago • 0 comments

Hi, when I use a vocabulary larger than 2 million words (e.g., 2.2 million), the validation entropy is always nan. However, on exactly the same data, a slightly smaller vocabulary (1937725 words) gives a normal entropy value. I limit the vocabulary size by removing rare words from the vocabulary file.
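To illustrate why the reported value can be nan rather than just large, here is a minimal sketch of how an entropy in bits per word is commonly computed from per-word probabilities. It is not taken from faster-rnnlm; the function names `neg_log2` and `entropy_bits_per_word` are hypothetical, and the assumption is that entropy is the average negative log2 probability over the validation words. Under that assumption, a single NaN probability anywhere in the validation set is enough to make the whole result nan.

```python
import math


def neg_log2(p):
    """Return -log2(p), mapping 0 to inf and invalid values to nan."""
    if p > 0:
        return -math.log2(p)
    if p == 0:
        return float("inf")
    # Negative or NaN probabilities are invalid; propagate nan.
    return float("nan")


def entropy_bits_per_word(probs):
    """Average negative log2 probability (hypothetical helper, not faster-rnnlm code)."""
    total = 0.0
    for p in probs:
        total += neg_log2(p)
    return total / len(probs)


# Well-behaved probabilities give a finite entropy.
healthy = entropy_bits_per_word([0.5, 0.25])

# One NaN probability poisons the sum, so the average is nan too.
poisoned = entropy_bits_per_word([0.5, float("nan"), 0.25])
```

If this is what happens here, checking the model's per-word probabilities for the first non-finite value would show which word index triggers it.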

Best regards, Rafael

mpatsis, Jun 01 '16 11:06