bilm-tf
[Question] One Sentence Perplexity Computation Recommendation/Approach
This might already be answered, and it may be a novice question, but I simply want to compute the perplexity of a single sentence. What parameters would I change to compute the score?

My current approach: I still use bin/run_test.py, but I redirect --test_prefix to point to a new folder containing only one file, with one sentence in it. I also set --batch_size 1, which computes batch_perplexities and their avg_perplexity.

I can't help but scratch an itch and wonder about unroll_steps, and whether I'm truly computing the perplexity of the sentence in the file.
Let me know if what I've done is accurate, or if I need to change something to properly reflect single-sentence perplexity computation.
I've attached a run with my current model and the output of my current options configuration :)
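For concreteness, the invocation described above would look roughly like this (paths are placeholders; the flags match the evaluation example in the README):

```
python bin/run_test.py \
    --test_prefix='/path/to/single_sentence_dir/*' \
    --vocab_file /path/to/vocab.txt \
    --save_dir /path/to/checkpoint \
    --batch_size 1
```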

This would work except for the pesky issue of the statefulness of the LSTM states. The perplexities for the first batch or two are artificially high until the model has processed a few sentences (see https://github.com/allenai/bilm-tf#why-do-i-get-slightly-different-embeddings-if-i-run-the-same-text-through-the-pre-trained-model-twice). The code in run_test makes an implicit assumption that the size of the test set is very large so that the first batch perplexity doesn't have much impact on the overall average.
In the single sentence setting, or when you really want an accurate perplexity for the first batch, it's necessary to run inference with the first sentence 1-2 times first, then once more to calculate the loss. Subsequent batches can just be processed once as usual.
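A minimal sketch of that warm-up procedure, assuming a hypothetical `run_batch` callable that feeds one batch through the stateful model and returns its average per-token loss (this is not bilm-tf's actual API):

```python
import numpy as np

def single_sentence_perplexity(run_batch, batch, n_warmup=2):
    # Warm up the persistent LSTM states by running the same sentence
    # through the model a couple of times, discarding the losses.
    for _ in range(n_warmup):
        run_batch(batch)
    # One more pass, this time keeping the average per-token loss.
    loss = run_batch(batch)
    # Perplexity is exp of the average negative log-likelihood.
    return np.exp(loss)
```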
Thanks for the solution. Would you implement a simple API for calculating single-sentence perplexity?
I don't have any plans to implement it, PRs welcome.
I have the same question here. I just split the dataset file into many single-sentence files.
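For anyone taking that route, a hypothetical helper along these lines would do the splitting (names are illustrative, not part of the library):

```python
import os

def split_into_sentence_files(src_path, out_dir):
    # Write each line of src_path to its own file, so that every
    # run_test.py invocation scores exactly one sentence.
    os.makedirs(out_dir, exist_ok=True)
    with open(src_path) as src:
        for i, sentence in enumerate(src):
            with open(os.path.join(out_dir, 'sent_%06d.txt' % i), 'w') as out:
                out.write(sentence)
```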
Is there any easy way for the library to compute perplexity directly from an ELMo hdf5 weight file?
I lost my checkpoint files and only have the fine-tuned hdf5 weights dumped earlier. Thanks
How can I get perplexity for many sentences? Splitting the sentences into files, each containing one, would be impossible when you have millions of sentences. Should I train another language model on top of the ELMo embeddings?
Appending each batch's loss to a batch_losses list gives you per-batch (and, with --batch_size 1, per-sentence) perplexities.
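A sketch of that idea, again with hypothetical `run_batch` and `batches` stand-ins for the step function and the data iterator inside the test loop:

```python
import numpy as np

def per_batch_perplexities(run_batch, batches):
    # Keep each batch's average per-token loss instead of only the
    # running mean that run_test.py reports.
    batch_losses = []
    for batch in batches:
        batch_losses.append(run_batch(batch))
    # With --batch_size 1, each entry corresponds to a single sentence.
    return [np.exp(loss) for loss in batch_losses]
```

Note that the warm-up caveat above still applies: the first batch's loss will be inflated until the LSTM states have seen a few sentences.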