
Perplexity per sentence implementation?

Open BigBorg opened this issue 5 years ago • 1 comment

I need to get the perplexity (PPL) per sentence for millions of lines. Splitting them into files, each containing one sentence, would be too time-consuming. Is it possible to achieve this by modifying the data loader? For example, give the model input of shape (num_sentences, num_tokens, max_characters_per_token). The problem is how to pad sentences that don't have enough tokens. If this would work, would such padding affect the state carried into the next batch? If not, are there any other suggestions? (A sketch of the masking idea follows below.)
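
A minimal sketch of the padding-plus-masking idea, not bilm-tf's actual API: it assumes you already have per-token negative log-likelihoods from the model, and the names `token_nll` and `lengths` are hypothetical.

```python
import numpy as np

def per_sentence_ppl(token_nll, lengths):
    """token_nll: (num_sentences, num_tokens) per-token negative
    log-likelihoods, zero-padded past each sentence's end.
    lengths: (num_sentences,) true token counts before padding."""
    num_tokens = token_nll.shape[1]
    # Mask out padded positions so they do not contribute to the loss.
    mask = np.arange(num_tokens)[None, :] < lengths[:, None]
    sent_nll = (token_nll * mask).sum(axis=1) / lengths
    # Perplexity is exp of the mean NLL over the real tokens only.
    return np.exp(sent_nll)

# Example: two sentences padded to four tokens each.
nll = np.array([[2.1, 1.3, 0.9, 0.0],
                [1.7, 2.4, 0.0, 0.0]])
print(per_sentence_ppl(nll, np.array([3, 2])))
```

The mask keeps padded tokens out of the perplexity itself. Whether padding also affects the next batch depends on whether the model carries LSTM state across batches; if it does, padded time steps will still update that state, so resetting the state between batches (or at sentence boundaries) is worth considering.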

BigBorg avatar Jul 19 '19 05:07 BigBorg

[screenshot of modified code]

Add a `batch_losses` list and append each batch's (un-reduced) losses to it; that way you can get the per-sentence PPL for every batch.
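
A rough, self-contained sketch of what this comment describes; the values and the two-step loop are toy stand-ins, since in bilm-tf the per-sentence mean NLLs would come from an un-reduced loss tensor evaluated at each step.

```python
import numpy as np

# Hypothetical per-sentence mean NLLs produced by two evaluation steps.
step1 = np.array([1.43, 2.05])   # batch 1: two sentences
step2 = np.array([1.88])         # batch 2: one sentence

batch_losses = []                 # the list the comment suggests adding
for step_losses in (step1, step2):
    batch_losses.append(step_losses)

# One perplexity value per sentence, across all batches.
ppl = np.exp(np.concatenate(batch_losses))
print(ppl)
```

The key change is keeping the loss per sentence instead of averaging it away over the whole batch; everything downstream is just accumulation and an `exp`.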

demeiyan avatar Sep 10 '20 01:09 demeiyan