evaluate: Change perplexity to be calculated with base e

Merging with the open docs PR for perplexity, #238.

Closes #241.

Aug 10 '22 17:08 mathemakitten

The documentation is not available anymore as the PR was closed or merged.

Aug 10 '22 17:08 HuggingFaceDocBuilderDev

For reference, here is a comparison on the sentence "Hugging Face is a startup based in New York City and Paris".

Previously, base 2:

```python
import evaluate

perplexity = evaluate.load("perplexity", module_type="metric")
input_texts = ['Hugging Face is a startup based in New York City and Paris']
results = perplexity.compute(model_id='gpt2',
                             add_start_token=False,
                             predictions=input_texts)
# results holds per-text "perplexities" and their average, "mean_perplexity"
print(results["mean_perplexity"])
```

ppl = 19.1218

Now, base e: ppl = 70.6083
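The two numbers above are consistent with the old implementation raising 2 to a mean loss that was already measured in nats. This is an inference from the reported values, not a quote of the old metric code. The sketch below recovers the mean negative log-likelihood from the new value and re-applies both bases:

```python
import math

ppl_base_e = 70.6083  # new result reported above

# Mean per-token negative log-likelihood (in nats) implied by the new result
mean_nll_nats = math.log(ppl_base_e)

# Assumed old behaviour: raising 2 to a loss that is measured in nats
ppl_old = 2 ** mean_nll_nats

# Consistent base-2 form: convert nats to bits first, then exponentiate with base 2
mean_nll_bits = mean_nll_nats / math.log(2)
ppl_base_2_consistent = 2 ** mean_nll_bits

print(round(ppl_old, 4))                # close to the 19.1218 reported before this change
print(round(ppl_base_2_consistent, 4))  # matches the base-e value
```

Base 2 and base e give the same perplexity as long as the loss and the exponent use the same base. Since the transformers loss is a natural-log cross-entropy, exponentiating with e is the consistent choice.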

Compare with the canonical sliding-window example from the transformers documentation ("Perplexity of fixed-length models"):

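```python
import torch
from tqdm import tqdm
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

encodings = tokenizer(["Hugging Face is a startup based in New York City and Paris"], return_tensors="pt")

max_length = model.config.n_positions  # 1024 for GPT-2
stride = 512

nlls = []
for i in tqdm(range(0, encodings.input_ids.size(1), stride)):
    begin_loc = max(i + stride - max_length, 0)
    end_loc = min(i + stride, encodings.input_ids.size(1))
    trg_len = end_loc - i  # may be different from stride on last loop
    input_ids = encodings.input_ids[:, begin_loc:end_loc].to(device)
    target_ids = input_ids.clone()
    # Mask context-only tokens so that only the last trg_len tokens are scored
    target_ids[:, :-trg_len] = -100

    with torch.no_grad():
        outputs = model(input_ids, labels=target_ids)
        # outputs[0] is the mean cross-entropy in nats; rescale it to a summed NLL
        neg_log_likelihood = outputs[0] * trg_len

    nlls.append(neg_log_likelihood)

# torch.exp (base e), because the loss is a natural-log quantity
ppl = torch.exp(torch.stack(nlls).sum() / end_loc)
print(ppl.item())
```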

ppl = 70.6075
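To see why a short sentence reproduces the single-pass value, the window arithmetic from the loop can be traced without a model. The function below is hypothetical, written only for illustration; it lists the (begin_loc, end_loc, trg_len) triples the loop would visit for a sequence of a given token length. The 13-token input is likewise an assumed length, not a count of the actual sentence's tokens:

```python
def sliding_windows(seq_len: int, max_length: int, stride: int):
    """Return the (begin_loc, end_loc, trg_len) triples visited by the strided loop."""
    windows = []
    for i in range(0, seq_len, stride):
        begin_loc = max(i + stride - max_length, 0)
        end_loc = min(i + stride, seq_len)
        trg_len = end_loc - i  # number of tokens newly scored in this window
        windows.append((begin_loc, end_loc, trg_len))
    return windows


# Toy setting: 10 tokens, a context of 6, a stride of 4
print(sliding_windows(10, max_length=6, stride=4))
# → [(0, 4, 4), (2, 8, 4), (6, 10, 2)]

# A hypothetical 13-token input with GPT-2's 1024-token context fits in one window
print(sliding_windows(13, max_length=1024, stride=512))
# → [(0, 13, 13)]
```

The trg_len values always sum to the final end_loc, so each position is counted once in the denominator. With a single window, the loop reduces to exp(loss), which is why 70.6075 agrees with the single-pass value to the precision shown.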

And the usual single-pass computation, exponentiating the mean cross-entropy loss over the whole sentence:

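```python
import numpy as np
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
input_ids = tokenizer("Hugging Face is a startup based in New York City and Paris", return_tensors="pt").input_ids
# Labels are shifted inside the model; [0] is the mean NLL in nats
loss = model(input_ids, labels=input_ids)[0]
print(np.exp(loss.cpu().detach().numpy()))
```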

ppl = 70.60746
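The single-pass number is the exponential of the mean token-level cross-entropy, where each position predicts the next token. A stdlib-only sketch of that computation follows, using made-up logits rather than GPT-2 outputs. The function name `perplexity_from_logits` is hypothetical:

```python
import math


def perplexity_from_logits(logits, token_ids):
    """exp(mean NLL in nats), where logits[t] scores the token at position t + 1."""
    nll_total = 0.0
    n_targets = 0
    for t in range(len(token_ids) - 1):
        row = logits[t]
        # Numerically stable log-softmax: log p(v) = x_v - logsumexp(x)
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        target = token_ids[t + 1]  # shift by one, as the model does with its labels
        nll_total += log_z - row[target]
        n_targets += 1
    return math.exp(nll_total / n_targets)


# Uniform logits over a 4-token vocabulary: every next token has p = 1/4
uniform = [[0.0, 0.0, 0.0, 0.0] for _ in range(4)]
print(perplexity_from_logits(uniform, [0, 1, 2, 3]))  # approximately 4, the vocabulary size
```

A model that guesses uniformly gets a perplexity equal to its vocabulary size. This gives a rough scale for GPT-2's 70.6 on this sentence against its 50,257-token vocabulary.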

Aug 10 '22 22:08 mathemakitten