
Compute perplexity/logits for the prompt

Open dsmilkov opened this issue 1 year ago • 6 comments

I'd like to use Phi-2 to compute the perplexity of prompts over an entire dataset. Is there an API for this? In the short term, I'm happy to fork https://github.com/vllm-project/vllm/blob/d0215a58e78572d91dadafe9d832a2db89b09a13/vllm/model_executor/models/phi_1_5.py if you provide pointers on how to do that.

Also happy to later contribute back an API that works for all causal models.

dsmilkov avatar Jan 07 '24 14:01 dsmilkov

I have the same need. Has anyone found a way to get the logits of the prompt?

yiranma0 avatar Jan 23 '24 03:01 yiranma0

I have the same need too, but unfortunately it appears that vLLM has not implemented support for it yet, as discussed in https://github.com/vllm-project/vllm/issues/185

caiyuhu avatar Jan 27 '24 16:01 caiyuhu

I think you can use the parameter prompt_logprobs in SamplingParams for this purpose. #1328
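For reference, a minimal sketch of what that could look like (the model name is only an example, not prescribed by this thread; prompt_logprobs is the number of top log-probabilities to return per prompt token):

from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/phi-2")  # example model
params = SamplingParams(max_tokens=1, prompt_logprobs=1)

outputs = llm.generate(["The quick brown fox jumps over the lazy dog"], params)
# prompt_logprobs is a list aligned with the prompt tokens: the first entry is None
# (nothing predicts the first token), the rest map token_id -> log-probability info.
print(outputs[0].prompt_logprobs)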

lekhang4497 avatar Mar 06 '24 18:03 lekhang4497

prompt_logprobs can only return the probabilities for the top <=20 tokens right now, so it is not applicable for this use case.

dylanbowman314 avatar Jul 01 '24 18:07 dylanbowman314

Is there any progress on this issue at the moment?

junzhang-zj avatar Aug 09 '24 07:08 junzhang-zj

same issue here

Tendo33 avatar Sep 11 '24 02:09 Tendo33

You can set logprobs=1, prompt_logprobs=1. Then: (screenshot of the returned prompt logprobs omitted)
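Since the screenshot is not reproduced here, a rough sketch of reading those fields (the model name is an example; entry types vary by vLLM version, with recent versions returning Logprob objects that carry a .logprob attribute and older ones plain floats):

from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/phi-2")  # example model
params = SamplingParams(max_tokens=1, logprobs=1, prompt_logprobs=1)

out = llm.generate(["I love ice cream"], params)[0]
# Skip position 0: there is no log-probability for the first prompt token.
for token_id, entry in zip(out.prompt_token_ids[1:], out.prompt_logprobs[1:]):
    info = entry[token_id]  # the actual prompt token is always included
    print(token_id, info.logprob if hasattr(info, "logprob") else info)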

CodeAsPoetry avatar Sep 18 '24 10:09 CodeAsPoetry

Tested with a prompt of more than 20 tokens; it seems to work. (screenshot omitted)

CodeAsPoetry avatar Sep 18 '24 10:09 CodeAsPoetry

prompt_logprobs can only return the probabilities for the top <=20 tokens right now, so it is not applicable for this use case.

  • I used to have the same problem, but I found it to be solvable.
  • The actual prompt token's log-probability is always returned, whether or not it is in the top 20. So you can append the part of the string you care about as a suffix of the prompt and read its probability from prompt_logprobs.

Rachum-thu avatar Dec 02 '24 22:12 Rachum-thu

Try this code and it may help you solve the problem:

from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Example setup (the model name is illustrative)
llama = LLM(model="meta-llama/Llama-2-7b-hf")
llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

prefix_list = ['my name is', 'I love']
candidate_list = [[' Hongliang', ' Raymond', ' John'], [' ice cream', ' pizza', ' coding']]

# Initialize sampling parameters; prompt_logprobs makes vLLM also return
# log-probabilities for the prompt tokens.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=8, prompt_logprobs=20)

# Process each prefix and its corresponding candidates
for prefix, candidates in zip(prefix_list, candidate_list):
    results = {}
    prefix_tokens = llama_tokenizer(prefix)['input_ids']
    prefix_token_length = len(prefix_tokens)

    # Build the full prompts and tokenize them
    prompts = [prefix + candidate for candidate in candidates]
    prompt_tokens = llama_tokenizer(prompts)
    suffix_tokens_length = [len(tokens) - prefix_token_length for tokens in prompt_tokens['input_ids']]

    # Generate outputs (prompt logprobs come back on each RequestOutput)
    outputs = llama.generate(prompts, sampling_params)

    # Process each output
    for candidate, output, suffix_len in zip(candidates, outputs, suffix_tokens_length):
        logprobs = output.prompt_logprobs[-suffix_len:]
        target_tokens = prompt_tokens['input_ids'][candidates.index(candidate)][-suffix_len:]

        # Extract the log-probabilities for the target (suffix) tokens; depending on the
        # vLLM version these entries are Logprob objects (use .logprob) or plain floats.
        log_probs = [logprobs[i][target_tokens[i]] for i in range(suffix_len)]
        results[candidate] = log_probs
    print(results)
    breakpoint()  # pause to inspect results interactively

Rachum-thu avatar Dec 02 '24 22:12 Rachum-thu

Try this code and it may help you solve the problem: […code quoted from the previous comment…]

This method seems inefficient, though: every call to generate also has to run sampling, which is redundant for this use case.
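For the original perplexity question, the generation overhead can at least be kept small with max_tokens=1. A minimal sketch under that assumption (the model name and the prompt_perplexity helper are illustrative, not from this thread, and the Logprob handling is hedged for version differences):

import math
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/phi-2")  # example model
params = SamplingParams(max_tokens=1, prompt_logprobs=1)

def prompt_perplexity(text: str) -> float:
    out = llm.generate([text], params)[0]
    logps = []
    # Position 0 has no prediction, so start from the second prompt token.
    for token_id, entry in zip(out.prompt_token_ids[1:], out.prompt_logprobs[1:]):
        info = entry[token_id]
        logps.append(info.logprob if hasattr(info, "logprob") else info)
    return math.exp(-sum(logps) / len(logps))

print(prompt_perplexity("The quick brown fox jumps over the lazy dog."))

For a whole dataset, you would pass the full list of prompts to a single generate call and aggregate per output instead of looping one prompt at a time.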

ControllableGeneration avatar Jan 14 '25 08:01 ControllableGeneration

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

github-actions[bot] avatar Apr 15 '25 02:04 github-actions[bot]

This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!

github-actions[bot] avatar May 15 '25 02:05 github-actions[bot]