Compute perplexity/logits for the prompt
I'd like to use Phi-2 to compute the perplexity of prompts over an entire dataset. Is there an API for this? In the short term, I'm happy to fork https://github.com/vllm-project/vllm/blob/d0215a58e78572d91dadafe9d832a2db89b09a13/vllm/model_executor/models/phi_1_5.py if you can provide pointers on how to do that.
Also happy to later contribute back an API that works for all causal models.
I have the same need. Has anyone found a way to get the logits of the prompt?
I have the same need too, but unfortunately it appears that vLLM has not yet implemented support for it, as evidenced by the following issue discussion. https://github.com/vllm-project/vllm/issues/185
I think you can use the parameter prompt_logprobs in SamplingParams for this purpose.
#1328
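For reference, here is a minimal sketch of how prompt_logprobs might be used to compute prompt perplexity over a dataset. This is not an official vLLM API: the model name and prompts are placeholders, and the format of the returned entries (plain floats vs. Logprob objects) differs between vLLM versions, so the code handles both.
import math
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/phi-2")  # placeholder model
# prompt_logprobs also includes the logprob of the actual token at each prompt position.
params = SamplingParams(temperature=0.0, max_tokens=1, prompt_logprobs=1)

dataset = ["The quick brown fox jumps over the lazy dog.", "Hello world"]  # placeholder prompts
outputs = llm.generate(dataset, params)
for out in outputs:
    token_ids = out.prompt_token_ids
    lps = []
    # The first position has no logprob (prompt_logprobs[0] is None), so skip it.
    for pos_logprobs, token_id in zip(out.prompt_logprobs[1:], token_ids[1:]):
        entry = pos_logprobs[token_id]
        # Older vLLM versions store a float here; newer ones store a Logprob object.
        lps.append(entry.logprob if hasattr(entry, "logprob") else entry)
    perplexity = math.exp(-sum(lps) / len(lps))
    print(f"{out.prompt!r}: perplexity={perplexity:.2f}")
Note that generate still samples one greedy token per prompt, which is unavoidable with this approach, but that cost is small compared to the prompt forward pass.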
prompt_logprobs can only return the probabilities for the top <=20 tokens right now, so not applicable for this use case.
Is there any progress on this issue at the moment?
same issue here
You can set logprobs=1 and prompt_logprobs=1. Then test with a prompt of more than 20 tokens; it may work.
prompt_logprobs can only return the probabilities for the top <=20 tokens right now, so not applicable for this use case.
- I used to have the same problem, but I found it to be solvable.
- It seems that the probability of the actual prompt token is always returned, whether or not it is in the top 20. So you can append the string you care about as a suffix of the prompt and use prompt_logprobs to read it off.
Try this code; it may help you solve the problem:
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Setup assumed by the original snippet; the model name here is illustrative.
llama = LLM(model="meta-llama/Llama-2-7b-hf")
llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

prefix_list = ['my name is', 'I love']
candidate_list = [[' Hongliang', ' Raymond', ' John'], [' ice cream', ' pizza', ' coding']]

# Initialize sampling parameters
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=8, prompt_logprobs=20)

# Process each prefix and its corresponding candidates
for prefix, candidates in zip(prefix_list, candidate_list):
    results = {}
    prefix_tokens = llama_tokenizer(prefix)['input_ids']
    prefix_token_length = len(prefix_tokens)

    # Build the full prompts and tokenize them
    prompts = [prefix + candidate for candidate in candidates]
    prompt_tokens = llama_tokenizer(prompts)
    suffix_tokens_length = [len(token) - prefix_token_length for token in prompt_tokens['input_ids']]

    # Generate outputs (prompt_logprobs are returned alongside the generations)
    outputs = llama.generate(prompts, sampling_params)

    # Process each output
    for candidate, output, suffix_len in zip(candidates, outputs, suffix_tokens_length):
        logprobs = output.prompt_logprobs[-suffix_len:]
        target_tokens = prompt_tokens['input_ids'][candidates.index(candidate)][-suffix_len:]

        # Extract the log-probabilities of the candidate (suffix) tokens;
        # in newer vLLM versions these entries are Logprob objects rather than floats.
        log_probs = [logprobs[i][target_tokens[i]] for i in range(suffix_len)]
        results[candidate] = log_probs

    print(results)
    breakpoint()
This method seems inefficient, though, because every time the generate method is called, sampling also has to be executed, which is redundant here.
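One way to keep that overhead small, assuming the same setup as the snippet above (llama and the two lists are redefined here so the sketch is self-contained; the model path is illustrative), is to score every prefix+candidate pair in a single generate call with greedy decoding and max_tokens=1, so sampling is reduced to one token per prompt while vLLM batches the prompt forward passes internally.
from vllm import LLM, SamplingParams

llama = LLM(model="meta-llama/Llama-2-7b-hf")  # illustrative model path
prefix_list = ['my name is', 'I love']
candidate_list = [[' Hongliang', ' Raymond', ' John'], [' ice cream', ' pizza', ' coding']]

# Greedy, one decoded token, prompt scoring only: the unwanted sampling work is
# a single token per prompt, and all prompts are processed in one batched call.
score_params = SamplingParams(temperature=0.0, max_tokens=1, prompt_logprobs=1)

all_prompts = [prefix + candidate
               for prefix, candidates in zip(prefix_list, candidate_list)
               for candidate in candidates]
outputs = llama.generate(all_prompts, score_params)
# Outputs come back in the same order as all_prompts; each one still carries
# prompt_logprobs, which can be sliced exactly as in the snippet above.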
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!