
How to fix incomplete answers?

Open LuciAkirami opened this issue 1 year ago • 6 comments

I tried vLLM with both gpt2-xl and llama2-7b-awq, and in both cases I get incomplete answers. Here's the code:

from vllm import LLM

prompt = [
    "What is Quantum Computing?",
    "How are electrons and protons different?",
]

llm = LLM(model="TheBloke/Llama-2-7b-Chat-AWQ", quantization="AWQ")
answers = llm.generate(prompt)

for i in range(2):
    print("\nPrompt:", prompt[i], "\nGeneration:", answers[i].outputs[0].text)
    print()

Here's the output

Prompt: What is Quantum Computing? 
Generation: 

Quantum computing is a rapidly developing field that uses the principles of quantum


Prompt: How are electrons and protons different? 
Generation: 

Electrons and protons are two types of subatomic particles that

LuciAkirami avatar Dec 08 '23 14:12 LuciAkirami

Set max_tokens higher; it is set to 16 by default, I suppose (since that was the length of both responses).

Tostino avatar Dec 08 '23 15:12 Tostino

I tried it with 50 and the response still gets cut off in the middle.

LuciAkirami avatar Dec 10 '23 06:12 LuciAkirami

I also encountered this problem. Does anyone have any idea?

chqiwang avatar Jan 24 '24 08:01 chqiwang

I fixed this by setting my stop token to None.

Tushar-ml avatar Mar 21 '24 07:03 Tushar-ml

Hi @Tushar-ml, may I know where in the code we should adjust the stop token? I was also facing the same issue and found that the generated sentences are incomplete.

I installed vLLM with pip install vllm as provided.

Following is my test code:

from vllm import LLM, SamplingParams

prompts = [
    "Briefly list down the steps to perform Cook Bacon.",
    "What is the definition of gravity?",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="lmsys/vicuna-13b-v1.3", seed=1024)

outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    generated_text = output.outputs[0].text
    print(f"Generated text: {generated_text!r}")

Printed output:

Generated text: '\n\n1. Preheat the oven to 375°'
Generated text: '\nWhat is the law of gravity?\nWhat is the formula for calculating gravity'
Generated text: ' Paris.\n\n3.1. The currency used in France is the Euro'
Generated text: ' bright. It will transform the way we live, work, and interact with each'

ee2110 avatar Apr 08 '24 04:04 ee2110

I also encountered this problem. Does anyone have any idea?

ChengShuting avatar May 14 '24 07:05 ChengShuting

@Tostino provided the correct answer.

hmellor avatar May 31 '24 20:05 hmellor