vllm
Waiting sequence group should have only one prompt sequence.
I encountered the following error while using vLLM to run Baichuan, always after running for a while:
"Waiting sequence group should have only one prompt sequence."
Could you please tell me why this happens?
I use a V100 32GB GPU and set the batch size to 4 like this:
prompt_ids = [[195, ..., 196], [195, ..., 196], [195, ..., 196], [195, ..., 196]]
sampling_params = SamplingParams(n=3, temperature=0.3, top_p=0.85, top_k=5, max_tokens=2048, presence_penalty=1.1)
output_list = llm.generate(None, sampling_params, prompt_ids, use_tqdm=False)
Thank you!
I observe the same error with CodeLlama-7b as well.
CC: @WoosukKwon @zhuohan123
I found the same problem. It occurs when n (the number of sequences returned) is set greater than 1, and happens more frequently when there is less GPU memory. A simple workaround is to copy the prompt and set n to 1, but that loses some speed. There should be a better way.
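A minimal sketch of the workaround above: duplicate each prompt n times, sample with n=1, then regroup the outputs. The helper names here are mine, not vLLM's; only SamplingParams(n=...) and llm.generate(...) come from the posts above.

```python
def expand_prompts(prompt_ids, n):
    """Duplicate each prompt n times, e.g. [p1, p2] with n=3
    becomes [p1, p1, p1, p2, p2, p2]."""
    return [p for p in prompt_ids for _ in range(n)]


def regroup_outputs(outputs, n):
    """Collect every n consecutive outputs back into one list per
    original prompt, undoing expand_prompts."""
    return [outputs[i:i + n] for i in range(0, len(outputs), n)]
```

Usage would then look roughly like generating with SamplingParams(n=1, ...) over expand_prompts(prompt_ids, 3) and calling regroup_outputs on the result, at the cost of re-processing each prompt n times.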
I'm running into this as well—it seems to be more prevalent with larger models and also shows up when using best_of.
After some digging, the bug seems to be related to calling _preempt_by_recompute from _preempt, which inserts sequence groups at the front of the waiting queue. (But based on the TODO there, vLLM doesn't support recomputation for groups with multiple sequences?)
A quick fix is to force _preempt to always use PreemptionMode.SWAP, which fixes the error for me, but that's probably not ideal.
I encountered this with Mistral 7B on an A10 using AsyncLLMEngine whenever the number of pending requests rose above 0. Removing n and best_of from the SamplingParams is a workaround.
Faced this issue with CodeLlama 13B (bfloat16) on an A100 80GB GPU with 64GB of CPU swap when using n=1 and best_of=16 for a generated length of 512. Ultimately I also had to set best_of to 1.
I experience this too when running Llama 3 8B with SamplingParams(n=1, ...) and calling the model in parallel. In general, I think it relates to this issue, so it's something in the default scheduler.
Would love to get some help here.