OpenLLM
bug: Unexpected token generation under /generate_stream (stream)
Describe the bug
It seems like some error under stream mode breaks the generation result.
To reproduce
- start openllm with a chat model (in this case, I use chatglm2):
openllm start chatglm --model-id THUDM/chatglm2-6b-32k
- send a query in stream or non-stream mode with top_k=1
top_k=1 should force greedy decoding, so the output should always be the same (see the sketch after these steps).
stream mode
openllm query --sampling-params top_k=1 "where is taiwan"
# 's national football team located?
non-stream mode
openllm query --no-stream --sampling-params top_k=1 "where is taiwan"
# Taiwan is a small island nation located in East Asia. It is south of the Shandong Peninsula in China and is bordered by the Pacific Ocean to the east and the Taiwan Strait to the south. Taiwan is a popular tourist destination and is known for its vibrant culture, delicious food, and beautiful natural scenery.
- the results are not the same under greedy decoding, and the stream-mode result is broken (though not always)
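For context on why the two runs should match: with top_k=1 the sampler keeps only the single highest-probability token, which reduces sampling to argmax over the logits and makes generation deterministic. A minimal sketch of that idea in PyTorch (illustrative only, not openllm's internal sampler):

import torch

def sample_next_token(logits: torch.Tensor, top_k: int) -> int:
    # Keep only the top_k highest logits and renormalize.
    values, indices = torch.topk(logits, k=top_k)
    probs = torch.softmax(values, dim=-1)
    # With top_k=1 there is exactly one candidate, so multinomial
    # sampling always returns it -- equivalent to logits.argmax().
    choice = torch.multinomial(probs, num_samples=1)
    return indices[choice].item()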
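The same comparison can be made against the HTTP API directly, which rules out the CLI as the culprit. A sketch only, assuming the server listens on the default port 3000 and mounts /v1/generate and /v1/generate_stream with a prompt/llm_config JSON body (verify the exact paths and schema on the server's /docs page):

import requests

payload = {
    "prompt": "where is taiwan",
    "llm_config": {"top_k": 1},  # assumed schema; check /docs
}

# Non-stream: the whole completion arrives in one response.
r = requests.post("http://localhost:3000/v1/generate", json=payload)
print(r.json())

# Stream: tokens arrive incrementally; concatenate them to compare
# against the non-stream output.
with requests.post("http://localhost:3000/v1/generate_stream",
                   json=payload, stream=True) as r:
    for chunk in r.iter_content(chunk_size=None):
        print(chunk.decode("utf-8"), end="", flush=True)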
Logs
No response
Environment
openllm==0.3.14
transformers==4.34.1
torch==2.1.0+cu118
System information (Optional)
No response
Hmm, can you try with the vLLM backend if you have a GPU?
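If it helps, something like the following should switch backends (assuming this openllm release accepts a --backend flag; openllm start --help will confirm):

openllm start chatglm --model-id THUDM/chatglm2-6b-32k --backend vllm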
Closing for OpenLLM 0.6.