OpenLLM
bug: Unexpected token generation under /generate_stream (stream)
Describe the bug
It seems like some error under stream mode breaks the generation result.
To reproduce
- start openllm with a chat model (in this case, I use chatglm2):
openllm start chatglm --model-id THUDM/chatglm2-6b-32k
- send a query in stream or non-stream mode with top_k=1
top_k=1 should force greedy decoding, so the output should always be the same (see the sketch after these steps).
stream mode
openllm query --sampling-params top_k=1 "where is taiwan"
# 's national football team located?
non-stream mode
openllm query --no-stream --sampling-params top_k=1 "where is taiwan"
# Taiwan is a small island nation located in East Asia. It is south of the Shandong Peninsula in China and is bordered by the Pacific Ocean to the east and the Taiwan Strait to the south. Taiwan is a popular tourist destination and is known for its vibrant culture, delicious food, and beautiful natural scenery.
- the results are not the same under greedy decoding, and the stream-mode result is broken (though not always)
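For context on why the two runs should match: with top_k=1 the sampler keeps only the single highest-probability token, which reduces sampling to argmax over the logits and makes generation deterministic. A minimal sketch of that idea in PyTorch (illustrative only, not openllm's internal sampler):

import torch

def sample_next_token(logits: torch.Tensor, top_k: int) -> int:
    # Keep only the top_k highest logits and renormalize.
    values, indices = torch.topk(logits, k=top_k)
    probs = torch.softmax(values, dim=-1)
    # With top_k=1 there is exactly one candidate, so multinomial
    # sampling always returns it -- equivalent to logits.argmax().
    choice = torch.multinomial(probs, num_samples=1)
    return indices[choice].item()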
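The same comparison can be made against the HTTP API directly, which rules out the CLI as the culprit. A sketch only, assuming the server listens on the default port 3000 and mounts /v1/generate and /v1/generate_stream with a prompt/llm_config JSON body (verify the exact paths and schema on the server's /docs page):

import requests

payload = {
    "prompt": "where is taiwan",
    "llm_config": {"top_k": 1},  # assumed schema; check /docs
}

# Non-stream: the whole completion arrives in one response.
r = requests.post("http://localhost:3000/v1/generate", json=payload)
print(r.json())

# Stream: tokens arrive incrementally; concatenate them to compare
# against the non-stream output.
with requests.post("http://localhost:3000/v1/generate_stream",
                   json=payload, stream=True) as r:
    for chunk in r.iter_content(chunk_size=None):
        print(chunk.decode("utf-8"), end="", flush=True)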
Logs
No response
Environment
openllm==0.3.14
transformers==4.34.1
torch==2.1.0+cu118
System information (Optional)
No response
Hmm, can you try with the vLLM backend if you have a GPU?
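If it helps, something like the following should switch backends (assuming this openllm release accepts a --backend flag; openllm start --help will confirm):

openllm start chatglm --model-id THUDM/chatglm2-6b-32k --backend vllm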
Closing for OpenLLM 0.6.