Qwen [BUG] <title>请问如何进行流失输出

是否已有关于该错误的issue或讨论？ | Is there an existing issue / discussion for this?

[X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions

该问题是否在FAQ中有解答？ | Is there an existing answer for this in FAQ?

[X] 我已经搜索过FAQ | I have searched FAQ

当前行为 | Current Behavior

model_path = '/root/.cache/modelscope/hub/qwen/Qwen-7B-Chat-Int4'
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", fp16=True, trust_remote_code=True).eval()
model.generation_config = GenerationConfig.from_pretrained(model_path, trust_remote_code=True)
class QianWenChatLLM(LLM):
	max_length = 10000
	temperature: float = 0.01
	top_p = 0.9

	def __init__(self):
		super().__init__()

	@property
	def _llm_type(self):
		return "ChatLLM"

	def _call(self, prompt: str, stop: Optional[List[str]] = None,**kwargs: Any) -> str:
		for response in model.chat_stream(tokenizer, query, history=None):
			return response

请问如何进行流式输出，总是报错：ValidationError: 1 validation error for Generation text str type expected (type=type_error.str)

期望行为 | Expected Behavior

No response

复现方法 | Steps To Reproduce

No response

运行环境 | Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

备注 | Anything else?

No response

Dec 22 '23 08:12 whm233

如果是代码直接调用Qwen模型的话

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    trust_remote_code=True
).eval()

history = [("你好", "你好！很高兴为你提供帮助。")]
query = "给我讲一个年轻人奋斗创业最终取得成功的故事。"

for response in model.chat_stream(tokenizer, query, history=history):
    print(response)

输出结果

这是一个
这是一个叫做
这是一个叫做杰
这是一个叫做杰克
这是一个叫做杰克的年轻人
...
杰克是一个非常有才华的年轻人，他的创新思维和独特的见解使他在科技领域迅速崭露头角。然而，在他追求梦想的过程中，他也遭遇了许多挫折和困难。

尽管如此，杰克从未放弃过。他一直坚持不懈地努力工作，不断学习新的知识和技能，以提高自己的竞争力。他还积极寻求投资和支持，寻找合适的合作伙伴和团队成员，共同推进他的创业计划。

经过数年的努力和拼搏，杰克终于实现了他的梦想。他的公司推出了一系列革命性的产品和服务，受到了市场的热烈欢迎和广泛赞誉。如今，杰克已经成为了业界的一位重要人物，他的故事激励着无数年轻人勇往直前，追寻自己的梦想。
这是一个叫做杰克的年轻人，他拥有自己的梦想和想法，并且有着坚定的毅力和决心去实现它们。

杰克是一个非常有才华的年轻人，他的创新思维和独特的见解使他在科技领域迅速崭露头角。然而，在他追求梦想的过程中，他也遭遇了许多挫折和困难。

尽管如此，杰克从未放弃过。他一直坚持不懈地努力工作，不断学习新的知识和技能，以提高自己的竞争力。他还积极寻求投资和支持，寻找合适的合作伙伴和团队成员，共同推进他的创业计划。

经过数年的努力和拼搏，杰克终于实现了他的梦想。他的公司推出了一系列革命性的产品和服务，受到了市场的热烈欢迎和广泛赞誉。如今，杰克已经成为了业界的一位重要人物，他的故事激励着无数年轻人勇往直前，追寻自己的梦想。

Dec 22 '23 10:12 jklj077

如果是代码直接调用Qwen模型的话

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    trust_remote_code=True
).eval()

history = [("你好", "你好！很高兴为你提供帮助。")]
query = "给我讲一个年轻人奋斗创业最终取得成功的故事。"

for response in model.chat_stream(tokenizer, query, history=history):
    print(response)

输出结果

谢谢大佬回复。我是想基于langchain的ConversationalRetrievalChain做流式输出：qa_chain = ConversationalRetrievalChain.from_llm( llm=qwllm, retriever=compression_retriever, memory=memory, condense_question_prompt=CUSTOM_QUESTION_PROMPT ) 。请问有哪些地方有例子可以借鉴吗，网上似乎都是调用接口设置stream为true，没有直接调用本地模型的做法。

Dec 25 '23 01:12 whm233

LangChain的话可以用FastChat+vLLM部署个API（参考README的部署部分），网上的方案就适用了。

Dec 25 '23 02:12 jklj077

LangChain的话可以用FastChat+vLLM部署个API（参考README的部署部分），网上的方案就适用了。

ok谢谢大佬指点，我还看到了有个openai_api，那个可以吗，因为FastChat+vLLM部署起来似乎有点问题

Dec 25 '23 03:12 whm233

也可以的，不过那个脚本建议只用于测试，不容易做规模扩展的哈。

Dec 25 '23 04:12 jklj077

qwen支持langchain的流式输出，你这个_call方法写的不对，这个类只是一个包装类。

Jan 24 '24 06:01 fengmy

Qwen Qwen copied to clipboard

[BUG] <title>请问如何进行流失输出

是否已有关于该错误的issue或讨论？ | Is there an existing issue / discussion for this?

该问题是否在FAQ中有解答？ | Is there an existing answer for this in FAQ?

当前行为 | Current Behavior

期望行为 | Expected Behavior

复现方法 | Steps To Reproduce

运行环境 | Environment

备注 | Anything else?

Qwen
Qwen copied to clipboard