AllentDan

Results 216 comments of AllentDan

It seems the model's tokenizer cannot handle it; switching to internlm2-chat-1_8b or another llama model works fine. Alternatively, drop the `work history` content from the input.

Try `lmdeploy lite auto_awq $HF_MODEL --work-dir $WORK_DIR`. How large is the accuracy drop? To debug it, you can run the AWQ model with the pytorch engine and compare its outputs layer by layer against the original model to locate the problem.
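The layer-by-layer comparison could be sketched like this. This is a minimal illustration, not lmdeploy code: the two toy `nn.Sequential` models stand in for the original and AWQ models, and one layer is perturbed to mimic quantization error.

```python
# Sketch: compare per-layer outputs of a reference model and a
# (mock-)quantized copy via forward hooks, to locate where error enters.
import torch
import torch.nn as nn

def capture_layer_outputs(model, x):
    """Run model(x) and record every submodule's output with forward hooks."""
    outputs = {}
    hooks = []
    for name, module in model.named_modules():
        if name:  # skip the root module itself
            hooks.append(module.register_forward_hook(
                lambda m, inp, out, name=name: outputs.__setitem__(name, out.detach())))
    model(x)
    for h in hooks:
        h.remove()
    return outputs

torch.manual_seed(0)
ref = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
quant = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
quant.load_state_dict(ref.state_dict())
with torch.no_grad():
    # Perturb the last layer's weights to mimic quantization error.
    quant[2].weight.add_(0.05 * torch.randn_like(quant[2].weight))

x = torch.randn(2, 8)
ref_out = capture_layer_outputs(ref, x)
quant_out = capture_layer_outputs(quant, x)
for name in ref_out:
    err = (ref_out[name] - quant_out[name]).abs().max().item()
    print(f"layer {name}: max abs diff = {err:.4f}")
```

The first layer whose diff jumps is the one to inspect; with a real AWQ checkpoint you would hook the matching transformer blocks instead of this toy model.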

> @AllentDan may check `stream_options: {"include_usage": true}`

We have to return the usage in each streaming response. `include_usage` is different.

> If set, an additional chunk will be streamed before...
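To make the distinction concrete: with `include_usage` set, a client sees `usage: null` on the content chunks and one extra final chunk with an empty `choices` list carrying the usage. A minimal sketch of the client side, using mock chunk dicts in place of real streamed responses:

```python
# Mock chunks imitating a stream with stream_options={"include_usage": True}:
# content chunks carry usage=None; one extra trailing chunk carries usage
# with an empty choices list.
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}], "usage": None},
    {"choices": [{"delta": {"content": "lo"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}},
]

text, usage = "", None
for chunk in chunks:
    for choice in chunk["choices"]:
        text += choice["delta"].get("content", "")
    if chunk["usage"] is not None:
        usage = chunk["usage"]

print(text)   # accumulated completion text
print(usage)  # usage from the final extra chunk
```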

Accuracy on MMStar:

| InternVL2-2B | InternVL2-2B-AWQ | InternVL2-2B-AWQ-VisionW8A8 |
|:------------:|:----------------:|:---------------------------:|
|    0.498     |      0.495       |            0.477            |

This is what I got

```
ChatCompletion(id='1', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content='I need to use the get_current_weather function to get the current weather in Boston.', role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments='{"location": "Boston"}', name='get_current_weather'),...
```
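For reference, the tool call in a response like that can be consumed like this. A minimal sketch with a plain dict mirroring the shape of `choices[0].message` above; the key point is that `arguments` arrives as a JSON string:

```python
import json

# Dict mirroring choices[0].message of the ChatCompletion shown above.
message = {
    "content": "I need to use the get_current_weather function to get the current weather in Boston.",
    "tool_calls": [
        {"id": "0",
         "function": {"name": "get_current_weather",
                      "arguments": '{"location": "Boston"}'}},
    ],
}

for call in message["tool_calls"]:
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])  # JSON string -> dict
    print(name, args)
```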

Please wait for the next release of LMDeploy, or build lmdeploy from source. The model was supported recently in https://github.com/InternLM/lmdeploy/pull/2207.

Please downgrade your transformers version.

gradio 4.0 introduced several issues. https://github.com/InternLM/lmdeploy/pull/2103 changed reset to open a new session instead, which fixes it. @iWasOmen you can try that as a workaround for now.

It worked fine when I tried it. What exactly did you do? @yaaisinile

It seems that after `pip uninstall uvloop`, everything runs fine.