AllentDan

Results 216 comments of AllentDan

It seems the model's tokenizer cannot handle it; switching to internlm2-chat-1_8b or another llama model works fine. Alternatively, drop the `work history` content from the input.

Try `lmdeploy lite auto_awq $HF_MODEL --work-dir $WORK_DIR`. How large is the accuracy drop? To debug it, you can run the AWQ model with the pytorch engine and compare its outputs layer by layer against the original model to locate the problem.
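The layer-by-layer comparison could be sketched like this. This is a minimal illustration, not lmdeploy code: the two toy `nn.Sequential` models stand in for the original and AWQ models, and one layer is perturbed to mimic quantization error.

```python
# Sketch: compare per-layer outputs of a reference model and a
# (mock-)quantized copy via forward hooks, to locate where error enters.
import torch
import torch.nn as nn

def capture_layer_outputs(model, x):
    """Run model(x) and record every submodule's output with forward hooks."""
    outputs = {}
    hooks = []
    for name, module in model.named_modules():
        if name:  # skip the root module itself
            hooks.append(module.register_forward_hook(
                lambda m, inp, out, name=name: outputs.__setitem__(name, out.detach())))
    model(x)
    for h in hooks:
        h.remove()
    return outputs

torch.manual_seed(0)
ref = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
quant = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))
quant.load_state_dict(ref.state_dict())
with torch.no_grad():
    # Perturb the last layer's weights to mimic quantization error.
    quant[2].weight.add_(0.05 * torch.randn_like(quant[2].weight))

x = torch.randn(2, 8)
ref_out = capture_layer_outputs(ref, x)
quant_out = capture_layer_outputs(quant, x)
for name in ref_out:
    err = (ref_out[name] - quant_out[name]).abs().max().item()
    print(f"layer {name}: max abs diff = {err:.4f}")
```

The first layer whose diff jumps is the one to inspect; with a real AWQ checkpoint you would hook the matching transformer blocks instead of this toy model.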

> @AllentDan may check `stream_options: {"include_usage": true}`

We have to return the usage in each streaming response. `include_usage` is different.

> If set, an additional chunk will be streamed before...
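To make the distinction concrete: with `include_usage` set, a client sees `usage: null` on the content chunks and one extra final chunk with an empty `choices` list carrying the usage. A minimal sketch of the client side, using mock chunk dicts in place of real streamed responses:

```python
# Mock chunks imitating a stream with stream_options={"include_usage": True}:
# content chunks carry usage=None; one extra trailing chunk carries usage
# with an empty choices list.
chunks = [
    {"choices": [{"delta": {"content": "Hel"}}], "usage": None},
    {"choices": [{"delta": {"content": "lo"}}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}},
]

text, usage = "", None
for chunk in chunks:
    for choice in chunk["choices"]:
        text += choice["delta"].get("content", "")
    if chunk["usage"] is not None:
        usage = chunk["usage"]

print(text)   # accumulated completion text
print(usage)  # usage from the final extra chunk
```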

Accuracy on MMStar:

| InternVL2-2B | InternVL2-2B-AWQ | InternVL2-2B-AWQ-VisionW8A8 |
|:------------:|:----------------:|:---------------------------:|
|    0.498     |      0.495       |            0.477            |

This is what I got

```
ChatCompletion(id='1', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content='I need to use the get_current_weather function to get the current weather in Boston.', role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments='{"location": "Boston"}', name='get_current_weather'),...
```
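For reference, the tool call in a response like that can be consumed like this. A minimal sketch with a plain dict mirroring the shape of `choices[0].message` above; the key point is that `arguments` arrives as a JSON string:

```python
import json

# Dict mirroring choices[0].message of the ChatCompletion shown above.
message = {
    "content": "I need to use the get_current_weather function to get the current weather in Boston.",
    "tool_calls": [
        {"id": "0",
         "function": {"name": "get_current_weather",
                      "arguments": '{"location": "Boston"}'}},
    ],
}

for call in message["tool_calls"]:
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])  # JSON string -> dict
    print(name, args)
```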

Please wait for the next release of LMDeploy, or build lmdeploy from source. The model was supported recently in https://github.com/InternLM/lmdeploy/pull/2207.

Please downgrade your transformers version.

gradio 4.0 introduced several issues. https://github.com/InternLM/lmdeploy/pull/2103 changed reset to open a new session instead, which fixes it. @iWasOmen you can try that as a workaround for now.

It worked fine when I tried it. What exactly did you do? @yaaisinile

It seems that after `pip uninstall uvloop`, everything runs fine.