
Does running a model in vllm mode degrade its inference ability? I'm seeing incomplete answers, or sometimes no answer at all. Which settings in server_config.py affect this? Also, what values can 'load_format':'auto' take, and where can I look them up?

Open Andy1018 opened this issue 11 months ago • 6 comments


Andy1018 · Mar 13 '24 08:03

Hello, @Andy1018! Nice to see you again. I hope all is well.

In vllm mode, the model's inference ability may indeed degrade, which can lead to incomplete answers or no answer at all. This is because vLLM's support for some models is still immature. In the server_config.py configuration file, the "infer_turbo", "device", and "load_format" settings can all affect model inference.

"infer_turbo"控制是否使用Vllm加速推理。如果设置为'vllm',可能会因为Vllm的不成熟支持导致问题。

"device"指定运行模型的设备,如果配置不当(如在不支持CUDA的环境中强制使用CUDA),可能也会影响模型推理。

"load_format"vllm_worker配置参数中,控制模型加载的格式,可能影响模型的加载和推理性能。

As for the specific values that 'load_format':'auto' accepts, the configuration file itself does not document them. Details of this kind are usually found in the official documentation of the inference framework or library in use. Since this is a vLLM setting, you should check vLLM's official documentation or source code for the accepted values of 'load_format' and what each one means.
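As a concrete starting point: in vLLM around 0.3.x, the accepted values for load_format were 'auto', 'pt', 'safetensors', 'npcache', and 'dummy'. You can confirm the list for your installed version by introspecting vLLM's own CLI definition; the snippet below assumes the 0.3.x-era module path vllm.engine.arg_utils, which may move in later releases:

import argparse

from vllm.engine.arg_utils import EngineArgs

# Build vLLM's argument parser and read the choices declared for --load-format.
parser = EngineArgs.add_cli_args(argparse.ArgumentParser())
for action in parser._actions:
    if "--load-format" in action.option_strings:
        print(action.choices)  # e.g. ['auto', 'pt', 'safetensors', 'npcache', 'dummy']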


dosubot[bot] · Mar 13 '24 08:03

@dosu-bot This is output from the chatglm3-6b model in vllm mode. Why does it come with <br/> tags, and how do I configure the system to handle them? Sure! Beijing is a city with a long history and a rich culture. Here are some suggested sights and activities:

1. The Palace Museum: Beijing's Forbidden City was the imperial palace of ancient China and is the world's largest surviving complex of ancient wooden architecture.

2. Tiananmen Square: As Beijing's signature landmark, Tiananmen Square is one of the symbols of China and one of the largest city squares in the world.

3. Summer Palace: The Summer Palace is Beijing's famous imperial garden, known as the "Museum of Royal Gardens", and an outstanding example of ancient Chinese architecture and garden art.

4. The Great Wall: The Great Wall is an ancient Chinese military fortification, a World Cultural Heritage site, and one of the symbols of China.

5. Wangfujing Street: Wangfujing is Beijing's famous shopping street, home to many time-honored shops and brands, and a great place to sample Beijing cuisine.

In addition, you can try Beijing specialties such as Peking roast duck and zhajiangmian (noodles with fried soybean paste). I hope these suggestions help, and I wish you a pleasant trip in Beijing!

Andy1018 · Mar 13 '24 08:03

The HTML break tags (<br/>) in the chatglm3-6b output under vllm mode in Langchain-Chatchat likely come either from the model's own response formatting or from the way the system processes and displays the responses.

To handle or remove these tags, you could implement a function to clean the output by removing or replacing HTML tags. This function can be applied to the response text before it is returned or displayed. Here's an example of how you might implement such a function in Python:

import re
from typing import Dict, List, Optional

# LLM_MODELS and TEMPERATURE come from the project's configs module.
from configs import LLM_MODELS, TEMPERATURE


def remove_html_tags(text: str) -> str:
    """Remove HTML-like tags (e.g. <br/>) from a given text."""
    return re.sub(r'<.*?>', '', text)


# Example usage with the chat_chat function, written as a wrapper method on
# the API client class that already defines chat_chat:
def chat_chat_modified(
        self,
        query: str,
        conversation_id: Optional[str] = None,
        history_len: int = -1,
        history: List[Dict] = [],
        stream: bool = True,
        model: str = LLM_MODELS[0],
        temperature: float = TEMPERATURE,
        max_tokens: Optional[int] = None,
        prompt_name: str = "default",
        **kwargs,
):
    response = self.chat_chat(
        query=query,
        conversation_id=conversation_id,
        history_len=history_len,
        history=history,
        stream=stream,
        model=model,
        temperature=temperature,
        max_tokens=max_tokens,
        prompt_name=prompt_name,
        **kwargs,
    )
    # Assuming the response is a single string containing HTML tags; with
    # stream=True it may instead be an iterator of chunks (see note below).
    cleaned_response = remove_html_tags(response)
    return cleaned_response

This function uses a regular expression to find and remove anything that looks like an HTML tag from the provided text. You would need to integrate this cleaning step at the appropriate place in your code, likely right before the response is returned to the user or displayed in the UI. Note that the exact integration depends on how the responses are structured and where the HTML tags are introduced, which is not fully detailed in the provided context.
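As a quick sanity check, and to cover the streaming case (a sketch; chat_chat in this project usually yields chunks when stream=True, so the exact shape of the response is an assumption here):

# Hypothetical sample shaped like the reported output:
print(remove_html_tags("天安门广场<br/>颐和园<|assistant|>"))
# -> "天安门广场颐和园"  (the <.*?> pattern also strips <|assistant|>-style tokens)

# With stream=True the response is typically an iterator of chunks, so the
# cleanup would be applied per chunk instead:
# for chunk in response:
#     yield remove_html_tags(chunk)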

Unfortunately, the provided context does not explicitly show where or how these HTML break tags are introduced or handled in the vllm mode with the chatglm3-6b model. The vllm mode is not explicitly mentioned or implemented in the provided code snippet from the file server/chat/agent_chat.py. The code primarily deals with setting up an agent chat system, including handling chat history, configuring models, and streaming responses, but does not reference a vllm mode directly.

The chatglm3-6b model generates responses in the Langchain-Chatchat system via the LLMChain class from the Langchain library. This process involves model initialization, prompt creation, chain execution, response generation, and output. However, it is not clear from the provided context how the chatglm3-6b model comes to emit HTML tags in its responses.

If you could provide more information about where these HTML tags are being introduced or how the vllm mode is implemented, I might be able to provide a more specific solution.


dosubot[bot] · Mar 13 '24 08:03

With vllm, the model's performance does drop, to maybe 80% of what it was, but getting no response at all doesn't sound right.

zRzRzRzRzRzRzR · Mar 13 '24 14:03

@dosu-bot When vllm is started under high concurrency, the question-and-answer responses occasionally contain tags such as <|assistant|> and "br/".

Andy1018 · Mar 18 '24 05:03

To handle or remove tags like "<|assistant|>" and "br/" that occasionally appear in the output during question and answer interactions in the vllm mode with high concurrency, you can extend the functionality of the OutputFixingParser class or create a similar mechanism if it doesn't already exist in your codebase. This parser should be designed to clean up the output by removing or replacing unwanted tags and characters.

Here's an example of how you can implement a custom output fixing parser:

from typing import Optional

# Import paths below are for langchain ~0.1.x and may differ in other versions.
from langchain.agents.agent import AgentOutputParser
from langchain.output_parsers import OutputFixingParser
from langchain.pydantic_v1 import Field


class CustomOutputFixingParser(OutputFixingParser):
    def parse(self, text: str) -> str:
        # Define the tags or sequences you want to remove or handle.
        unwanted_tags = ["<|assistant|>", "<br/>", "br/"]
        # Replace or remove the unwanted tags.
        for tag in unwanted_tags:
            text = text.replace(tag, "")  # This example simply removes the tags
        return text


class StructuredChatOutputParserWithRetries(AgentOutputParser):
    # Assuming the rest of the class remains unchanged. Note that
    # OutputFixingParser normally requires an underlying parser and an LLM
    # retry chain, so constructing it with no arguments is only a sketch.
    output_fixing_parser: Optional[OutputFixingParser] = Field(default_factory=CustomOutputFixingParser)
    # The rest of the class implementation...

In this example, CustomOutputFixingParser is a new class designed to clean the output text by removing specified unwanted tags. It's set as the default for the output_fixing_parser in StructuredChatOutputParserWithRetries, ensuring that all outputs are processed through this cleaning mechanism.

Please note that this is a basic implementation. Depending on your specific needs, you might want to adjust the cleaning logic, for example, by using regular expressions for more complex patterns or by implementing more sophisticated handling logic for different types of unwanted content.
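For example, a regex-based variant of the cleanup might look like this (a sketch; the pattern only covers the tags reported in this thread, and the <|...|> form is generalized as an assumption):

import re

# Matches <br> / <br/> / <br /> in any casing, a bare "br/", and special role
# tokens such as <|assistant|> or <|user|>.
UNWANTED = re.compile(r"<br\s*/?>|br/|<\|[a-z_]+\|>", re.IGNORECASE)

def clean_output(text: str) -> str:
    """Remove break tags and special role tokens from model output."""
    return UNWANTED.sub("", text)

print(clean_output("你好<|assistant|><br/>北京br/欢迎你"))  # -> "你好北京欢迎你"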

Unfortunately, I couldn't find specific details about the OutputFixingParser class and its usage within the Langchain-Chatchat codebase. If you have more information about this class or if there are specific parts of the codebase you're interested in, please provide more details so I can give a more accurate answer.

You can find ChatGLM3Agent.py at server/agent/custom_agent/ChatGLM3Agent.py in the repository.


dosubot[bot] · Mar 18 '24 05:03