[BUG] Error when using the sample knowledge base

Open haitongwan opened this issue 1 year ago • 1 comments

exception
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/llm_langchain_chat/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "/home/data/llm/Langchain-Chatchat/webui.py", line 64, in <module>
    pages[selected_page]["func"](api=api, is_lite=is_lite)
  File "/home/data/llm/Langchain-Chatchat/webui_pages/knowledge_base/knowledge_base.py", line 221, in knowledge_base_page
    doc_grid = AgGrid(
               ^^^^^^^
  File "/usr/local/anaconda3/envs/llm_langchain_chat/lib/python3.11/site-packages/st_aggrid/__init__.py", line 356, in AgGrid
    row_data = __parse_row_data(data)
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/llm_langchain_chat/lib/python3.11/site-packages/st_aggrid/__init__.py", line 54, in __parse_row_data
    row_data = data_parameter.to_json(orient='records', date_format='iso')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/llm_langchain_chat/lib/python3.11/site-packages/pandas/core/generic.py", line 2532, in to_json
    return json.to_json(
           ^^^^^^^^^^^^^
  File "/usr/local/anaconda3/envs/llm_langchain_chat/lib/python3.11/site-packages/pandas/io/json/_json.py", line 191, in to_json
    ).write()
      ^^^^^^^
  File "/usr/local/anaconda3/envs/llm_langchain_chat/lib/python3.11/site-packages/pandas/io/json/_json.py", line 244, in write
    return dumps(
           ^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 4-5: surrogates not allowed

haitongwan, Feb 22 '24 06:02

🤖

Hi @haitongwan! We meet again. I hope we can get this one resolved together.

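Based on the traceback you provided, the failure happens inside the AgGrid function: it tries to convert the data to JSON and hits characters that cannot be encoded as UTF-8 (here, surrogate code points at positions 4-5 of a string). This is a common problem: whenever the data contains characters that cannot be encoded as UTF-8, a UnicodeEncodeError is raised.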
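The error can be reproduced outside the web UI. The following is a minimal sketch, not taken from the project: the byte string is a made-up file name, and the assumption that the surrogates come from undecodable bytes (for example a non-UTF-8 file name read with the `surrogateescape` error handler) is only one plausible cause.

```python
# Hypothetical file name containing two bytes that do not form valid UTF-8.
raw_name = b"file\xe4\xbd.md"

# With errors="surrogateescape", each undecodable byte becomes a lone
# surrogate (U+DC80..U+DCFF) instead of raising an error at decode time.
text = raw_name.decode("utf-8", errors="surrogateescape")

message = None
try:
    # Encoding back to strict UTF-8 fails, just like pandas' to_json does.
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    message = str(exc)

print(repr(text))
print(message)
```

If this matches what is happening, the surrogates are introduced when the data is read, and the JSON conversion is only where they are finally rejected.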

In the Langchain-Chatchat issue tracker I found several similar, already resolved issues that offer possible solutions:

  1. In issue #2889, the suggestion is to clean the data before passing it to the AgGrid function so that every character can be encoded as UTF-8. Example code:
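import pandas as pd

def clean_data(data: pd.DataFrame) -> pd.DataFrame:
    # Drop characters that cannot be encoded as UTF-8 (e.g. lone surrogates).
    # Non-string cells (numbers, NaN, dates) are returned unchanged.
    def _clean(x):
        return x.encode('utf-8', 'ignore').decode('utf-8') if isinstance(x, str) else x
    # On pandas >= 2.1, DataFrame.map is the non-deprecated name for applymap.
    return data.applymap(_clean)

cleaned_data = clean_data(your_data)  # your_data: the DataFrame shown in the grid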

Then pass the cleaned data to the AgGrid function:

doc_grid = AgGrid(
    cleaned_data,
    gb.build(),
    columns_auto_size_mode="FIT_CONTENTS",
    theme="alpine",
    custom_css={
        "#gridToolBar": {"display": "none"},
    },
    allow_unsafe_jscode=True,
    enable_enterprise_modules=False
)
  2. In issue #2852, the suggestion is to modify the __repr__ method so that it handles encoding errors. Example code:
def __repr__(self):
    try:
        return f"<message(id='{self.id}', conversation_id='{self.conversation_id}', chat_type='{self.chat_type}', query='{self.query}', response='{self.response}',meta_data='{self.meta_data}',feedback_score='{self.feedback_score}',feedback_reason='{self.feedback_reason}', create_time='{self.create_time}')>"
    except UnicodeEncodeError:
        return "<message: UnicodeEncodeError>"
  3. In issue #2841, the suggestion is to make sure that all files (especially those of the knowledge base being initialized) are UTF-8 encoded.
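The UTF-8 check from the last suggestion can be sketched as a small scan. This is an illustration only: `find_non_utf8` and `TEXT_SUFFIXES` are hypothetical names, the suffix list is an assumption about which knowledge base files are plain text, and the temporary directory merely stands in for a real knowledge base folder.

```python
import tempfile
from pathlib import Path

# Assumption: only these suffixes are plain text worth decoding.
TEXT_SUFFIXES = {".md", ".txt", ".csv", ".json"}

def find_non_utf8(root):
    """Return (paths whose name is not UTF-8 encodable, text files whose content is not UTF-8)."""
    bad_names, bad_contents = [], []
    for path in Path(root).rglob("*"):
        try:
            # File names containing lone surrogates fail here.
            str(path).encode("utf-8")
        except UnicodeEncodeError:
            bad_names.append(path)
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                bad_contents.append(path)
    return bad_names, bad_contents

# Demo on a throwaway directory: one UTF-8 file and one GBK-encoded file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ok.md").write_text("你好", encoding="utf-8")
    Path(tmp, "bad.txt").write_bytes("你好".encode("gbk"))
    bad_names, bad_contents = find_non_utf8(tmp)
    bad_content_names = [p.name for p in bad_contents]

print(bad_names, bad_content_names)
```

Pointing `find_non_utf8` at the actual knowledge base folder would list candidates to re-save as UTF-8 or rename.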

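I hope this information helps. If the problem persists, further investigation may be needed into which specific characters cause it and the context in which they are used.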
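To find the specific characters, one option is to scan the rows before they reach the grid. The sketch below works on plain dictionaries so that it has no dependencies; `has_surrogate` and `find_surrogate_cells` are hypothetical helpers, and the sample rows and column names are invented for illustration.

```python
def has_surrogate(value):
    """True if value is a str containing any code point in U+D800..U+DFFF."""
    return isinstance(value, str) and any("\ud800" <= ch <= "\udfff" for ch in value)

def find_surrogate_cells(records):
    """Return (row index, column, escaped value) for every offending cell."""
    hits = []
    for row_index, row in enumerate(records):
        for column, value in row.items():
            if has_surrogate(value):
                # ascii() escapes the surrogate so the value can be printed safely.
                hits.append((row_index, column, ascii(value)))
    return hits

# Invented sample rows; the second file name contains a lone surrogate.
rows = [
    {"No": 1, "file_name": "a.md"},
    {"No": 2, "file_name": "b\udce4.md"},
]
hits = find_surrogate_cells(rows)
print(hits)
```

With a pandas DataFrame, the rows can be obtained with `df.to_dict("records")`, which does not encode anything and so should not trigger the error itself.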

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

dosubot[bot], Feb 22 '24 06:02

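Did you open the files? We normally use UTF-8 encoding. Did you change something so that the encoding became something else, or is your runtime environment not UTF-8?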
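One way to answer the runtime question is to print what the Python process itself reports. This is a minimal sketch using only the standard library; the `report` dictionary and its key names are illustrative, not part of the project.

```python
import locale
import sys

# Collect the encodings Python uses for file names, console output, and text files.
report = {
    "filesystem_encoding": sys.getfilesystemencoding(),
    "filesystem_errors": sys.getfilesystemencodeerrors(),
    "stdout_encoding": getattr(sys.stdout, "encoding", None),
    "preferred_encoding": locale.getpreferredencoding(False),
    "utf8_mode": bool(sys.flags.utf8_mode),
}

for key, value in report.items():
    print(f"{key}: {value}")
```

Running this in the same environment that starts the web UI should show `utf-8` for the file system and preferred encodings; anything else (for example an ASCII or GBK locale) would point to the environment rather than the data.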

zRzRzRzRzRzRzR, Feb 24 '24 02:02