Error when uploading a file to the vector store: TypeError: string indices must be integers, not 'str'

Open b2383355038 opened this issue 9 months ago • 11 comments

Problem Description: An error occurs when uploading a CSV file to the knowledge base.

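Steps to Reproduce

  1. Run 'python startup -a'
  2. Click 'Knowledge Base Management' (知识库管理)
  3. Scroll to 'Upload file'
  4. The problem appears / the error is raised

Expected Result: The file is loaded successfully into the FAISS vector store.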

Actual Result: error output:

2024-05-07 09:45:16,936 - utils.py[line:95] - ERROR: ConnectError: error when post /knowledge_base/search_docs: [Errno 111] Connection refused
2024-05-07 09:45:16,936 - utils.py[line:95] - ERROR: ConnectError: error when post /knowledge_base/search_docs: [Errno 111] Connection refused
2024-05-07 09:45:16,937 - utils.py[line:95] - ERROR: ConnectError: error when post /knowledge_base/search_docs: [Errno 111] Connection refused
2024-05-07 09:45:16.937 Uncaught app exception
Traceback (most recent call last):
  File "/root/miniconda3/envs/lc/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "/mnt/Langchain-Chatchat/webui.py", line 64, in <module>
    pages[selected_page]["func"](api=api, is_lite=is_lite)
  File "/mnt/Langchain-Chatchat/webui_pages/knowledge_base/knowledge_base.py", line 330, in knowledge_base_page
    data = [
           ^
  File "/mnt/Langchain-Chatchat/webui_pages/knowledge_base/knowledge_base.py", line 331, in <listcomp>
    {"seq": i + 1, "id": x["id"], "page_content": x["page_content"], "source": x["metadata"].get("source"),
                         ~^^^^^^
TypeError: string indices must be integers, not 'str'

Environment Information

  • langchain-ChatGLM version/commit: v2.10
  • Deployed with Docker (yes/no): no
  • Model used (ChatGLM2-6B / Qwen-7B, etc.): ERNIE Bot 4.0 (文心一言4.0)
  • Embedding model used (moka-ai/m3e-base, etc.): bge-large-zh
  • Vector store type (faiss / milvus / pg_vector, etc.): faiss
  • Operating system and version: Alibaba Cloud server
  • Python version: 3.10
  • Other relevant environment information:

Additional Information: Add any other information related to the issue.

b2383355038 avatar May 07 '24 02:05 b2383355038

The error you're encountering, TypeError: string indices must be integers, not 'str', suggests that the application is indexing a string with a string key, which is only valid for dictionaries. This typically happens when the expected data structure is a list of dictionaries (where each dictionary represents a row from the CSV file and the keys are column headers), but a string or otherwise incorrectly formatted value is being processed instead.
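
As a minimal illustration of how this message can arise, the sketch below uses made-up data rather than the project's code. It assumes, hypothetically, that the page received an error dictionary instead of a list of documents, which is one plausible reading of the ConnectError lines that precede the traceback in the report. Iterating over a dictionary yields its keys, which are strings, so `x["id"]` then indexes a string with a string.

```python
# Hypothetical stand-in for a failed response; the real shape is an assumption.
response = {"code": 500, "msg": "Connection refused"}


def first_id(items):
    """Mimic the access pattern x["id"] seen in the traceback."""
    for x in items:
        # When items is a dict, x is a key (a str), so x["id"] raises TypeError.
        return x["id"]
    return None


try:
    first_id(response)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message varies slightly between Python versions, but it always begins with "string indices must be integers".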

To resolve this, ensure your CSV file is correctly parsed into a list of dictionaries. You can use the pandas library to easily convert a CSV file into this format:

import pandas as pd

# Assuming 'csv_file' is the path to your CSV file
df = pd.read_csv(csv_file)
records = df.to_dict(orient='records')

# 'records' is now a list of dictionaries, each representing a row from the CSV.

After parsing the CSV file into the correct format, you can iterate over records and access each column value by its header name, which should align with the expected data structure in your application and prevent the TypeError.

If the issue persists after ensuring the CSV is correctly parsed and the data structure is as expected, it might be necessary to review the specific steps or code handling the upload process for further discrepancies.
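
When reviewing the handling code, one defensive option is to validate the value before building table rows from it. The sketch below is not the project's implementation: `build_rows` is a hypothetical helper, and only the field names (`id`, `page_content`, `metadata`, `source`) are taken from the traceback in the report.

```python
from typing import Any


def build_rows(docs: Any) -> list[dict]:
    """Build table rows from docs, skipping anything that is not a dict.

    Returns an empty list when docs is not a list at all, for example
    when a failed request produced an error object instead of documents.
    """
    if not isinstance(docs, list):
        return []
    rows = []
    for x in docs:
        if not isinstance(x, dict):
            continue  # ignore malformed entries instead of raising TypeError
        metadata = x.get("metadata")
        if not isinstance(metadata, dict):
            metadata = {}
        rows.append({
            "seq": len(rows) + 1,
            "id": x.get("id"),
            "page_content": x.get("page_content"),
            "source": metadata.get("source"),
        })
    return rows
```

With this guard, a failed request shows up as an empty table rather than an uncaught exception. The underlying connection error would still need to be fixed separately.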

dosubot[bot] avatar May 07 '24 02:05 dosubot[bot]

Does it work with the sample knowledge base?

zRzRzRzRzRzRzR avatar May 07 '24 02:05 zRzRzRzRzRzRzR

Does it work with the sample knowledge base?

I haven't tried. It looks like the API connection simply gets dropped.

b2383355038 avatar May 07 '24 02:05 b2383355038

Try running it locally from the command line.

zRzRzRzRzRzRzR avatar May 07 '24 02:05 zRzRzRzRzRzRzR

Try running it locally from the command line.

I tried that. It doesn't work and keeps raising this error.

b2383355038 avatar May 07 '24 02:05 b2383355038

What format is your content in?

zRzRzRzRzRzRzR avatar May 07 '24 02:05 zRzRzRzRzRzRzR

CSV.

b2383355038 avatar May 07 '24 02:05 b2383355038

Does the CSV have two columns, question and answer? It is usually two columns.

zRzRzRzRzRzRzR avatar May 07 '24 02:05 zRzRzRzRzRzRzR

No, it's a single column with 50,000 rows, all company names.

b2383355038 avatar May 07 '24 02:05 b2383355038

Then this problem is to be expected. A single column can't be embedded; the CSV should contain Q&A pairs.
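
To check which layout a CSV file has before uploading it, the column count of each row can be inspected with Python's standard `csv` module. This is only a sketch: `column_counts` is a hypothetical helper, the sample strings are made up, and the two-column Q&A expectation is the one stated in this thread.

```python
import csv
import io


def column_counts(csv_text: str) -> set[int]:
    """Return the distinct column counts found across non-empty rows."""
    reader = csv.reader(io.StringIO(csv_text))
    return {len(row) for row in reader if row}


# Made-up samples: one single-column file and one question/answer file.
sample_single = "Company A\nCompany B\n"
sample_qa = "question,answer\nWho supplies part X?,Company A\n"

single_counts = column_counts(sample_single)
qa_counts = column_counts(sample_qa)
```

A result other than a single count of 2 means the file does not match the two-column layout described above.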

zRzRzRzRzRzRzR avatar May 07 '24 02:05 zRzRzRzRzRzRzR

But I was able to upload it before, and this time a few dozen entries did go in. Now the upload keeps failing.

b2383355038 avatar May 07 '24 02:05 b2383355038

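An open-source project that can't even get basic operation right. Sigh. These are errors that should be spotted immediately; I really don't get it. I hit this problem too. A 128k model that endlessly asks and answers its own questions had to be dropped, and now langchain raises this error when loading a personal knowledge base. It feels completely unusable.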

2024-05-27 16:32:22,181 - utils.py[line:95] - ERROR: ReadTimeout: error when post /knowledge_base/create_knowledge_base: timed out
2024-05-27 16:36:43,277 - utils.py[line:95] - ERROR: ReadTimeout: error when post /knowledge_base/search_docs: timed out
2024-05-27 16:36:43.277 Uncaught app exception
Traceback (most recent call last):
  File "/root/langchain_pip/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "/sdb/chatgpt/Langchain-Chatchat/webui.py", line 64, in <module>
    pages[selected_page]["func"](api=api, is_lite=is_lite)
  File "/sdb/chatgpt/Langchain-Chatchat/webui_pages/knowledge_base/knowledge_base.py", line 330, in knowledge_base_page
    data = [
           ^
  File "/sdb/chatgpt/Langchain-Chatchat/webui_pages/knowledge_base/knowledge_base.py", line 331, in <listcomp>
    {"seq": i + 1, "id": x["id"], "page_content": x["page_content"], "source": x["metadata"].get("source"),
                         ~^^^^^^
TypeError: string indices must be integers, not 'str'

aben1900 avatar May 27 '24 09:05 aben1900

Solved. The knowledge matching score threshold has to be set to 1.0. That is the default, and it must not be changed!

liwei413519 avatar Jun 13 '24 03:06 liwei413519

SCORE_THRESHOLD = 1.0 must not be modified, otherwise this error is raised!

liwei413519 avatar Jun 13 '24 03:06 liwei413519

Which file is it changed in?

aben1900 avatar Jun 14 '24 05:06 aben1900

Which file is it changed in?

In kb_config.py under configs.
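
Taken together with the earlier comment, the setting would look roughly like this. Only the file location, the `SCORE_THRESHOLD` name, and the value 1.0 come from this thread; the rest of the file is omitted and not reproduced here.

```python
# configs/kb_config.py (fragment; all other settings omitted)
# Value reported in this thread as the default that should be left unchanged.
SCORE_THRESHOLD = 1.0
```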

rookie0w0 avatar Jul 02 '24 02:07 rookie0w0

2024-07-18: Pulling the main branch still shows this problem; after pulling the dev branch, everything works.

HappyJimmyBoy avatar Jul 18 '24 04:07 HappyJimmyBoy