
[BUG]: Error when setting up a local knowledge base: IndexError: list index out of range

Open jiangfangfangxm opened this issue 1 year ago • 1 comment

Describe the bug
I am trying to load a Chinese .txt file as a knowledge base. Running "Upload and load into knowledge base" (上传并加载到知识库) fails, and the backend prints the following error output:

2023-05-21 11:15:15 | INFO | sentence_transformers.SentenceTransformer | Load pretrained SentenceTransformer: /root/autodl-tmp/models/text2vec-large-chinese
2023-05-21 11:15:15 | WARNING | sentence_transformers.SentenceTransformer | No sentence-transformers model found with name /root/autodl-tmp/models/text2vec-large-chinese. Creating a new one with MEAN pooling.
2023-05-21 11:15:19 | INFO | sentence_transformers.SentenceTransformer | Use pytorch device: cuda
2023-05-21 11:15:19 | INFO | sentence_transformers.SentenceTransformer | Load pretrained SentenceTransformer: /root/autodl-tmp/models/text2vec-large-chinese
2023-05-21 11:15:19 | WARNING | sentence_transformers.SentenceTransformer | No sentence-transformers model found with name /root/autodl-tmp/models/text2vec-large-chinese. Creating a new one with MEAN pooling.
2023-05-21 11:15:24 | INFO | sentence_transformers.SentenceTransformer | Use pytorch device: cuda
2023-05-21 11:15:24 | INFO | chromadb.telemetry.posthog | Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2023-05-21 11:15:24 | INFO | chromadb | Running Chroma using direct local API.
2023-05-21 11:15:24 | WARNING | chromadb | Using embedded DuckDB with persistence: data will be stored in: /root/autodl-tmp/DB-GPT/pilot/data/DataDictionary.vectordb
2023-05-21 11:15:24 | INFO | clickhouse_connect.driver.ctypes | Successfully imported ClickHouse Connect C data optimizations
2023-05-21 11:15:24 | INFO | clickhouse_connect.driver.ctypes | Successfully import ClickHouse Connect C/Numpy optimizations
2023-05-21 11:15:24 | INFO | clickhouse_connect.json_impl | Using orjson library for writing JSON byte strings
2023-05-21 11:15:24 | INFO | chromadb.db.duckdb | No existing DB found in /root/autodl-tmp/DB-GPT/pilot/data/DataDictionary.vectordb, skipping load
2023-05-21 11:15:24 | INFO | chromadb.db.duckdb | No existing DB found in /root/autodl-tmp/DB-GPT/pilot/data/DataDictionary.vectordb, skipping load
2023-05-21 11:15:24 | INFO | chromadb.telemetry.posthog | Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2023-05-21 11:15:24 | INFO | chromadb | Running Chroma using direct local API.
2023-05-21 11:15:24 | WARNING | chromadb | Using embedded DuckDB with persistence: data will be stored in: /root/autodl-tmp/DB-GPT/pilot/data/DataDictionary.vectordb
2023-05-21 11:15:24 | INFO | chromadb.db.duckdb | No existing DB found in /root/autodl-tmp/DB-GPT/pilot/data/DataDictionary.vectordb, skipping load
2023-05-21 11:15:24 | INFO | chromadb.db.duckdb | No existing DB found in /root/autodl-tmp/DB-GPT/pilot/data/DataDictionary.vectordb, skipping load
Batches: 0it [00:00, ?it/s] | stderr |
Batches: 0it [00:00, ?it/s] | stderr |
2023-05-21 11:15:25 | ERROR | stderr |
2023-05-21 11:15:25 | ERROR | stderr | Traceback (most recent call last):
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/gradio/routes.py", line 394, in run_predict
2023-05-21 11:15:25 | ERROR | stderr | output = await app.get_blocks().process_api(
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/gradio/blocks.py", line 1075, in process_api
2023-05-21 11:15:25 | ERROR | stderr | result = await self.call_function(
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/gradio/blocks.py", line 884, in call_function
2023-05-21 11:15:25 | ERROR | stderr | prediction = await anyio.to_thread.run_sync(
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
2023-05-21 11:15:25 | ERROR | stderr | return await get_asynclib().run_sync_in_worker_thread(
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
2023-05-21 11:15:25 | ERROR | stderr | return await future
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
2023-05-21 11:15:25 | ERROR | stderr | result = context.run(func, *args)
2023-05-21 11:15:25 | ERROR | stderr | File "/root/autodl-tmp/DB-GPT/pilot/server/webserver.py", line 622, in knowledge_embedding_store
2023-05-21 11:15:25 | ERROR | stderr | knowledge_embedding_client.knowledge_embedding()
2023-05-21 11:15:25 | ERROR | stderr | File "/root/autodl-tmp/DB-GPT/pilot/source_embedding/knowledge_embedding.py", line 28, in knowledge_embedding
2023-05-21 11:15:25 | ERROR | stderr | self.knowledge_embedding_client.source_embedding()
2023-05-21 11:15:25 | ERROR | stderr | File "/root/autodl-tmp/DB-GPT/pilot/source_embedding/source_embedding.py", line 78, in source_embedding
2023-05-21 11:15:25 | ERROR | stderr | self.index_to_store(text)
2023-05-21 11:15:25 | ERROR | stderr | File "/root/autodl-tmp/DB-GPT/pilot/source_embedding/source_embedding.py", line 59, in index_to_store
2023-05-21 11:15:25 | ERROR | stderr | self.vector_store = Chroma.from_documents(docs, self.embeddings, persist_directory=persist_dir)
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 338, in from_documents
2023-05-21 11:15:25 | ERROR | stderr | return cls.from_texts(
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 307, in from_texts
2023-05-21 11:15:25 | ERROR | stderr | chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/langchain/vectorstores/chroma.py", line 116, in add_texts
2023-05-21 11:15:25 | ERROR | stderr | self._collection.add(
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 101, in add
2023-05-21 11:15:25 | ERROR | stderr | ids, embeddings, metadatas, documents = self._validate_embedding_set(
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/chromadb/api/models/Collection.py", line 348, in _validate_embedding_set
2023-05-21 11:15:25 | ERROR | stderr | ids = validate_ids(maybe_cast_one_to_many(ids))
2023-05-21 11:15:25 | ERROR | stderr | File "/root/miniconda3/envs/dbgpt_env/lib/python3.10/site-packages/chromadb/api/types.py", line 77, in maybe_cast_one_to_many
2023-05-21 11:15:25 | ERROR | stderr | if isinstance(target[0], (int, float)):
2023-05-21 11:15:25 | ERROR | stderr | IndexError: list index out of range


jiangfangfangxm, May 21 '23 03:05

Make sure the document has finished uploading before you load it.

Aries-ckt, May 23 '23 08:05
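
For anyone hitting the same traceback: one plausible reading is that the list of documents handed to Chroma.from_documents was empty (the "Batches: 0it" line suggests nothing was embedded), so the `target[0]` lookup in maybe_cast_one_to_many had nothing to index. The Python sketch below is not DB-GPT or chromadb code. `maybe_cast_one_to_many` here is a stand-in that reproduces only the indexing line shown in the traceback, and `load_chunks` is a hypothetical helper illustrating the kind of check that fails early with a readable message when the file is missing or empty.

```python
from pathlib import Path


def maybe_cast_one_to_many(target):
    # Stand-in for the chromadb helper named in the traceback. Only the
    # indexing line shown there is reproduced; the rest is an assumption.
    if isinstance(target[0], (int, float)):
        return [target]
    return target


def load_chunks(path, chunk_size=200):
    """Hypothetical loader: split a UTF-8 text file into fixed-size chunks.

    Raises ValueError instead of returning an empty list, so an unfinished
    or empty upload is reported before anything reaches the vector store.
    """
    file = Path(path)
    if not file.is_file() or file.stat().st_size == 0:
        raise ValueError(f"{path} is missing or empty; finish the upload before loading")
    text = file.read_text(encoding="utf-8").strip()
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    if not chunks:
        raise ValueError(f"{path} contains no text to embed")
    return chunks


if __name__ == "__main__":
    try:
        maybe_cast_one_to_many([])  # an empty list reproduces the error
    except IndexError as exc:
        print("IndexError:", exc)
```

Calling `load_chunks` on the uploaded file before building the vector store turns the opaque IndexError into an explicit "missing or empty" message, which matches the advice above to wait until the upload has completed.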