RAG faiss AssertionError

Open AprilCat opened this issue 9 months ago • 2 comments

Bug description

Running this demo:

import asyncio

from metagpt.rag.engines import SimpleEngine
from metagpt.rag.schema import FAISSRetrieverConfig
from metagpt.const import EXAMPLE_DATA_PATH

DOC_PATH = EXAMPLE_DATA_PATH / "rag/travel.txt"

async def main():
    engine = SimpleEngine.from_docs(input_files=[DOC_PATH], retriever_configs=[FAISSRetrieverConfig()])

    answer = await engine.aquery("What does Bob like?")
    print(answer)

if __name__ == "__main__":
    asyncio.run(main())

produces the following error:

Traceback (most recent call last):
  File "/home/wanfu/projects/llm/multi_agent_rag/src/simple_custom_object.py", line 26, in <module>
    asyncio.run(main())
  File "/home/wanfu/data/miniconda3/envs/metagpt/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/wanfu/data/miniconda3/envs/metagpt/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/home/wanfu/projects/llm/multi_agent_rag/src/simple_custom_object.py", line 21, in main
    engine.add_docs([DOC_PATH])
  File "/mnt/data/work/development/projects/llm/MetaGPT/metagpt/rag/engines/simple.py", line 195, in add_docs
    self._save_nodes(nodes)
  File "/mnt/data/work/development/projects/llm/MetaGPT/metagpt/rag/engines/simple.py", line 274, in _save_nodes
    self.retriever.add_nodes(nodes)
  File "/mnt/data/work/development/projects/llm/MetaGPT/metagpt/rag/retrievers/faiss_retriever.py", line 12, in add_nodes
    self._index.insert_nodes(nodes, **kwargs)
  File "/home/wanfu/data/miniconda3/envs/metagpt/lib/python3.9/site-packages/llama_index/core/indices/vector_store/base.py", line 320, in insert_nodes
    self._insert(nodes, **insert_kwargs)
  File "/home/wanfu/data/miniconda3/envs/metagpt/lib/python3.9/site-packages/llama_index/core/indices/vector_store/base.py", line 311, in _insert
    self._add_nodes_to_index(self._index_struct, nodes, **insert_kwargs)
  File "/home/wanfu/data/miniconda3/envs/metagpt/lib/python3.9/site-packages/llama_index/core/indices/vector_store/base.py", line 233, in _add_nodes_to_index
    new_ids = self._vector_store.add(nodes_batch, **insert_kwargs)
  File "/home/wanfu/data/miniconda3/envs/metagpt/lib/python3.9/site-packages/llama_index/vector_stores/faiss/base.py", line 121, in add
    self._faiss_index.add(text_embedding_np)
  File "/home/wanfu/data/miniconda3/envs/metagpt/lib/python3.9/site-packages/faiss/__init__.py", line 214, in replacement_add
    assert d == self.d
AssertionError

Bug solved method

Environment information

  • LLM type and model name: zhipuai
  • Embeddings : fastchat, BAAI/bge-large-zh
  • System version: Ubuntu 22.04
  • Python version: 3.9.19
  • MetaGPT version or branch:
  • packages version:
  • installation method: pip install from source

Screenshots or logs

AprilCat, Apr 30 '24 04:04

The assertion fails because d != self.d: the dimension of the embedding vectors being added (1024 in your case, from BAAI/bge-large-zh) does not match the dimension the FAISS index was created with. If your embedding model isn't from ollama or gemini, the index dimension defaults to 1536, which is the dimension of the OpenAI embedding. See https://github.com/geekan/MetaGPT/blob/main/metagpt/rag/schema.py#L34-L49
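
To make the mismatch concrete, here is a minimal sketch of the same kind of check using only the standard library. FixedDimIndex is a hypothetical stand-in written for illustration; it is not the faiss API, it only mirrors the `assert d == self.d` line from the traceback.

```python
# Hypothetical stand-in for a fixed-dimension vector index (NOT the faiss API).
class FixedDimIndex:
    def __init__(self, d):
        self.d = d          # dimension fixed when the index is created
        self.vectors = []

    def add(self, vectors):
        # Validate every vector first, mirroring `assert d == self.d`.
        for v in vectors:
            assert len(v) == self.d, f"vector has {len(v)} dims, index expects {self.d}"
        self.vectors.extend(list(v) for v in vectors)


index = FixedDimIndex(d=1536)   # the default dimension described above
embedding = [0.0] * 1024        # same size as a BAAI/bge-large-zh vector

try:
    index.add([embedding])
    mismatch = None
except AssertionError as exc:
    mismatch = str(exc)         # "vector has 1024 dims, index expects 1536"

# An index created with the matching dimension accepts the same vector.
matching = FixedDimIndex(d=1024)
matching.add([embedding])
```

The point is that the index dimension is fixed at creation time, so it has to be set to the embedding model's output size before any vectors are added.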

usamimeri, May 01 '24 15:05

You can also check the similar issue #1213.

To fix it, change engine = SimpleEngine.from_docs(input_files=[DOC_PATH], retriever_configs=[FAISSRetrieverConfig()]) to engine = SimpleEngine.from_docs(input_files=[DOC_PATH], retriever_configs=[FAISSRetrieverConfig(dimensions=1024)]) so that the index dimension matches your embedding model.
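
If you would rather not hard-code 1024, one option is to measure the dimension by embedding a short sample text once. The sketch below is only an illustration: probe_dimension and fake_embed are hypothetical names, and fake_embed stands in for whatever callable wraps your real embedding model.

```python
from typing import Callable, Sequence


def probe_dimension(embed: Callable[[str], Sequence[float]], sample: str = "dimension probe") -> int:
    """Return the length of the vector that `embed` produces for one sample text."""
    vector = embed(sample)
    if len(vector) == 0:
        raise ValueError("embedding function returned an empty vector")
    return len(vector)


# Hypothetical stand-in for a real embedding call; it returns a 1024-dim vector,
# the same size as BAAI/bge-large-zh output.
def fake_embed(text: str) -> Sequence[float]:
    return [0.0] * 1024


dimensions = probe_dimension(fake_embed)

# The measured value can then be passed to the config from the fix above, e.g.
# retriever_configs=[FAISSRetrieverConfig(dimensions=dimensions)]
```

This costs one extra embedding call at startup, but the index dimension then follows the model if you switch to a different one later.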

usamimeri, May 01 '24 16:05