[Bug]: Unable to query the Weaviate Index using RaptorRetriever and RetrieverQueryEngine Modules
Bug Description
Packages
```toml
python = "^3.11"
llama-index = "^0.10.28"
llama-index-embeddings-huggingface = "^0.2.0"
llama-index-llms-huggingface = "^0.1.4"
torch = {version = "^2.2.2+cu121", source = "pytorch-gpu"}
torchvision = {version = "^0.17.2+cu121", source = "pytorch-gpu"}
torchaudio = {version = "^2.2.2+cu121", source = "pytorch-gpu"}
llama-index-vector-stores-weaviate = "^0.1.4"
ray = {extras = ["data", "serve"], version = "^2.10.0"}
llama-index-packs-raptor = "^0.1.3"
llama-index-llms-ollama = "^0.1.2"
llama-index-embeddings-ollama = "^0.1.2"
umap-learn = "^0.5.6"
```
Error Details
```
{'data': {'Get': {'RaptorIndex': None}}, 'errors': [{'locations': [{'column': 6, 'line': 1}], 'message': 'invalid \'where\' filter: data type filter cannot use "valueInt" on type "number", use "valueNumber" instead', 'path': ['Get', 'RaptorIndex']}]}
```
Version
^0.10.28
Steps to Reproduce
Please run the code below with Weaviate running in the background.
Code:

```python
import os

import weaviate
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.packs.raptor import RaptorPack, RaptorRetriever
from llama_index.vector_stores.weaviate import WeaviateVectorStore

LLM_MODEL_NAME = os.getenv("LLM_MODEL_NAME", "gemma:2b")
EMBEDDINGS_MODEL_NAME = os.getenv("EMBEDDINGS_MODEL_NAME", "nomic-embed-text")

embed_model = OllamaEmbedding(model_name=EMBEDDINGS_MODEL_NAME)

os.environ["OPENAI_API_KEY"] = "NA"

llm_model = Ollama(model=LLM_MODEL_NAME, request_timeout=400.0)

vdb_client = weaviate.Client(url="http://localhost:8080")
vector_store = WeaviateVectorStore(
    weaviate_client=vdb_client, index_name="RaptorIndex", text_key="text"
)

# setting up the storage for the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)
vdb_index = VectorStoreIndex.from_documents(documents=[], storage_context=storage_context)

documents = SimpleDirectoryReader(input_files=["documents/raptor_paper.pdf"]).load_data()

raptor_pack = RaptorPack(
    documents,
    embed_model=embed_model,  # used for embedding clusters
    llm=llm_model,  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top k for each layer, or overall top-k for collapsed
    mode="collapsed",  # sets default mode
    transformations=[
        SentenceSplitter(chunk_size=400, chunk_overlap=50)
    ],  # transformations applied for ingestion
)

nodes = raptor_pack.run("What baselines is raptor compared against?", mode="collapsed")

retriever = RaptorRetriever(
    [],
    embed_model=embed_model,  # used for embedding clusters
    llm=llm_model,  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top k for each layer, or overall top-k for collapsed
    mode="tree_traversal",  # sets default mode
)

query_engine = RetrieverQueryEngine.from_args(retriever, llm=llm_model)

response = query_engine.query("What baselines was RAPTOR compared against?")
```
Relevant Logs/Tracebacks
```
{'data': {'Get': {'RaptorIndex': None}}, 'errors': [{'locations': [{'column': 6, 'line': 1}], 'message': 'invalid \'where\' filter: data type filter cannot use "valueInt" on type "number", use "valueNumber" instead', 'path': ['Get', 'RaptorIndex']}]}
```
The error you're encountering is due to a mismatch in the expected data type for the 'where' filter in your query to the Weaviate index. The filter is attempting to use "valueInt" on a field that is of type "number", which requires the use of "valueNumber" instead. This issue arises from the way data types are handled when applying filters in the LlamaIndex integration with Weaviate.
To resolve this, ensure that when you're applying filters in your queries, integers and floats are correctly represented using "valueNumber". This adjustment is necessary because Weaviate expects numerical values to be specified with "valueNumber", regardless of whether they are integers or floats. Here's a quick guide on how to adjust your filter application:
- For float values, continue using "valueNumber".
- For integer values, also use "valueNumber" instead of "valueInt".
- If you're dealing with strings that represent numeric values, convert them to floats and use "valueNumber".
This change should align with the data types of the fields in your Weaviate index and the expectations of the LlamaIndex queries. If after making these adjustments you still face issues, consider verifying that your LlamaIndex and Weaviate versions are compatible and up to date. Upgrading to the latest version of LlamaIndex might also introduce fixes and improvements that could help resolve this issue.
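To make the adjustment concrete, here is a minimal, self-contained sketch of the kind of operand coercion described above. The `build_where_filter` helper is hypothetical (it is not part of LlamaIndex or the Weaviate client); it only illustrates mapping Python ints and floats to `valueNumber` so that a filter on a Weaviate `number` property does not emit `valueInt`:

```python
# Hypothetical sketch: coerce numeric filter operands to "valueNumber",
# which Weaviate expects for properties stored with data type "number".
# The helper name and dict shape are illustrative, not a real LlamaIndex API.

def build_where_filter(path, operator, value):
    """Build a Weaviate 'where' filter dict, coercing ints/floats to valueNumber."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so check it first
        key = "valueBoolean"
    elif isinstance(value, (int, float)):
        # "number" properties reject "valueInt"; always send valueNumber as a float
        key = "valueNumber"
        value = float(value)
    elif isinstance(value, str):
        try:
            # strings holding numeric values: convert to float, use valueNumber
            value = float(value)
            key = "valueNumber"
        except ValueError:
            key = "valueText"
    else:
        key = "valueText"
    return {"path": path, "operator": operator, key: value}


# Example: filtering on an integer-valued "level" property stored as "number"
f = build_where_filter(["level"], "Equal", 0)
print(f)  # {'path': ['level'], 'operator': 'Equal', 'valueNumber': 0.0}
```

A filter dict built this way can then be passed wherever the query is assembled (e.g. the Weaviate client's `.with_where(...)`), instead of one that carries a `valueInt` operand.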
Details
Sources
- llama-index-legacy/llama_index/legacy/vector_stores/weaviate.py
- llama-index-legacy/llama_index/legacy/query_engine/retriever_query_engine.py
- llama-index-integrations/vector_stores/llama-index-vector-stores-weaviate/llama_index/vector_stores/weaviate/base.py
- llama-index-legacy/llama_index/legacy/callbacks/llama_debug.py
- pyproject.toml
- CHANGELOG.md
- docs/docs/CHANGELOG.md
- llama-index-core/llama_index/core/query_engine/retriever_query_engine.py
- llama-index-packs/llama-index-packs-raptor/llama_index/packs/raptor/base.py
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
To continue the conversation, mention @dosu.
Where should I make this change? Please provide detailed steps.
I have implemented this using the below link, but with Weaviate as the VDB.