Filter format errors are reported on every call when searching Tencent Cloud's vector database with filter conditions

Open orcharddd2024 opened this issue 2 months ago • 0 comments

🐛 Describe the bug

The problem occurs in the `search` method of mem0/vector_stores/langchain.py, which reports that the filter passed in has an incorrect format.

The relevant LangChain source code is in .venv/Lib/site-packages/langchain_community/vectorstores/tencentvectordb.py.

Error Message and Stack Trace (if applicable)

In .venv/Lib/site-packages/langchain_community/vectorstores/tencentvectordb.py there is a function called `similarity_search_by_vector`. When it is called externally, it reports an error no matter what filter format is passed in.

```python
def search(self, query: str, vectors: List[List[float]], limit: int = 5, filters: Optional[Dict] = None):
    """
    Search for similar vectors in LangChain.
    """
    # For each vector, perform a similarity search
    if filters:
        results = self.client.similarity_search_by_vector(embedding=vectors, k=limit, filter=filters)
    else:
        results = self.client.similarity_search_by_vector(embedding=vectors, k=limit)

    final_results = self._parse_output(results)
    return final_results
```

The code above is the current source in the open-source mem0 framework. Calling it with a filter reports an error.

I then changed the mem0 source code to the following:

```python
def search(self, query: str, vectors: List[List[float]], limit: int = 5, filters: Optional[Dict] = None):
    """
    Search for similar vectors in LangChain / TencentVectorDB.
    Compatible with TencentVectorDB filter grammar.
    """
    filter_expr = None
    if filters:
        if isinstance(filters, dict):
            # Convert to a format the LangChain/TencentVectorDB DSL can parse
            filter_parts = []
            for k, v in filters.items():
                if v is None:
                    continue
                # Detect the value type and quote strings
                if isinstance(v, str):
                    v = v.replace('"', '\\"')  # escape double quotes
                    filter_parts.append(f'{k} == "{v}"')
                else:
                    filter_parts.append(f'{k} == {v}')
            filter_expr = " and ".join(filter_parts)

        elif isinstance(filters, str):
            # Lenient conversion: single "=" to "==", single quotes to double quotes
            filter_expr = filters.replace(" = ", " == ").replace("'", '"')

    # (Optional) debug logging
    # print(f"[VectorSearch] filter_expr={filter_expr}")

    if filter_expr:
        results = self.client.similarity_search_by_vector(
            embedding=vectors, k=limit, filter=filter_expr
        )
    else:
        results = self.client.similarity_search_by_vector(
            embedding=vectors, k=limit
        )

    final_results = self._parse_output(results)
    return final_results
```

This version still reports an error; the concatenated filter string is rejected as well.

The first error: in the TencentVectorDB similarity search call, the `filter` argument passed in is a dictionary (or another non-string type) rather than a string or None, so the Lark parser inside the underlying `translate_filter()` function raises `TypeError: text must be str or bytes`.
The second error: the Lark parser used internally by the TencentVectorDB wrapper does not accept traditional SQL-style expressions (`user_id='zz'`). Judging from the source of `langchain_community.vectorstores.tencentvectordb` (see the `translate_filter` definition), the filter syntax expected by LangChain's wrapper for Tencent Vector Database appears to be a JSON-style or Python-style logical expression rather than SQL.

However, writing the expression as `user_id=="zz" and ...` is still rejected by Lark.

So no matter how I try to fix it, it always reports an error. Is this a bug, or is there something I missed? How should I modify the code?
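
One format I have not tried yet: if `translate_filter` parses the `filter` string with LangChain's query-constructor grammar (an assumption on my part, not confirmed against langchain-community 0.3.29), it would expect function-call expressions such as `eq("user_id", "zz")` or `and(eq(...), eq(...))` rather than infix comparisons. Below is a minimal sketch of that conversion. `to_function_call_filter` is a hypothetical helper name and is not part of mem0 or LangChain.

```python
import json
from typing import Any, Dict, Optional, Union


def to_function_call_filter(
    filters: Union[None, str, Dict[str, Any]],
) -> Optional[str]:
    """Hypothetical helper: render a mem0-style filter dict as
    function-call text such as eq("user_id", "zz").

    Assumption: the vector store wrapper accepts this grammar for `filter`.
    """
    if filters is None:
        return None
    if isinstance(filters, str):
        # Already a string: pass it through unchanged (empty string -> None).
        return filters or None
    if not isinstance(filters, dict):
        raise TypeError(
            f"filters must be a dict, str, or None, got {type(filters).__name__}"
        )

    parts = []
    for key, value in filters.items():
        if value is None:
            continue  # skip unset conditions
        # json.dumps double-quotes strings and leaves numbers bare.
        rendered_key = json.dumps(key, ensure_ascii=False)
        rendered_value = json.dumps(value, ensure_ascii=False)
        parts.append(f"eq({rendered_key}, {rendered_value})")

    if not parts:
        return None
    if len(parts) == 1:
        return parts[0]
    return "and(" + ", ".join(parts) + ")"


print(to_function_call_filter({"user_id": "zz"}))
# prints: eq("user_id", "zz")
```

The helper always returns a `str` or `None`, so the `TypeError: text must be str or bytes` case cannot be triggered by passing a dict through. Whether the wrapper accepts the resulting expression still needs to be checked against the actual `translate_filter` implementation.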

Description

Error as shown above

System Info

langchain 0.3.27
langchain-community 0.3.29
langchain-core 0.3.79
langchain-openai 0.2.0
langchain-text-splitters 0.3.11
langsmith 0.4.28
lark 1.3.0
tcvectordb 1.8.4

Oct 14 '25 10:10 orcharddd2024