
AzureSearch.avector_search_with_score() triggers "TypeError: 'AsyncSearchItemPaged' object is not iterable" when calling _results_to_documents()

Open chrislrobert opened this issue 6 months ago • 10 comments

Checked other resources

  • [X] I added a very descriptive title to this issue.
  • [X] I searched the LangChain documentation with the integrated search.
  • [X] I used the GitHub search to find a similar question and didn't find it.
  • [X] I am sure that this is a bug in LangChain rather than my code.
  • [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

embeddings = AzureOpenAIEmbeddings(
	azure_endpoint=azure_endpoint,
	openai_api_version=openai_api_version,
	openai_api_key=openai_api_key,
	openai_api_type=openai_api_type,
	deployment=deployment,
	chunk_size=1)

vectorstore = AzureSearch(
	azure_search_endpoint=azure_search_endpoint,
	azure_search_key=azure_search_key,
	index_name=index_name,
	embedding_function=embeddings.embed_query,
)

system_message_prompt = SystemMessagePromptTemplate.from_template(
	system_prompt)
human_message_prompt = HumanMessagePromptTemplate.from_template(
	human_template)
chat_prompt = ChatPromptTemplate.from_messages(
	[system_message_prompt, human_message_prompt])

doc_chain = load_qa_chain(
	conversation_llm, chain_type="stuff", prompt=chat_prompt, callback_manager=default_manager
)

conversation_chain = ConversationalRetrievalChain(
	retriever=vectorstore.as_retriever(search_type="similarity_score_threshold", k=rag_top_k,
									   search_kwargs={"score_threshold": rag_score_threshold}),
	combine_docs_chain=doc_chain,
	question_generator=question_generator,
	return_source_documents=True,
	callback_manager=default_manager,
	rephrase_question=False,
	memory=memory,
	max_tokens_limit=max_retrieval_tokens,
)

result = await conversation_chain.ainvoke({"question": question, "chat_history": chat_history})

Error Message and Stack Trace (if applicable)

TypeError("'AsyncSearchItemPaged' object is not iterable")
Traceback (most recent call last):
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain/chains/base.py", line 208, in ainvoke
    await self._acall(inputs, run_manager=run_manager)
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain/chains/conversational_retrieval/base.py", line 212, in _acall
    docs = await self._aget_docs(new_question, inputs, run_manager=_run_manager)
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain/chains/conversational_retrieval/base.py", line 410, in _aget_docs
    docs = await self.retriever.ainvoke(
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_core/retrievers.py", line 280, in ainvoke
    raise e
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_core/retrievers.py", line 273, in ainvoke
    result = await self._aget_relevant_documents(
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_community/vectorstores/azuresearch.py", line 1590, in _aget_relevant_documents
    await self.vectorstore.asimilarity_search_with_relevance_scores(
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_community/vectorstores/azuresearch.py", line 663, in asimilarity_search_with_relevance_scores
    result = await self.avector_search_with_score(query, k=k, **kwargs)
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_community/vectorstores/azuresearch.py", line 750, in avector_search_with_score
    return _results_to_documents(results)
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_community/vectorstores/azuresearch.py", line 1623, in _results_to_documents
    docs = [
TypeError: 'AsyncSearchItemPaged' object is not iterable

Description

This commit for issue #24064 caused a regression in async support. After that commit, avector_search_with_score() calls _asimple_search(), which uses async with self.async_client, and then passes the results to _results_to_documents(). That triggers a "TypeError: 'AsyncSearchItemPaged' object is not iterable" because it uses the AsyncSearchItemPaged on a closed HTTP connection (the connection is closed at the end of the async with block in _asimple_search()).
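To illustrate the failure mode without needing an Azure index, here is a minimal sketch. FakeClient and FakePaged are hypothetical stand-ins I made up for the async search client and AsyncSearchItemPaged; they are not part of the Azure SDK or LangChain. As far as I can tell, two things are in play: a plain (non-async) comprehension over an object that only supports async iteration raises this exact TypeError, and the results also need to be consumed while the client is still open.

```python
import asyncio


class FakePaged:
    """Hypothetical stand-in for AsyncSearchItemPaged: supports `async for` only."""

    def __init__(self, items, client):
        self._items = list(items)
        self._client = client

    def __aiter__(self):
        return self._pages()

    async def _pages(self):
        for item in self._items:
            # Simulate paging that needs a live connection.
            if self._client.closed:
                raise RuntimeError("client is closed")
            yield item


class FakeClient:
    """Hypothetical stand-in for the async search client."""

    def __init__(self):
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        self.closed = True

    async def search(self):
        return FakePaged([{"content": "a"}, {"content": "b"}], self)


async def broken():
    async with FakeClient() as client:
        results = await client.search()
    # Plain comprehension over an async-only iterable raises TypeError.
    return [r["content"] for r in results]


async def fixed():
    async with FakeClient() as client:
        results = await client.search()
        # Consume with `async for` while the client is still open.
        return [r["content"] async for r in results]


if __name__ == "__main__":
    try:
        asyncio.run(broken())
    except TypeError as exc:
        print("broken:", exc)
    print("fixed:", asyncio.run(fixed()))
```

In this sketch, broken() mirrors the pattern in the traceback (results handed to a synchronous helper after the async with block has exited), while fixed() mirrors the approach of building the documents inside the block.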

The original async PR #22075 seemed to have the right idea: the async results need to be handled within the with block. Looking at that code, it looks like it should probably work. However, if I roll back to 0.2.7, I run into the "KeyError('content_vector')" that triggered issue #24064. For the moment, I've gotten things running by overriding AzureSearch as follows:

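import json
from typing import Any, List, Optional, Tuple

import numpy as np
from langchain_community.vectorstores.azuresearch import (
    FIELDS_CONTENT,
    FIELDS_CONTENT_VECTOR,
    FIELDS_METADATA,
    AzureSearch,
)
from langchain_core.documents import Document


class ExtendedAzureSearch(AzureSearch):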
    """Extended AzureSearch class with patch to fix async support."""

    async def _asimple_search_docs(
        self,
        embedding: List[float],
        text_query: str,
        k: int,
        *,
        filters: Optional[str] = None,
        **kwargs: Any,
    ) -> List[Tuple[Document, float]]:
        """Perform vector or hybrid search in the Azure search index.

        Args:
            embedding: A vector embedding to search in the vector space.
            text_query: A full-text search query expression;
                Use "*" or omit this parameter to perform only vector search.
            k: Number of documents to return.
            filters: Filtering expression.
        Returns:
            Matching documents with scores
        """
        from azure.search.documents.models import VectorizedQuery

        async with self.async_client as async_client:
            results = await async_client.search(
                search_text=text_query,
                vector_queries=[
                    VectorizedQuery(
                        vector=np.array(embedding, dtype=np.float32).tolist(),
                        k_nearest_neighbors=k,
                        fields=FIELDS_CONTENT_VECTOR,
                    )
                ],
                filter=filters,
                top=k,
                **kwargs,
            )
            docs = [
                (
                    Document(
                        page_content=result.pop(FIELDS_CONTENT),
                        metadata=json.loads(result[FIELDS_METADATA])
                        if FIELDS_METADATA in result
                        else {
                            key: value for key, value in result.items() if key != FIELDS_CONTENT_VECTOR
                        },
                    ),
                    float(result["@search.score"]),
                )
                async for result in results
            ]
        return docs

    # AP-254 - This version of avector_search_with_score() calls _asimple_search_docs() instead of _asimple_search()
    # followed by _results_to_documents(results) because _asimple_search() uses `async with self.async_client`, which
    # closes the paging connection on return, which makes it so the results are not available for
    # _results_to_documents() (triggering "TypeError: 'AsyncSearchItemPaged' object is not iterable").
    async def avector_search_with_score(
        self,
        query: str,
        k: int = 4,
        filters: Optional[str] = None,
        **kwargs: Any,
    ) -> List[Tuple[Document, float]]:
        """Return docs most similar to query.

        Args:
            query (str): Text to look up documents similar to.
            k (int, optional): Number of Documents to return. Defaults to 4.
            filters (str, optional): Filtering expression. Defaults to None.

        Returns:
            List[Tuple[Document, float]]: List of Documents most similar
                to the query and score for each
        """
        embedding = await self._aembed_query(query)
        return await self._asimple_search_docs(
            embedding, "", k, filters=filters, **kwargs
        )

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 23.5.0: Wed May 1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000
Python Version: 3.10.9 (v3.10.9:1dd9be6584, Dec 6 2022, 14:37:36) [Clang 13.0.0 (clang-1300.0.29.30)]

Package Information

langchain_core: 0.2.9
langchain: 0.2.11
langchain_community: 0.2.10
langsmith: 0.1.81
langchain_aws: 0.1.7
langchain_openai: 0.1.8
langchain_text_splitters: 0.2.2
langchainplus_sdk: 0.0.21
langgraph: 0.1.14

chrislrobert, Jul 27 '24 11:07